

DevOps Engineer
Featured Role | Apply direct with Data Freelance Hub
This role is for a DevOps Engineer with a 12+ month contract in Santa Clara, CA. Key skills include Kubernetes, Ansible, Python, and CI/CD pipeline experience. A minimum of 2 years in DevOps or related fields is required, with GPU-based environment familiarity preferred.
Country: United States
Currency: $ USD
Day rate: -
Date discovered: May 30, 2025
Project duration: More than 6 months
Location type: On-site
Contract type: Unknown
Security clearance: Unknown
Location detailed: San Jose, CA
Skills detailed: #Security #ML (Machine Learning) #Ansible #Cloud #Prometheus #Scala #Infrastructure as Code (IaC) #Version Control #Python #Grafana #Monitoring #Deployment #DevOps #Jenkins #AI (Artificial Intelligence) #Automation #Terraform #Kubernetes #Observability #GitHub #Docker #Bash
Role description
Title: DevOps Engineer
Duration: 12+ months
Location: Santa Clara, CA (Onsite)
Description:
• We are seeking a skilled and motivated DevOps Engineer to join our team in building and maintaining high-performance infrastructure for GPU-based workloads.
• In this role, you'll be responsible for developing scalable, reliable systems across both on-premises and cloud environments.
• You'll work closely with engineering teams to streamline CI/CD pipelines, automate operations, and support advanced compute environments.
Key Responsibilities:
• Design and implement scalable infrastructure using Kubernetes across both on-prem environments and major cloud service providers (CSPs)
• Develop and maintain CI/CD pipelines with tools like Buildkite, GitHub Actions, and Jenkins to ensure smooth and reliable software delivery
• Automate infrastructure operations using Ansible, Python, and Bash to reduce manual toil and improve system consistency
• Manage service deployment within Kubernetes using Helm and GitOps-style workflows
• Configure and support GPU servers, including lifecycle management, health monitoring, and test automation
• Maintain node health and security, ensuring timely updates and proactive monitoring of GPU server fleets
• Provision, scale, and maintain Kubernetes clusters
Required Qualifications
• 2+ years of experience in DevOps, Site Reliability Engineering (SRE), or Infrastructure Engineering
• Proficiency in Ansible, Python, and Bash for automation and tooling
• Solid hands-on experience with Kubernetes, Docker, and Helm
• Strong knowledge of CI/CD pipeline design, version control best practices, and build systems
• Experience with monitoring and observability tools (e.g., Prometheus, Grafana, Nagios)
Nice to Have
• Familiarity with GPU-based compute environments and automated CI/test workflows
• Experience with infrastructure-as-code (IaC) tools such as Terraform
• Familiarity with container security practices and CVE scanning
• Background in high-performance computing (HPC), Slurm, or ML/AI training pipelines