DevOps Engineer

This role is for a DevOps Engineer with a 12+ month contract in Santa Clara, CA. Key skills include Kubernetes, Ansible, Python, and CI/CD pipeline experience. A minimum of 2 years in DevOps or related fields is required, with GPU-based environment familiarity preferred.
🌎 - Country
United States
💱 - Currency
$ USD
-
💰 - Day rate
-
🗓️ - Date discovered
May 30, 2025
🕒 - Project duration
More than 6 months
-
🏝️ - Location type
On-site
-
📄 - Contract type
Unknown
-
🔒 - Security clearance
Unknown
-
📍 - Location detailed
San Jose, CA
-
🧠 - Skills detailed
#Security #ML (Machine Learning) #Ansible #Cloud #Prometheus #Scala #Infrastructure as Code (IaC) #Version Control #Python #Grafana #Monitoring #Deployment #DevOps #Jenkins #AI (Artificial Intelligence) #Automation #Terraform #Kubernetes #Observability #GitHub #Docker #Bash
Role description
Title: DevOps Engineer
Duration: 12+ months
Location: Santa Clara, CA (Onsite)

Description:
• We are seeking a skilled and motivated DevOps Engineer to join our team in building and maintaining high-performance infrastructure for GPU-based workloads.
• In this role, you'll be responsible for developing scalable, reliable systems across both on-premises and cloud environments.
• You'll work closely with engineering teams to streamline CI/CD pipelines, automate operations, and support advanced compute environments.

Key Responsibilities:
• Design and implement scalable infrastructure using Kubernetes across both on-prem and major cloud service providers (CSPs)
• Develop and maintain CI/CD pipelines with tools like Buildkite, GitHub Actions, and Jenkins to ensure smooth and reliable software delivery
• Automate infrastructure operations using Ansible, Python, and Bash to reduce manual toil and improve system consistency
• Manage service deployment within Kubernetes using Helm and GitOps-style workflows
• Configure and support GPU servers, including lifecycle management, health monitoring, and test automation
• Maintain node health and security, ensuring timely updates and proactive monitoring of GPU server fleets
• Provision, scale, and maintain Kubernetes clusters

Required Qualifications:
• 2+ years of experience in DevOps, Site Reliability Engineering (SRE), or Infrastructure Engineering
• Proficiency in Ansible, Python, and Bash for automation and tooling
• Solid hands-on experience with Kubernetes, Docker, and Helm
• Strong knowledge of CI/CD pipeline design, version control best practices, and build systems
• Experience with monitoring and observability tools (e.g., Prometheus, Grafana, Nagios)

Nice to Have:
• Familiarity with GPU-based compute environments and automated CI/test workflows
• Experience with infrastructure-as-code (IaC) tools such as Terraform
• Familiarity with container security practices and CVE scanning
• Background in high-performance computing (HPC), Slurm, or ML/AI training pipelines