DSM-H Consulting

Cloud Engineer (NVIDIA GPU-based Systems)

⭐ - Featured Role | Apply direct with Data Freelance Hub
This role is for a Cloud Engineer (NVIDIA GPU-based Systems). The contract length and pay rate are not specified. The role is located in Dallas, TX; Peoria, IL; Phoenix, AZ; or Cary, NC, and it requires 8+ years of experience, including 3+ years with NVIDIA GPU systems.
🌎 - Country
United States
💱 - Currency
$ USD
-
💰 - Day rate
Unknown
-
🗓️ - Date
November 11, 2025
🕒 - Duration
Unknown
-
🏝️ - Location
Hybrid
-
📄 - Contract
Unknown
-
🔒 - Security
Unknown
-
📍 - Location detailed
Chicago, IL
-
🧠 - Skills detailed
#Security #Linux #ML (Machine Learning) #Kubernetes #Prometheus #Grafana #Bash #Monitoring #Automation #Containers #AWS (Amazon Web Services) #Cloud #Ansible #Python #PyTorch #GCP (Google Cloud Platform) #Terraform #Scripting #Documentation #Azure #Server Administration #DevOps #AI (Artificial Intelligence) #Compliance #Docker #TensorFlow
Role description
Locations: Dallas, TX; Peoria, IL; Phoenix, AZ; and Cary, NC

Typical task breakdown:
- Administer and maintain GPU-accelerated servers and clusters, including NVIDIA A100, H100, and other high-end GPU sets.
- Manage and optimize NVIDIA software stack components such as CUDA, cuDNN, TensorRT, NCCL, and NGC containers.
- Monitor system performance, troubleshoot hardware/software issues, and ensure high availability of AI infrastructure.
- Collaborate with DevOps and AI teams to support containerized workflows (Docker, Kubernetes) and distributed training environments.
- Implement security best practices and ensure compliance with internal and external standards.
- Lead upgrades, patching, and lifecycle management of GPU servers and related infrastructure.
- Provide documentation, automation scripts, and training for internal teams.

Work environment:
Candidates must be able to work in the office 1 day a week and, when notified, eventually move to 5 days a week in the office.

Education & Experience Required:
- Bachelor's degree with a minimum of 8 years of work experience.
- 5+ years of experience in server administration, with at least 3 years focused on NVIDIA GPU-based systems.

Technical Skills:
- Deep understanding of Linux system administration, especially in HPC or AI environments.
- Hands-on experience with NVIDIA GPU drivers, CUDA toolkit, and performance tuning.
- Familiarity with Slurm, Kubernetes, or other job scheduling and orchestration tools.
- Experience with monitoring tools (e.g., Prometheus, Grafana) and infrastructure automation (e.g., Ansible, Terraform).
- Strong scripting skills (Bash, Python, etc.).
- Excellent problem-solving and communication skills.

Desired:
- NVIDIA Certified Professional or similar credentials.
- Experience with multi-GPU and multi-node training setups.
- Familiarity with AI/ML frameworks (e.g., PyTorch, TensorFlow) and their GPU dependencies.
- Exposure to cloud-based GPU infrastructure (AWS, Azure, GCP).