

DSM-H Consulting
Cloud Engineer (NVIDIA GPU-based Systems)
Featured Role | Apply direct with Data Freelance Hub
This role is for a Cloud Engineer (NVIDIA GPU-based Systems) with a contract length of "X months" and a pay rate of "$X/hour". It is based in Dallas, TX; Peoria, IL; Phoenix, AZ; or Cary, NC, and requires 8+ years of experience, including 3+ years with NVIDIA GPU systems.
Country: United States
Currency: $ USD
Day rate: Unknown
Date: November 11, 2025
Duration: Unknown
Location: Hybrid
Contract: Unknown
Security: Unknown
Location detailed: Chicago, IL
Skills detailed:
#Security #Linux #ML (Machine Learning) #Kubernetes #Prometheus #Grafana #Bash #Monitoring #Automation #Containers #AWS (Amazon Web Services) #Cloud #Ansible #Python #PyTorch #GCP (Google Cloud Platform) #Terraform #Scripting #Documentation #Azure #Server Administration #DevOps #AI (Artificial Intelligence) #Compliance #Docker #TensorFlow
Role description
Locations: Dallas, TX; Peoria, IL; Phoenix, AZ; Cary, NC
Typical task breakdown:
- Administer and maintain GPU-accelerated servers and clusters, including NVIDIA A100, H100, and other high-end GPU sets.
- Manage and optimize NVIDIA software stack components such as CUDA, cuDNN, TensorRT, NCCL, and NGC containers.
- Monitor system performance, troubleshoot hardware/software issues, and ensure high availability of AI infrastructure.
- Collaborate with DevOps and AI teams to support containerized workflows (Docker, Kubernetes) and distributed training environments.
- Implement security best practices and ensure compliance with internal and external standards.
- Lead upgrades, patching, and lifecycle management of GPU servers and related infrastructure.
- Provide documentation, automation scripts, and training for internal teams.
Work environment:
Candidates must be able to work in the office 1 day a week initially and, when notified, move to 5 days a week in the office.
Education & Experience Required:
- Bachelor's degree with a minimum of 8 years of work experience, including 5+ years in server administration and at least 3 years focused on NVIDIA GPU-based systems.
Technical Skills:
- 5+ years of experience in server administration, with at least 3 years focused on NVIDIA GPU-based systems.
- Deep understanding of Linux system administration, especially in HPC or AI environments.
- Hands-on experience with NVIDIA GPU drivers, CUDA toolkit, and performance tuning.
- Familiarity with Slurm, Kubernetes, or other job scheduling and orchestration tools.
- Experience with monitoring tools (e.g., Prometheus, Grafana) and infrastructure automation (e.g., Ansible, Terraform).
- Strong scripting skills (Bash, Python, etc.).
- Excellent problem-solving and communication skills.
(Desired)
- NVIDIA Certified Professional or similar credentials.
- Experience with multi-GPU and multi-node training setups.
- Familiarity with AI/ML frameworks (e.g., PyTorch, TensorFlow) and their GPU dependencies.
- Exposure to cloud-based GPU infrastructure (AWS, Azure, GCP).






