EPITEC

Cloud Engineer

This role is for a Cloud Engineer; the contract length and pay rate are not specified. The position is hybrid and requires expertise in NVIDIA GPU systems and Linux administration, plus experience with AI/ML frameworks and orchestration tools.
🌎 - Country
United States
💱 - Currency
$ USD
💰 - Day rate
Unknown
πŸ—“οΈ - Date
November 20, 2025
🕒 - Duration
Unknown
🏝️ - Location
Hybrid
📄 - Contract
Unknown
🔒 - Security
Unknown
πŸ“ - Location detailed
Chicago, IL
🧠 - Skills detailed
#AI (Artificial Intelligence) #Docker #Grafana #GCP (Google Cloud Platform) #Documentation #Bash #Scala #Containers #Python #Kubernetes #Azure #Terraform #Ansible #Data Science #Compliance #Monitoring #Security #Automation #Linux #DevOps #PyTorch #TensorFlow #Scripting #ML (Machine Learning) #Cloud #AWS (Amazon Web Services) #Prometheus #Server Administration
Role description
Senior Server Administrator – AI Engineering Team

Onsite Requirement: 1 day/week initially, transitioning to 5 days/week

About the Role
We are seeking a highly skilled Senior Server Administrator to join our AI Engineering team. This role is pivotal in deploying, maintaining, and optimizing high-performance computing infrastructure leveraging NVIDIA GPU technologies. You will collaborate with AI researchers, data scientists, and software engineers to ensure our systems are robust, scalable, and tuned for cutting-edge machine learning workloads.

What You’ll Do
• Administer and maintain GPU-accelerated servers and clusters (NVIDIA A100, H100, etc.).
• Manage and optimize NVIDIA software stack (CUDA, cuDNN, TensorRT, NCCL, NGC containers).
• Monitor system performance, troubleshoot issues, and ensure high availability.
• Collaborate with DevOps and AI teams to support containerized workflows (Docker, Kubernetes).
• Implement security best practices and compliance standards.
• Lead upgrades, patching, and lifecycle management of GPU servers.
• Provide documentation, automation scripts, and training for internal teams.

Why This Opportunity Stands Out
• Work on cutting-edge AI infrastructure projects.
• Collaborate with top-tier engineers and researchers.
• Gain hands-on experience with advanced GPU technologies and scalable systems.

What We’re Looking For
• Education: Bachelor’s Degree + 8 years of experience (5+ years in server administration, 3+ years with NVIDIA GPU systems).

Technical Skills:
• Linux system administration in HPC/AI environments.
• NVIDIA GPU drivers, CUDA toolkit, performance tuning.
• Slurm, Kubernetes, orchestration tools.
• Monitoring tools (Prometheus, Grafana), automation (Ansible, Terraform).
• Strong scripting (Bash, Python).

Preferred:
• NVIDIA Certified Professional.
• Multi-GPU/multi-node training experience.
• Familiarity with AI/ML frameworks (PyTorch, TensorFlow).
• Exposure to cloud GPU infrastructure (AWS, Azure, GCP).