Ray Inference Engineer

⭐ - Featured Role | Apply directly with Data Freelance Hub
This role is for a Ray Inference Engineer with a contract length of "unknown", offering a pay rate of "unknown". Key skills include distributed systems, LLM management, ML profiling, and experience with Docker and Kubernetes. A B.S., M.S., or Ph.D. in Computer Science is required.
🌎 - Country
United States
💱 - Currency
$ USD
-
💰 - Day rate
-
πŸ—“οΈ - Date discovered
August 29, 2025
🕒 - Project duration
Unknown
-
🏝️ - Location type
Unknown
-
📄 - Contract type
Unknown
-
🔒 - Security clearance
Unknown
-
πŸ“ - Location detailed
United States
-
🧠 - Skills detailed
#ML (Machine Learning) #Automation #Computer Science #Scala #Docker #Batch #Kubernetes #Java #Monitoring #PyTorch #TensorFlow #Deployment #Python #Programming
Role description
Position Summary: β€’ Designing, implementing, and maintaining distributed systems to build world-class ML platforms/products at scale β€’ Experiment with, deploy, and manage LLMs in a production context β€’ Benchmark and optimize inference deployments for different workloads, e.g. online vs. batch vs. streaming workloads β€’ Diagnose, fix, improve, and automate complex issues across the entire stack to ensure maximum uptime and performance β€’ Design and extend services to improve functionality and reliability of the platform β€’ Monitor system performance, optimize for cost and efficiency, and resolve any issues that arise β€’ Build relationships with stakeholders across the organization to better understand internal customer needs and enhance our product better for end users Minimum Qualifications: β€’ 5+ years of experience in distributed systems with deep knowledge in computer science fundamentals β€’ Experience managing deployments of LLMs at scale β€’ Experience with inference runtimes/engines, e.g. ONNXRT, TensorRT, vLLM, sglang β€’ Experience with ML Training/Inference profiling and optimization for different workloads and tasks, e.g. online inference, batch inference, streaming inference β€’ Experience with profiling ML models for different end use cases, e.g. RAG vs. code completion, etc. β€’ Experience with containerization and orchestration technologies, such as Docker and Kubernetes. β€’ Experience in delivering data and machine learning infrastructure in production environments β€’ Experience configuring, deploying, and troubleshooting large scale production environments β€’ Experience in designing, building, and maintaining scalable, highly available systems that prioritize ease of use β€’ Experience with alerting, monitoring and remediation automation in a large-scale distributed environment β€’ Extensive programming experience in Java, Python or Go β€’ Strong collaboration and communication (verbal and written) skills β€’ B.S., M.S., or Ph.D. 
in Computer Science, Computer Engineering, or equivalent practical experience Preferred Qualifications: β€’ Understanding of the ML lifecycle and state of the art ML Infrastructure technologies β€’ Familiarity with CUDA + kernel implementation β€’ Experience with inference optimization and fine-tuning techniques (e.g. pruning, distilling, quantization) β€’ Experience with deploying + optimizing ML models on heterogenous hardware, e.g. GPUs, TPUs, Inferentia, etc. β€’ Experience with GPU and other type of HPC infrastructure β€’ Experience with training framework like PyTorch, Tensorflow, JAX β€’ Deep understanding of Ray and KubeRay
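The "benchmark and optimize inference deployments for different workloads" responsibility can be sketched as a toy harness comparing the online (one call per request) and batch serving patterns. This is a minimal illustration, not part of the posting: `fake_model`, the 1 ms per-call overhead, and the batch size are all hypothetical stand-ins for a real inference engine and its measured costs.

```python
import time

def fake_model(batch):
    # Hypothetical stand-in for an inference call. Assumption: a fixed
    # per-call overhead dominates, so batching amortizes it.
    time.sleep(0.001)               # simulated per-call overhead
    return [x * 2 for x in batch]   # trivial "prediction"

def bench_online(requests):
    """One model call per request (online serving pattern)."""
    start = time.perf_counter()
    for r in requests:
        fake_model([r])
    elapsed = time.perf_counter() - start
    return len(requests) / elapsed  # requests per second

def bench_batch(requests, batch_size=32):
    """Group requests into fixed-size batches (batch serving pattern)."""
    start = time.perf_counter()
    for i in range(0, len(requests), batch_size):
        fake_model(requests[i:i + batch_size])
    elapsed = time.perf_counter() - start
    return len(requests) / elapsed  # requests per second

if __name__ == "__main__":
    reqs = list(range(256))
    print(f"online: {bench_online(reqs):.0f} req/s")
    print(f"batch:  {bench_batch(reqs):.0f} req/s")
```

Under these assumptions the batched run makes far fewer model calls, so its measured throughput is much higher; a real benchmark would also track latency percentiles, since batching trades per-request latency for throughput.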