Ray Inference Engineer

⭐ - Featured Role | Apply directly with Data Freelance Hub
This role is for a Ray Inference Engineer with a contract length of "unknown", offering a pay rate of "unknown". Key skills include distributed systems, LLM management, ML profiling, and experience with Docker and Kubernetes. A B.S., M.S., or Ph.D. in Computer Science is required.
🌎 - Country
United States
💱 - Currency
$ USD
-
💰 - Day rate
-
πŸ—“οΈ - Date discovered
August 29, 2025
🕒 - Project duration
Unknown
-
🏝️ - Location type
Unknown
-
📄 - Contract type
Unknown
-
🔒 - Security clearance
Unknown
-
πŸ“ - Location detailed
United States
-
🧠 - Skills detailed
#ML (Machine Learning) #Automation #Computer Science #Scala #Docker #Batch #Kubernetes #Java #Monitoring #PyTorch #TensorFlow #Deployment #Python #Programming
Role description
Position Summary: β€’ Designing, implementing, and maintaining distributed systems to build world-class ML platforms/products at scale β€’ Experiment with, deploy, and manage LLMs in a production context β€’ Benchmark and optimize inference deployments for different workloads, e.g. online vs. batch vs. streaming workloads β€’ Diagnose, fix, improve, and automate complex issues across the entire stack to ensure maximum uptime and performance β€’ Design and extend services to improve functionality and reliability of the platform β€’ Monitor system performance, optimize for cost and efficiency, and resolve any issues that arise β€’ Build relationships with stakeholders across the organization to better understand internal customer needs and enhance our product better for end users Minimum Qualifications: β€’ 5+ years of experience in distributed systems with deep knowledge in computer science fundamentals β€’ Experience managing deployments of LLMs at scale β€’ Experience with inference runtimes/engines, e.g. ONNXRT, TensorRT, vLLM, sglang β€’ Experience with ML Training/Inference profiling and optimization for different workloads and tasks, e.g. online inference, batch inference, streaming inference β€’ Experience with profiling ML models for different end use cases, e.g. RAG vs. code completion, etc. β€’ Experience with containerization and orchestration technologies, such as Docker and Kubernetes. β€’ Experience in delivering data and machine learning infrastructure in production environments β€’ Experience configuring, deploying, and troubleshooting large scale production environments β€’ Experience in designing, building, and maintaining scalable, highly available systems that prioritize ease of use β€’ Experience with alerting, monitoring and remediation automation in a large-scale distributed environment β€’ Extensive programming experience in Java, Python or Go β€’ Strong collaboration and communication (verbal and written) skills β€’ B.S., M.S., or Ph.D. 
in Computer Science, Computer Engineering, or equivalent practical experience Preferred Qualifications: β€’ Understanding of the ML lifecycle and state of the art ML Infrastructure technologies β€’ Familiarity with CUDA + kernel implementation β€’ Experience with inference optimization and fine-tuning techniques (e.g. pruning, distilling, quantization) β€’ Experience with deploying + optimizing ML models on heterogenous hardware, e.g. GPUs, TPUs, Inferentia, etc. β€’ Experience with GPU and other type of HPC infrastructure β€’ Experience with training framework like PyTorch, Tensorflow, JAX β€’ Deep understanding of Ray and KubeRay
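The "benchmark and optimize inference deployments for different workloads" responsibility can be sketched as a toy harness comparing the online (one call per request) and batch serving patterns. This is a minimal illustration, not part of the posting: `fake_model`, the 1 ms per-call overhead, and the batch size are all hypothetical stand-ins for a real inference engine and its measured costs.

```python
import time

def fake_model(batch):
    # Hypothetical stand-in for an inference call. Assumption: a fixed
    # per-call overhead dominates, so batching amortizes it.
    time.sleep(0.001)               # simulated per-call overhead
    return [x * 2 for x in batch]   # trivial "prediction"

def bench_online(requests):
    """One model call per request (online serving pattern)."""
    start = time.perf_counter()
    for r in requests:
        fake_model([r])
    elapsed = time.perf_counter() - start
    return len(requests) / elapsed  # requests per second

def bench_batch(requests, batch_size=32):
    """Group requests into fixed-size batches (batch serving pattern)."""
    start = time.perf_counter()
    for i in range(0, len(requests), batch_size):
        fake_model(requests[i:i + batch_size])
    elapsed = time.perf_counter() - start
    return len(requests) / elapsed  # requests per second

if __name__ == "__main__":
    reqs = list(range(256))
    print(f"online: {bench_online(reqs):.0f} req/s")
    print(f"batch:  {bench_batch(reqs):.0f} req/s")
```

Under these assumptions the batched run makes far fewer model calls, so its measured throughput is much higher; a real benchmark would also track latency percentiles, since batching trades per-request latency for throughput.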