

Ray Inference Engineer
Featured Role | Apply directly with Data Freelance Hub
This role is for a Ray Inference Engineer with a contract length of "unknown," offering a pay rate of "$XX/hour." Key skills include distributed systems, LLM management, and proficiency in Java, Python, or Go. A B.S., M.S., or Ph.D. in a relevant field is required.
Country
United States
Currency
$ USD
Day rate
-
Date discovered
August 2, 2025
Project duration
Unknown
Location type
Unknown
Contract type
Unknown
Security clearance
Unknown
Location detailed
United States
Skills detailed
#Kubernetes #PyTorch #TensorFlow #Computer Science #Monitoring #Programming #Python #Automation #Docker #ML (Machine Learning) #Batch #Java #Scala #Deployment
Role description
Position Summary:
• Design, implement, and maintain distributed systems that power world-class ML platforms and products at scale
• Experiment with, deploy, and manage LLMs in a production context (a minimal serving sketch follows this list)
• Benchmark and optimize inference deployments for different workloads, e.g. online vs. batch vs. streaming
• Diagnose, fix, improve, and automate complex issues across the entire stack to ensure maximum uptime and performance
• Design and extend services to improve the functionality and reliability of the platform
• Monitor system performance, optimize for cost and efficiency, and resolve any issues that arise
• Build relationships with stakeholders across the organization to better understand internal customer needs and improve the product for end users
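For context on what deploying and managing LLMs in production typically looks like on Ray, here is a minimal sketch using Ray Serve with a vLLM backend. The model name, replica count, and GPU allocation are illustrative assumptions, not details from this posting.

```python
# Minimal Ray Serve deployment sketch for LLM inference (illustrative only).
from ray import serve
from starlette.requests import Request


@serve.deployment(num_replicas=2, ray_actor_options={"num_gpus": 1})
class LLMServer:
    def __init__(self):
        # vLLM is one of the inference engines named in this posting;
        # the model choice here is a hypothetical placeholder.
        from vllm import LLM, SamplingParams

        self.llm = LLM(model="facebook/opt-125m")
        self.params = SamplingParams(max_tokens=128, temperature=0.7)

    async def __call__(self, request: Request) -> dict:
        prompt = (await request.json())["prompt"]
        # Synchronous generate call; a production deployment would use
        # vLLM's async engine to avoid blocking the event loop.
        output = self.llm.generate([prompt], self.params)[0]
        return {"text": output.outputs[0].text}


app = LLMServer.bind()
# serve.run(app, route_prefix="/generate")  # exposes POST /generate
```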
Minimum Qualifications:
β’ 5+ years of experience in distributed systems with deep knowledge in computer science fundamentals
β’ Experience managing deployments of LLMs at scale
• Experience with inference runtimes/engines, e.g. ONNX Runtime, TensorRT, vLLM, SGLang
• Experience with ML training/inference profiling and optimization for different workloads and tasks, e.g. online, batch, and streaming inference (a simple benchmarking sketch follows this list)
• Experience profiling ML models for different end use cases, e.g. RAG vs. code completion
• Experience with containerization and orchestration technologies such as Docker and Kubernetes
β’ Experience in delivering data and machine learning infrastructure in production environments
β’ Experience configuring, deploying, and troubleshooting large scale production environments
β’ Experience in designing, building, and maintaining scalable, highly available systems that prioritize ease of use
β’ Experience with alerting, monitoring and remediation automation in a large-scale distributed environment
• Extensive programming experience in Java, Python, or Go
β’ Strong collaboration and communication (verbal and written) skills
β’ B.S., M.S., or Ph.D. in Computer Science, Computer Engineering, or equivalent practical experience
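As a concrete example of the benchmarking work referenced above, the following is a simplified latency/throughput harness for an online inference endpoint. The URL and payload match the hypothetical Ray Serve sketch earlier; a real benchmark would also sweep concurrency and input/output lengths, and include warm-up requests.

```python
# Simplified online-inference benchmark (illustrative; single client, no warm-up).
import statistics
import time

import requests

URL = "http://localhost:8000/generate"  # assumed endpoint from the sketch above
PROMPTS = ["Summarize Ray Serve in one sentence."] * 50  # toy workload

latencies = []
start = time.perf_counter()
for prompt in PROMPTS:
    t0 = time.perf_counter()
    requests.post(URL, json={"prompt": prompt}, timeout=60)
    latencies.append(time.perf_counter() - t0)
elapsed = time.perf_counter() - start

print(f"p50 latency: {statistics.median(latencies) * 1e3:.1f} ms")
print(f"p99 latency: {statistics.quantiles(latencies, n=100)[98] * 1e3:.1f} ms")
print(f"throughput:  {len(PROMPTS) / elapsed:.2f} req/s")
```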
Preferred Qualifications:
• Understanding of the ML lifecycle and state-of-the-art ML infrastructure technologies
• Familiarity with CUDA and kernel implementation
• Experience with inference optimization and fine-tuning techniques, e.g. pruning, distillation, quantization (a quantization sketch follows this list)
• Experience deploying and optimizing ML models on heterogeneous hardware, e.g. GPUs, TPUs, Inferentia
• Experience with GPUs and other types of HPC infrastructure
• Experience with training frameworks such as PyTorch, TensorFlow, and JAX
• Deep understanding of Ray and KubeRay
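To illustrate one of the optimization techniques above, here is a minimal post-training dynamic quantization sketch in PyTorch. The toy model and layer choices are assumptions for illustration; production stacks more often apply quantization through the serving engine itself.

```python
# Minimal post-training dynamic quantization sketch (PyTorch; toy model).
import torch
import torch.nn as nn

# Hypothetical float32 model standing in for a real network.
model = nn.Sequential(nn.Linear(512, 512), nn.ReLU(), nn.Linear(512, 10))
model.eval()

# Convert Linear weights to int8; activations are quantized on the fly.
quantized = torch.ao.quantization.quantize_dynamic(
    model, {nn.Linear}, dtype=torch.qint8
)

x = torch.randn(1, 512)
with torch.no_grad():
    print(quantized(x).shape)  # same interface, smaller and faster Linear ops
```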