Rivago Infotech Inc

AI Platform with LLM Infrastructure

This role is for a Senior AI Platform / LLM Infrastructure Engineer in Charlotte, NC (Hybrid) for a long-term project. Key skills include LLM inference frameworks, model optimization, Kubernetes, GPU orchestration, and Python programming.
🌎 - Country
United States
💱 - Currency
$ USD
💰 - Day rate
Unknown
🗓️ - Date
June 9, 2026
🕒 - Duration
Unknown
🏝️ - Location
Hybrid
📄 - Contract
Unknown
🔒 - Security
Unknown
📍 - Location detailed
Charlotte, NC
🧠 - Skills detailed
#ETL (Extract, Transform, Load) #Python #Kubernetes #Observability #Batch #Deployment #Scala #AI (Artificial Intelligence) #Programming #Grafana #Prometheus #ML (Machine Learning) #Model Optimization #Monitoring
Role description
Role: Senior AI Platform / LLM Infrastructure Engineer
Location: Charlotte, NC (Hybrid)
Duration: Long-term project

We are hiring a Senior AI Platform Engineer to build and optimize on-prem LLM inference platforms. The role focuses on high-performance model serving, GPU workloads, and scalable ML infrastructure built on modern inference frameworks and Kubernetes.

Must-Have Skills
• LLM Inference Frameworks: vLLM, TensorRT-LLM, Triton Inference Server, SGLang
• Model Optimization: Continuous Batching, Speculative Decoding, KV Cache / Prefix Caching, Quantization (FP8 / AWQ / GPTQ)
• Distributed / Parallel Systems: Tensor Parallelism
• Platform & Orchestration: Kubernetes, KServe, OpenShift AI, Helm / Operators
• GPU & Performance: CUDA, NCCL, MIG, GPU Orchestration (Run:AI)
• Monitoring: Prometheus, Grafana, ML Observability
• Programming: Python
• GenAI Tools: Arize AI, Claude (CoWork)
• Load / Performance Testing: GuideLLM, Locust

Key Responsibilities
• Build and manage LLM inference platforms on on-prem GPU infrastructure
• Optimize model performance using advanced inference techniques (batching, caching, quantization)
• Deploy and operate ML workloads on Kubernetes (KServe / OpenShift AI)
• Enable GPU scheduling and orchestration for large-scale workloads
• Implement monitoring and performance benchmarking frameworks
• Drive SRE practices (observability, incident handling) for platform reliability and scalability
• Collaborate with AI/ML teams to enable production-grade GenAI deployments