

Rivago Infotech Inc
AI Platform with LLM Infrastructure
This role is for a Senior AI Platform / LLM Infrastructure Engineer in Charlotte, NC (Hybrid) for a long-term project. Key skills include LLM inference frameworks, model optimization, Kubernetes, GPU orchestration, and Python programming.
Country: United States
Currency: $ USD
Day rate: Unknown
Date: June 9, 2026
Duration: Unknown
Location: Hybrid
Contract: Unknown
Security: Unknown
Location detailed: Charlotte, NC
Skills detailed: ETL (Extract, Transform, Load), Python, Kubernetes, Observability, Batch, Deployment, Scala, AI (Artificial Intelligence), Programming, Grafana, Prometheus, ML (Machine Learning), Model Optimization, Monitoring
Role description
Role: Senior AI Platform / LLM Infrastructure Engineer
Location: Charlotte, NC (Hybrid)
Duration: Long Term Project
We are hiring a Senior AI Platform Engineer to build and optimize on-prem LLM inference platforms. The role focuses on high-performance model serving, GPU workloads, and scalable ML infrastructure using modern inference frameworks and Kubernetes.
Must-Have Skills
• LLM Inference Frameworks: vLLM, TensorRT-LLM, Triton Inference Server, SGLang
• Model Optimization: Continuous Batching, Speculative Decoding, KV Cache / Prefix Caching, FP8 / AWQ / GPTQ
• Distributed/Parallel Systems: Tensor Parallelism
• Platform & Orchestration: Kubernetes, KServe, OpenShift AI, Helm / Operators
• GPU & Performance: CUDA, NCCL, MIG, GPU Orchestration (Run:AI)
• Monitoring: Prometheus, Grafana, ML Observability
• Programming: Python
• GenAI Tools: Arize AI, Claude (CoWork)
• Load / performance testing: GuideLLM, Locust
Key Responsibilities
• Build and manage LLM inference platforms on on-prem GPU infrastructure
• Optimize model performance using advanced inference techniques (batching, caching, quantization)
• Deploy and operate ML workloads on Kubernetes (KServe/OpenShift AI)
• Enable GPU scheduling and orchestration for large-scale workloads
• Implement monitoring and performance benchmarking frameworks
• Drive SRE practices for platform reliability and scalability (observability, incident handling)
• Collaborate with AI/ML teams to enable production-grade GenAI deployments






