New York Global Consultants Inc. (NYGCI)

Senior LLMOps Engineer

⭐ - Featured Role | Apply directly with Data Freelance Hub
This role is for a Senior LLMOps Engineer with an unspecified contract length and pay rate, located in Charlotte, NC or Jersey City, NJ (hybrid). Key skills include Kubernetes, LLM deployment, and MLOps pipeline management.
🌎 - Country
United States
-
💱 - Currency
$ USD
-
💰 - Day rate
Unknown
-
🗓️ - Date
February 10, 2026
-
🕒 - Duration
Unknown
-
🏝️ - Location
Hybrid
-
📄 - Contract
Unknown
-
🔒 - Security
Unknown
-
📍 - Location detailed
Jersey City, NJ
-
🧠 - Skills detailed
#Deployment #AI (Artificial Intelligence) #Monitoring #ETL (Extract, Transform, Load) #Kubernetes #Load Balancing #Model Optimization #API (Application Programming Interface) #ML (Machine Learning) #Scala #Microservices
Role description
Position: Senior Consultant (AI/ML Platform)
Location: Charlotte, NC or Jersey City, NJ (Hybrid, 3 days a week onsite)
Project Tasks: AI Operations Platform Consultant
• Experience deploying, managing, operating, and troubleshooting containerized services at scale on Kubernetes (OpenShift) for mission-critical applications
• Experience deploying, configuring, and tuning LLMs using TensorRT-LLM and Triton Inference Server
• Managing and operating MLOps/LLMOps pipelines that use TensorRT-LLM and Triton Inference Server to deploy inference services in production
• Setting up and operating AI inference service monitoring for performance and availability (see the monitoring sketch after this list)
• Experience deploying and troubleshooting LLM models on a containerized platform, including monitoring and load balancing
• Experience with standard processes for operating a mission-critical system: incident management, change management, event management, etc.
• Managing scalable infrastructure for deploying and managing LLMs
• Deploying models in production environments, including containerization, microservices, and API design
• Triton Inference Server, including its architecture, configuration, and deployment
• Model optimization using Triton with TensorRT-LLM (TRT-LLM)
• Model optimization techniques, including pruning, quantization, and knowledge distillation (see the quantization sketch after this list)
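For illustration, here is a minimal sketch of the availability-monitoring task listed above, assuming a Triton Inference Server reachable on its default HTTP port (8000) and Prometheus metrics port (8002); the model name llama-7b is a hypothetical placeholder.

```python
import requests

TRITON_HTTP = "http://localhost:8000"            # assumed default HTTP endpoint
TRITON_METRICS = "http://localhost:8002/metrics" # assumed default metrics endpoint
MODEL = "llama-7b"                               # hypothetical model name

def check_availability() -> bool:
    """Poll Triton's standard health endpoints (KServe v2 protocol)."""
    live = requests.get(f"{TRITON_HTTP}/v2/health/live", timeout=5)
    ready = requests.get(f"{TRITON_HTTP}/v2/health/ready", timeout=5)
    model_ready = requests.get(f"{TRITON_HTTP}/v2/models/{MODEL}/ready", timeout=5)
    return all(r.status_code == 200 for r in (live, ready, model_ready))

def scrape_inference_count() -> int:
    """Read the Prometheus-format metrics Triton exposes and pull one counter."""
    text = requests.get(TRITON_METRICS, timeout=5).text
    for line in text.splitlines():
        # Skip "# HELP"/"# TYPE" comment lines; match the success counter.
        if line.startswith("nv_inference_request_success"):
            return int(float(line.rsplit(" ", 1)[-1]))
    return 0

if __name__ == "__main__":
    print("available:", check_availability())
    print("successful requests so far:", scrape_inference_count())
```

The endpoints polled here (/v2/health/live, /v2/health/ready, /v2/models/&lt;name&gt;/ready) are part of the KServe v2 inference protocol that Triton implements, so the same probes can back Kubernetes liveness and readiness checks.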
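Likewise, a minimal sketch of one of the model-optimization techniques named above: post-training dynamic quantization via PyTorch's built-in API, applied to a toy model (layer sizes are arbitrary). TensorRT-LLM has its own quantization paths; this only illustrates the general technique.

```python
import torch
import torch.nn as nn

# Toy stand-in for a model to be optimized; sizes are arbitrary.
model = nn.Sequential(
    nn.Linear(512, 1024),
    nn.ReLU(),
    nn.Linear(1024, 512),
)
model.eval()

# Post-training dynamic quantization: weights of the listed module types
# are converted to int8; activations are quantized on the fly at inference.
quantized = torch.quantization.quantize_dynamic(
    model, {nn.Linear}, dtype=torch.qint8
)

x = torch.randn(1, 512)
with torch.no_grad():
    print("fp32 output shape:", model(x).shape)
    print("int8 output shape:", quantized(x).shape)
```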