

Raas Infotek
GPU Inference Engineer
⭐ - Featured Role | Apply directly with Data Freelance Hub
This is a contract role for a GPU Inference Engineer, remote within the US. It requires deep cloud services experience, experience hosting LLMs for inference, strong communication skills, and familiarity with benchmarking tools. Preferred skills include experience with NVIDIA/AMD GPUs and distributed inference optimization techniques.
🌎 - Country
United States
💱 - Currency
$ USD
-
💰 - Day rate
Unknown
-
🗓️ - Date
October 28, 2025
🕒 - Duration
Unknown
-
🏝️ - Location
Remote
-
📄 - Contract
Unknown
-
🔒 - Security
Unknown
-
📍 - Location detailed
United States
-
🧠 - Skills detailed
#Deployment #Docker #Cloud #Kubernetes #Observability #AI (Artificial Intelligence) #Storage #Logging #Documentation #Infrastructure as Code (IaC) #Monitoring #Data Storage
Role description
Hi,
Hope you are doing well.
I have an immediate requirement with one of my clients. If you are interested, please send me your updated resume and the best time to call you.
Role - GPU Inference Engineer
Location – US Remote
Mode: Contract
Dedicated Inference Service
We are looking for developers with general cloud / distributed services experience; LLM experience is a secondary skill.
Required Skills
• Deep experience building services on distributed systems in modern cloud environments, e.g., containerization (Kubernetes, Docker), infrastructure as code, CI/CD pipelines, APIs, authentication and authorization, data storage, deployment, logging, monitoring, and alerting
• Experience working with Large Language Models (LLMs), particularly hosting them to run inference (see the sketch after this list)
• Strong verbal and written communication skills. Your job will involve communicating with local and remote colleagues about technical subjects and writing detailed documentation.
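As a concrete reference point, hosting a model for inference can start from something as small as the sketch below. It uses vLLM (one of the engines named under Preferred Skills); the model id and sampling settings are placeholders, not taken from this posting.

```python
# Minimal offline LLM inference with vLLM. Model id and sampling
# parameters are illustrative placeholders.
from vllm import LLM, SamplingParams

llm = LLM(model="facebook/opt-125m")          # any HF-compatible model id
params = SamplingParams(temperature=0.8, max_tokens=64)

outputs = llm.generate(["What does a GPU inference engineer do?"], params)
for out in outputs:
    print(out.outputs[0].text)                # first completion per prompt
```

A production service would wrap this behind an authenticated API with the logging, monitoring, and alerting described above.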
Preferred Skills
• Experience building or using benchmarking tools for evaluating LLM inference across various model, engine, and GPU combinations
• Familiarity with common LLM performance metrics such as prefill throughput, decode throughput, TPOT (time per output token), and TTFT (time to first token); see the metrics sketch after this list
• Experience with one or more inference engines, e.g., vLLM, SGLang, or Modular Max
• Familiarity with one or more distributed inference serving frameworks, e.g., llm-d, NVIDIA Dynamo, or Ray Serve
• Experience with AMD and NVIDIA GPUs, using software like CUDA, ROCm, AITER, NCCL, RCCL, etc.
• Knowledge of distributed inference optimization techniques: tensor/data parallelism, KV cache optimizations, smart routing, etc.
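For orientation, the latency metrics above have standard definitions that are easy to compute from per-token timestamps. The helper below is a hypothetical sketch, not part of any tool named in this posting:

```python
# Hypothetical helper computing TTFT, TPOT, and decode throughput from
# wall-clock timestamps of emitted tokens. Follows the usual definitions:
# TTFT = time to first token; TPOT = mean time per output token after
# the first; decode throughput = decode tokens per second.
from typing import List

def latency_metrics(request_start: float, token_times: List[float]) -> dict:
    ttft = token_times[0] - request_start
    decode_window = token_times[-1] - token_times[0]
    n_decode = len(token_times) - 1                 # tokens after the first
    tpot = decode_window / n_decode if n_decode else 0.0
    decode_tput = n_decode / decode_window if decode_window else 0.0
    return {"ttft_s": ttft, "tpot_s": tpot, "decode_tok_per_s": decode_tput}

# Example: first token at 120 ms, then one token every 30 ms.
print(latency_metrics(0.0, [0.12, 0.15, 0.18, 0.21]))
# {'ttft_s': 0.12, 'tpot_s': 0.03, 'decode_tok_per_s': 33.3...}
```

Prefill throughput is computed analogously, as prompt tokens processed per second before the first output token.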
What You'll Be Working On:
• Develop and maintain an inference platform for serving large language models, optimized for the various GPU platforms on which they run.
• Work on complex AI and cloud engineering projects through the entire product development lifecycle (PDLC): ideation, product definition, experimentation, prototyping, development, testing, release, and operations.
• Build tooling and observability to monitor system health, and build auto-tuning capabilities.
• Build benchmarking frameworks to test model-serving performance and guide system and infrastructure tuning efforts (see the sketch after this list).
• Build native cross-platform inference support across NVIDIA and AMD GPUs for a variety of model architectures.
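To make the benchmarking responsibility concrete, a minimal load-sweep harness might look like the sketch below. The generate() stub stands in for a real client call to the serving endpoint; everything here is illustrative rather than the team's actual framework:

```python
# Sketch of a concurrency sweep for model-serving benchmarks: run the
# same workload at increasing parallelism and report aggregate tokens/s.
import time
from concurrent.futures import ThreadPoolExecutor

def generate(prompt: str) -> int:
    """Placeholder for a real inference client call; returns token count."""
    time.sleep(0.05)                 # simulate network + decode latency
    return 64

def sweep(prompts, concurrency_levels=(1, 4, 16)):
    for c in concurrency_levels:
        start = time.perf_counter()
        with ThreadPoolExecutor(max_workers=c) as pool:
            tokens = sum(pool.map(generate, prompts))
        elapsed = time.perf_counter() - start
        print(f"concurrency={c:>3}  {tokens / elapsed:,.0f} tok/s")

sweep(["hello"] * 64)
```

Results from sweeps like this would feed the system and infrastructure tuning work described above.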
Thanks,
Ravi Kumar
Raas Infotek
Newark, DE 19702
Direct No: 302-286-9894
Email: Ravi.kumar@raasinfotek.com