

PeopleCaddie
LLM Engineer
Featured Role | Apply directly with Data Freelance Hub
This role is for an LLM Engineer (Contract) in San Jose, CA, hybrid (2x/week on-site). Contract length is 1 month with a pay rate of up to $90/hour. Requires 6+ years in machine learning, expertise in LLMs, and proficiency in Python and deep learning frameworks.
Country
United States
Currency
$ USD
Day rate
720
Date
November 14, 2025
Duration
1 to 3 months
Location
Hybrid
Contract
W2 Contractor
Security
Unknown
Location detailed
San Francisco Bay Area
Skills detailed
#Microservices #ETL (Extract, Transform, Load) #ModelEvaluation #MLOps (Machine Learning Operations) #Cloud #ComputerScience #AI (Artificial Intelligence) #HuggingFace #Python #Deployment #DataEngineering #ResearchSkills #PyTorch #ML (Machine Learning) #DeepLearning #Databases #TensorFlow #Scala
Role description
Job Title: LLM Engineer (Contract)
Company: Big Four Client
Location (hybrid): San Jose, CA (Bay Area) - 2x/wk at client site
Work Authorization: U.S. Citizen or Green Card Holder
Pay Rate: Up to $90 per hour (W2), depending on experience
Duration: 1 Month (with possible extension)
Overview:
We are seeking a highly experienced LLM Engineer Contractor to support short-term, high-impact work focused on building, optimizing, and deploying large language models at one of our Big Four clients. The ideal candidate has deep technical expertise across modern LLM architectures, inference systems, and applied machine-learning workflows. This role is fast-paced and hands-on, requiring strong research skills, engineering excellence, and the ability to collaborate across product, data, and ML operations teams. The contractor will contribute directly to model development, performance optimization, and the deployment of LLM-backed features into production environments.
Key Responsibilities:
• Model Development & Optimization:
• Design, train, fine-tune, and evaluate LLMs for performance, efficiency, safety, and reliability.
• Optimize models through techniques such as transfer learning, RLHF, low-rank adaptation (LoRA), quantization-aware training, and distillation.
• Conduct rigorous benchmarking and ensure alignment with product or research objectives.
• Systems Integration & Deployment:
• Build scalable inference pipelines that support high-volume, low-latency LLM serving.
• Implement infrastructure optimizations including quantization, caching, sharding, and model distillation.
• Integrate models into applications, APIs, or microservices and collaborate with ML Ops to ensure robust deployment.
• Research & Cross-Functional Collaboration:
• Lead experimentation on new model architectures, prompting strategies, retrieval-augmented generation (RAG), and hybrid search pipelines.
• Work closely with product managers, data engineers, ML ops, and research teams to convert experimental insights into production features.
• Document findings, communicate results, and contribute to technical roadmaps.
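To give candidates a concrete sense of the fine-tuning work above, here is a minimal NumPy sketch of the low-rank adaptation (LoRA) idea the responsibilities mention: a frozen pretrained weight matrix W is augmented with a small trainable product B·A scaled by alpha/r, so only r·(d_in + d_out) parameters train instead of d_out·d_in. All names and dimensions are illustrative, not taken from the client's codebase.

```python
import numpy as np

rng = np.random.default_rng(0)
d_in, d_out, r, alpha = 8, 4, 2, 16

W = rng.normal(size=(d_out, d_in))       # frozen pretrained weight
A = rng.normal(size=(r, d_in)) * 0.01    # trainable down-projection, small init
B = np.zeros((d_out, r))                 # trainable up-projection, zero init

def lora_forward(x, W, A, B, alpha=16, r=2):
    """Forward pass with a LoRA adapter: y = x @ (W + (alpha/r) * B @ A).T."""
    delta = (alpha / r) * (B @ A)        # low-rank weight update
    return x @ (W + delta).T

x = rng.normal(size=(3, d_in))
# With B initialized to zero, the adapted model reproduces the frozen base model.
assert np.allclose(lora_forward(x, W, A, B), x @ W.T)
```

Zero-initializing B is the standard trick: training starts exactly at the pretrained model and the adapter only gradually perturbs it.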
Required Skills & Qualifications:
• Bachelor's degree in Computer Science, Machine Learning, Engineering, or a related field.
• Minimum of 6 years of experience in machine learning engineering, deep learning, or related fields.
• Proven hands-on experience training, fine-tuning, and deploying LLMs (e.g., GPT-style, LLaMA-based, or transformer architectures).
• Strong proficiency in Python, deep learning frameworks (e.g., PyTorch, TensorFlow), and distributed training systems.
• Experience building scalable ML pipelines and working with modern inference stacks (e.g., Triton, Ray Serve, Hugging Face, ONNX Runtime).
• Strong understanding of GPU acceleration, optimization, and cloud-native deployment workflows.
• Excellent communication skills with the ability to work autonomously in a fast-moving environment.
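Quantization appears twice in the responsibilities, so a toy example of what it involves may help candidates self-assess. The sketch below shows symmetric per-tensor int8 quantization in plain NumPy; production stacks would instead rely on tooling in, e.g., ONNX Runtime or Triton, and the function names here are purely illustrative.

```python
import numpy as np

def quantize_int8(w):
    """Symmetric per-tensor int8 quantization: map floats onto [-127, 127]."""
    scale = np.abs(w).max() / 127.0                       # one scale for the whole tensor
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    """Recover an approximation of the original float weights."""
    return q.astype(np.float32) * scale

w = np.random.default_rng(0).normal(size=(4, 4)).astype(np.float32)
q, scale = quantize_int8(w)
w_hat = dequantize(q, scale)

# Round-to-nearest keeps the reconstruction error within half a quantization step.
assert q.dtype == np.int8
assert np.abs(w - w_hat).max() <= scale / 2 + 1e-6
```

The payoff is a 4x memory reduction versus float32 weights, at the cost of the bounded rounding error checked above; per-channel scales and quantization-aware training reduce that error further.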
Preferred Skills & Qualifications:
• Experience with RAG systems, vector databases, and search frameworks (e.g., FAISS, Milvus, Pinecone).
• Familiarity with model evaluation for alignment, safety, hallucination reduction, and adversarial testing.
• Prior experience in a research-oriented or applied AI lab environment.
• Master's degree or Ph.D. in a relevant technical field.
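The RAG retrieval step named in the preferred skills reduces, at its core, to nearest-neighbor search over embeddings. The stdlib-only toy below shows that core with cosine similarity over a tiny in-memory dict; a real system would embed documents with a model and index them in FAISS, Milvus, or Pinecone. The document ids and 3-d "embeddings" are invented for illustration.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def top_k(query, index, k=2):
    """Return the k document ids whose vectors are most similar to the query."""
    scored = sorted(index.items(), key=lambda kv: cosine(query, kv[1]), reverse=True)
    return [doc_id for doc_id, _ in scored[:k]]

# Toy corpus: three documents with hand-made 3-d embeddings.
index = {
    "doc_llm": [0.9, 0.1, 0.0],
    "doc_db":  [0.1, 0.9, 0.0],
    "doc_gpu": [0.0, 0.2, 0.9],
}
assert top_k([1.0, 0.0, 0.1], index, k=1) == ["doc_llm"]
```

The retrieved ids would then be resolved to passages and placed in the LLM prompt; the vector databases listed above exist to make this same top-k lookup fast at millions of documents.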






