Senior Engineer, SRE – GenAI

This role is for a Senior Engineer, SRE – GenAI in Menlo Park, CA, for 11 months at a competitive pay rate. Required skills include expertise in GenAI (GPT, Claude), cloud platforms (AWS, Azure, GCP), and monitoring tools (Splunk, Grafana).
🌎 - Country
United States
💱 - Currency
$ USD
💰 - Day rate
600
🗓️ - Date discovered
August 15, 2025
🕒 - Project duration
More than 6 months
🏝️ - Location type
On-site
📄 - Contract type
Unknown
🔒 - Security clearance
Unknown
📍 - Location detailed
Menlo Park, CA
🧠 - Skills detailed
#Deployment #MongoDB #DevOps #Sybase #Databases #GCP (Google Cloud Platform) #Snowflake #ML (Machine Learning) #AI (Artificial Intelligence) #Kubernetes #Prometheus #Documentation #Splunk #Shell Scripting #Monitoring #Jira #Docker #AWS (Amazon Web Services) #Databricks #Python #Cloud #Programming #Grafana #Scripting #Scala #Terraform #Angular #Java #SQL (Structured Query Language) #Azure
Role description
Net2Source Inc. is an award-winning total workforce solutions company recognized by Staffing Industry Analysts for our accelerated growth of 300% in the last 3 years, with more than 5,500 employees globally, more than 30 locations in the US, and global operations in 32 countries. We believe in providing staffing solutions to address the current talent gap (Right Talent – Right Time – Right Place – Right Price) and in acting as a Career Coach to our consultants.

Job Title: Senior SRE / DevOps Engineer – GenAI (L3)
Location: Menlo Park, CA – Onsite
Duration: 11 Months

Role Overview
Seeking an L3 GenAI SRE/DevOps Engineer to provide expert-level technical support, incident resolution, and platform optimization for AI/ML and LLM-powered systems.

Key Responsibilities
• Technical Support: Troubleshoot, debug, and resolve complex GenAI/LLM issues (GPT, Claude, PaLM2, Llama2, RAG pipelines).
• Platform Reliability: Maintain and optimize scalable, fault-tolerant AI/ML platforms; monitor pipelines and inference performance.
• Incident Management: Perform root cause analysis, follow ITIL-based incident/problem/service management, and track work in Jira.
• Collaboration: Partner with engineering, DevOps, and product teams on feature improvements and deployment stability.
• Monitoring & Optimization: Use Splunk, AppDynamics, Grafana, Loki, and Prometheus for system health and performance tuning.
• Documentation & Training: Create SOPs, training materials, and technical guides for internal and external teams.
• Continuous Improvement: Stay up to date on AI/ML/GenAI advancements to improve support processes.

Required Skills & Qualifications
• GenAI Expertise: GPT, Claude, PaLM2, Llama2, RAG, Agents, CoT.
• Cloud & DevOps: AWS, Azure, GCP, Kubernetes (EKS/OpenShift), Docker, Terraform, CI/CD.
• Monitoring Tools: Splunk, AppDynamics, Autosys, Grafana, Loki, Prometheus.
• Databases: SQL, Sybase, MongoDB, Snowflake, Databricks.
• Programming: Python, Shell scripting, Java, Angular, HTTP protocols.
• ITIL Processes: Incident/Problem/Service Management.
• Strong analytical, troubleshooting, and cross-team collaboration skills.

Nehal Ojha
Sr. Technical Recruiter
+1-551-363 0506
Ojha.nehal@net2source.com
www.net2source.com
270 Davidson Ave, Suite 704, Somerset, NJ 08873