

Senior Engineer, SRE – GenAI
This role is for a Senior Engineer, SRE – GenAI in Menlo Park, CA, for 11 months at a competitive pay rate. Required skills include expertise in GenAI (GPT, Claude), cloud platforms (AWS, Azure, GCP), and monitoring tools (Splunk, Grafana).
🌎 - Country
United States
💱 - Currency
$ USD
💰 - Day rate
600
🗓️ - Date discovered
August 15, 2025
🕒 - Project duration
More than 6 months
🏝️ - Location type
On-site
📄 - Contract type
Unknown
🔒 - Security clearance
Unknown
📍 - Location detailed
Menlo Park, CA
🧠 - Skills detailed
#Deployment #MongoDB #DevOps #Sybase #Databases #GCP (Google Cloud Platform) #Snowflake #ML (Machine Learning) #AI (Artificial Intelligence) #Kubernetes #Prometheus #Documentation #Splunk #Shell Scripting #Monitoring #Jira #Docker #AWS (Amazon Web Services) #Databricks #Python #Cloud #Programming #Grafana #Scripting #Scala #Terraform #Angular #Java #SQL (Structured Query Language) #Azure
Role description
Net2Source Inc. is an award-winning total workforce solutions company recognized by Staffing Industry Analysts for accelerated growth of 300% in the last 3 years, with more than 5,500 employees globally, more than 30 locations in the US, and operations in 32 countries. We believe in providing staffing solutions that address the current talent gap – Right Talent – Right Time – Right Place – Right Price – and in acting as a Career Coach to our consultants.
Job Title: Senior SRE / DevOps Engineer – GenAI (L3)
Location: Menlo Park, CA – Onsite
Duration: 11 Months
Role Overview
Seeking an L3 GenAI SRE/DevOps Engineer to provide expert-level technical support, incident resolution, and platform optimization for AI/ML & LLM-powered systems.
Key Responsibilities
• Technical Support: Troubleshoot, debug, and resolve complex GenAI/LLM issues (GPT, Claude, PaLM2, Llama2, RAG pipelines).
• Platform Reliability: Maintain and optimize scalable, fault-tolerant AI/ML platforms; monitor pipelines and inference performance.
• Incident Management: Perform root cause analysis, follow ITIL-based incident/problem/service management, and track work in Jira.
• Collaboration: Partner with engineering, DevOps, and product teams for feature improvements and deployment stability.
• Monitoring & Optimization: Use Splunk, AppDynamics, Grafana, Loki, and Prometheus for system health monitoring and performance tuning.
• Documentation & Training: Create SOPs, training materials, and technical guides for internal and external teams.
• Continuous Improvement: Stay up to date on AI/ML/GenAI advancements to improve support processes.
Required Skills & Qualifications
• GenAI Expertise: GPT, Claude, PaLM2, Llama2, RAG, Agents, CoT.
• Cloud & DevOps: AWS, Azure, GCP, Kubernetes (EKS/OpenShift), Docker, Terraform, CI/CD.
• Monitoring Tools: Splunk, AppDynamics, Autosys, Grafana, Loki, Prometheus.
• Databases: SQL, Sybase, MongoDB, Snowflake, Databricks.
• Programming: Python, Shell scripting, Java, Angular, HTTP protocols.
• ITIL Processes: Incident/Problem/Service Management.
• Strong analytical, troubleshooting, and cross-team collaboration skills.
Nehal Ojha
Sr. Technical Recruiter
+1-551-363-0506
Ojha.nehal@net2source.com
www.net2source.com
270 Davidson Ave, Suite 704, Somerset, NJ 08873