ApexsyncTechnologies

Technical Lead – AI Operations & Support

⭐ - Featured Role | Apply direct with Data Freelance Hub
This role is for a Technical Lead – AI Operations & Support on a one-year remote contract, requiring quarterly on-site presence in Dallas, TX. Key skills include deep AWS expertise, AIOps experience, and proficiency in generative AI systems. AWS certification is a plus.
🌎 - Country
United States
💱 - Currency
$ USD
💰 - Day rate
Unknown
🗓️ - Date
April 15, 2026
🕒 - Duration
More than 6 months
🏝️ - Location
Remote
📄 - Contract
Unknown
🔒 - Security
Unknown
📍 - Location detailed
Dallas, TX
🧠 - Skills detailed
#Data Engineering #Scala #Observability #Monitoring #Langchain #Storage #Alation #AI (Artificial Intelligence) #Leadership #AWS (Amazon Web Services) #ML (Machine Learning) #Deployment #IAM (Identity and Access Management) #Data Science #Cloud #Data Pipeline
Role description
Role: Technical Lead – AI Operations & Support
Location: Remote (Quarterly on-site requirement in Dallas, TX)
Duration: One-year contract, open-ended C2C
Previous experience working directly for Amazon/AWS is a plus
Final interview will be with the Sr. Data Scientist, Sr. Data Engineer and Sr. Software Engineer on the team

Overview
We are seeking a Technical Lead to oversee our AI Operations (AIOps) and AI Support function. This individual will serve as a senior technical liaison between architecture teams and internal engineering groups, providing strategic thought leadership, hands-on technical guidance, and operational oversight for AI systems in production.

The Technical Lead will be responsible for ensuring the reliability, performance, and observability of machine learning, generative AI, and agentic systems. This includes monitoring model health, detecting drift, assessing groundedness and hallucinations, and ensuring production-ready operational standards across AI platforms.

Key Responsibilities
• Act as the technical authority for AI Operations and Support, overseeing monitoring, alerting, and incident response for AI systems in production
• Partner closely with architecture and engineering teams to guide AI systems from development through production deployment and ongoing operational support
• Lead model monitoring efforts, including detection of data drift, performance degradation, hallucinations, and groundedness issues
• Support both traditional ML models and Generative AI solutions, ensuring best practices for observability and operational readiness
• Monitor and support generative AI applications such as chatbots, RAG/grounded solutions, and internally developed AI products
• Evaluate and support the rollout of Amazon Quick (formerly Quick Suite) to enable self-service development through a digital workspace
• Lead operational readiness for the launch and scale of AWS AgentCore, including monitoring agent task performance, success rates, latency, reasoning steps, and semantic evaluations
• Oversee a growing portfolio of production AI models, each with established alerting and support processes
• Serve as an escalation point for production issues, with a strong focus on diagnosing data pipeline and data source failures
• Provide technical leadership over a dedicated support team handling day-to-day ticketing and incidents
• Communicate effectively with both engineering teams and executive leadership, clearly articulating system health, incidents, root causes, and remediation plans

Technical Environment
• Cloud-native architecture built on AWS
• Heavy usage of AWS infrastructure services, CloudWatch, and AWS AI platforms
• Generative AI frameworks including LangChain (primary) and limited use of CrewAI
• AI services such as Amazon Bedrock, Amazon Quick, and AWS AgentCore

Requirements
• Deep AWS experience and architectural expertise, including core compute, storage, networking, IAM, CloudWatch, and AI services (Bedrock, Quick/Quick Suite; AgentCore is a plus)
• Hands-on AIOps and production support experience across ML, Generative AI, and agent-based systems
• Strong understanding of model monitoring fundamentals, including data drift, performance degradation, alerting strategies, hallucination detection, and agent performance metrics
• Experience supporting Generative AI systems in production, including chatbots and RAG/grounded architectures
• Expertise in agent and workflow monitoring, including task success rates, latency, reasoning steps, and semantic evaluation
• Proven ability to provide technical leadership, mentor teams, and drive operational best practices
• Excellent communication skills, with the ability to engage both deeply technical stakeholders and executive leadership

Nice to Have
• AWS certification(s)
• Early experience with AWS AgentCore