Tential Solutions

AI/LLM Engineer

⭐ - Featured Role | Apply directly with Data Freelance Hub
This role is for a Senior SDET – AI/LLM Engineer, offering a remote contract position with a competitive pay rate. It requires 5+ years of SDET experience, strong Python skills, and expertise in testing ML/LLM systems. Familiarity with tools like LangChain and MLflow is essential.
🌎 - Country
United States
💱 - Currency
$ USD
-
💰 - Day rate
Unknown
-
🗓️ - Date
December 16, 2025
🕒 - Duration
Unknown
-
🏝️ - Location
Remote
-
📄 - Contract
Unknown
-
🔒 - Security
Unknown
-
📍 - Location detailed
Tampa, FL
-
🧠 - Skills detailed
#Data Quality #Data Engineering #DevOps #Monitoring #MLflow #ML (Machine Learning) #Compliance #Regression #Observability #Code Reviews #Datadog #Langchain #Agile #Datasets #Python #Hugging Face #Automation #TensorFlow #AI (Artificial Intelligence) #PyTorch #Infrastructure as Code (IaC)
Role description
Senior SDET – AI / LLM Quality Engineering (Shared Services)

About The Team
This role sits within the QA Center of Excellence, as part of a small, highly specialized AI Quality Engineering team consisting of two SDETs and one Data Engineer. The team operates as a shared service across the organization, defining how Large Language Model (LLM)–powered systems are tested, evaluated, observed, and trusted before and after production release. Rather than building customer-facing AI features, this team builds LLM-based testing and evaluation frameworks and partners with product, platform, and data teams to ensure generative AI solutions meet quality, reliability, and compliance standards.

Role Overview
We are seeking a Senior Software Development Engineer in Test (SDET) with a strong automation and systems-testing background to focus on LLM quality, validation, and evaluation.

In This Role, You Will
• Test LLM-powered applications used across the enterprise
• Build LLM-driven testing and evaluation workflows
• Define organization-wide standards for GenAI quality and reliability
This is a hands-on engineering role with significant influence across teams.

Key Responsibilities

LLM Testing & Evaluation
• Design and implement test strategies for LLM-powered systems, including:
  • Prompt and response validation
  • Regression testing across model, prompt, and data changes
  • Evaluation of accuracy, consistency, hallucinations, and safety
• Build and maintain LLM-based evaluation frameworks using tools such as DeepEval, MLflow, Langflow, and LangChain (see the first sketch after the responsibilities below)
• Develop synthetic and real-world test datasets in partnership with the Data Engineer
• Define quality thresholds, scoring mechanisms, and pass/fail criteria for GenAI systems

Test Automation & Framework Development
• Build and maintain automated test frameworks for:
  • LLM APIs and services
  • Agentic and RAG workflows
  • Data and inference pipelines
• Integrate testing and evaluation into CI/CD pipelines, enforcing quality gates before production release
• Partner with engineering teams to improve testability and reliability of AI systems
• Perform root-cause analysis of failures related to model behavior, data quality, or orchestration logic

Observability & Monitoring
• Instrument LLM applications with Datadog LLM Observability (see the second sketch after the responsibilities below) to monitor:
  • Latency, token usage, errors, and cost
  • Quality regressions and performance anomalies
• Build dashboards and alerts focused on LLM quality, reliability, and drift
• Use production telemetry to continuously refine test coverage and evaluation strategies

Shared Services & Collaboration
• Act as a consultative partner to product, platform, and data teams adopting LLM technologies
• Provide guidance on:
  • Test strategies for generative AI
  • Prompt and workflow validation
  • Release readiness and risk assessment
• Contribute to organization-wide standards and best practices for explaining, testing, and monitoring AI systems
• Participate in design and architecture reviews from a quality-first perspective

Engineering Excellence
• Advocate for automation-first testing, infrastructure as code, and continuous monitoring
• Drive adoption of Agile, DevOps, and CI/CD best practices within the AI quality space
• Conduct code reviews and promote secure, maintainable test frameworks
• Continuously improve internal tooling and frameworks used by the QA Center of Excellence
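To make the evaluation and quality-gate responsibilities concrete, here is a minimal sketch of the kind of pytest-style check this role would build, assuming DeepEval's LLMTestCase / AnswerRelevancyMetric / assert_test API; the prompt, response, retrieval context, and the 0.7 threshold are illustrative placeholders, not values taken from this posting.

```python
# Minimal sketch of an LLM evaluation used as a CI quality gate.
# Assumes DeepEval's pytest-style API (LLMTestCase, AnswerRelevancyMetric,
# assert_test); the prompt, response, context, and threshold are placeholders.
from deepeval import assert_test
from deepeval.metrics import AnswerRelevancyMetric
from deepeval.test_case import LLMTestCase


def test_support_bot_answer_relevancy():
    # One test case: the prompt sent to the LLM-powered application,
    # the response it actually produced, and the retrieved context it used.
    test_case = LLMTestCase(
        input="How do I reset my password?",
        actual_output="Open Settings > Security and choose 'Reset password'.",
        retrieval_context=["Passwords are reset from Settings > Security."],
    )
    # Pass/fail criterion: the answer must score at least 0.7 on relevancy;
    # a lower score fails the test and blocks the release in CI.
    metric = AnswerRelevancyMetric(threshold=0.7)
    assert_test(test_case, [metric])
```

Run under pytest in the release pipeline, a score below the threshold fails the build, which is one way the "quality gates before production release" responsibility could be enforced; note that AnswerRelevancyMetric uses an LLM judge, so model credentials are needed to execute it.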
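Likewise, a hedged sketch of the Observability & Monitoring responsibility, assuming Datadog's ddtrace LLM Observability SDK (LLMObs.enable, the @llm decorator, and LLMObs.annotate); the application name, model, response text, and token counts are placeholders.

```python
# Minimal sketch of instrumenting an LLM call with Datadog LLM Observability.
# Assumes the ddtrace SDK's LLMObs.enable / @llm decorator / LLMObs.annotate
# surface; the app name, model, and token counts are placeholder values.
from ddtrace.llmobs import LLMObs
from ddtrace.llmobs.decorators import llm

LLMObs.enable(ml_app="qa-eval-service")  # ml_app groups traces in Datadog


@llm(model_name="gpt-4o", model_provider="openai")
def summarize(ticket_text: str) -> str:
    # Call the real model here; a canned response keeps the sketch self-contained.
    response = "Customer cannot reset password; escalate to support tier 2."
    # Attach the prompt, response, and token metrics to the active span so
    # latency, usage, cost, and quality regressions can be charted and alerted on.
    LLMObs.annotate(
        input_data=ticket_text,
        output_data=response,
        metrics={"input_tokens": 240, "output_tokens": 18, "total_tokens": 258},
    )
    return response


if __name__ == "__main__":
    print(summarize("User reports the password reset email never arrives."))
```

Spans annotated this way are what the dashboards and alerts on latency, token usage, cost, and drift described above would be built from.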
Required Skills & Experience

Core SDET Experience
• 5+ years of experience in SDET, test automation, or quality engineering roles
• Strong Python development skills
• Experience testing backend systems, APIs, or distributed platforms
• Proven experience building and maintaining automation frameworks
• Comfort working with ambiguous, non-deterministic systems

AI / LLM Experience
• Hands-on experience testing or validating ML- or LLM-based systems
• Familiarity with LLM orchestration and evaluation tools such as:
  • Langflow, LangChain
  • DeepEval, MLflow
• Understanding of challenges unique to testing generative AI systems

Nice to Have
• Experience with Datadog (especially LLM Observability)
• Exposure to Hugging Face, PyTorch, or TensorFlow (usage-level)
• Experience testing RAG pipelines, VectorDBs, or data-driven platforms
• Background working in platform, shared services, or Center of Excellence teams
• Experience collaborating closely with data engineering or ML platform teams

What This Role Is Not
• Not a pure ML research or model training role
• Not a feature-focused backend engineering role
• Not manual QA

Why This Role Is Unique
• You will define how AI quality is measured across the organization
• You will build LLM-powered testing systems, not just test scripts
• You will influence multiple teams and products, not just one codebase
• You will work at the intersection of AI, automation, and reliability

#Remote