

Tential Solutions
AI/LLM Engineer
⭐ - Featured Role | Apply direct with Data Freelance Hub
This role is for a Senior SDET – AI/LLM Engineer, offering a remote contract position with a competitive pay rate. Requires 5+ years of SDET experience, strong Python skills, and expertise in testing ML/LLM systems. Familiarity with tools like LangChain and MLflow is essential.
🌎 - Country
United States
💱 - Currency
$ USD
-
💰 - Day rate
Unknown
-
🗓️ - Date
December 16, 2025
🕒 - Duration
Unknown
-
🏝️ - Location
Remote
-
📄 - Contract
Unknown
-
🔒 - Security
Unknown
-
📍 - Location detailed
Tampa, FL
-
🧠 - Skills detailed
#Data Quality #Data Engineering #DevOps #Monitoring #MLflow #ML (Machine Learning) #Compliance #Regression #Observability #Code Reviews #Datadog #Langchain #Agile #Datasets #Python #Hugging Face #Automation #TensorFlow #AI (Artificial Intelligence) #PyTorch #Infrastructure as Code (IaC)
Role description
Senior SDET – AI / LLM Quality Engineering (Shared Services)
About The Team
This role sits within the QA Center of Excellence, as part of a small, highly specialized AI Quality Engineering team consisting of two SDETs and one Data Engineer.
The team operates as a shared service across the organization, defining how Large Language Model (LLM)–powered systems are tested, evaluated, observed, and trusted before and after production release.
Rather than building customer-facing AI features, this team builds LLM-based testing and evaluation frameworks and partners with product, platform, and data teams to ensure generative AI solutions meet quality, reliability, and compliance standards.
Role Overview
We are seeking a Senior Software Development Engineer in Test (SDET) with a strong automation and systems-testing background to focus on LLM quality, validation, and evaluation.
In This Role, You Will
• Test LLM-powered applications used across the enterprise
• Build LLM-driven testing and evaluation workflows
• Define organization-wide standards for GenAI quality and reliability
This is a hands-on engineering role with significant influence across teams.
Key Responsibilities
LLM Testing & Evaluation
• Design and implement test strategies for LLM-powered systems, including:
  • Prompt and response validation
  • Regression testing across model, prompt, and data changes
  • Evaluation of accuracy, consistency, hallucinations, and safety
• Build and maintain LLM-based evaluation frameworks using tools such as DeepEval, MLflow, Langflow, and LangChain
• Develop synthetic and real-world test datasets in partnership with the Data Engineer
• Define quality thresholds, scoring mechanisms, and pass/fail criteria for GenAI systems
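For illustration only, here is a minimal sketch of the kind of prompt/response validation with explicit pass/fail criteria described above. The `call_llm` stub and the simple content checks are hypothetical placeholders, not the team's actual framework; in practice a library such as DeepEval or MLflow would supply richer metrics (relevancy, faithfulness, safety).

```python
# Hypothetical sketch: prompt/response validation with explicit pass/fail
# criteria. `call_llm` is a stand-in for the LLM-powered service under test.
import pytest

def call_llm(prompt: str) -> str:
    """Hypothetical stand-in for the system under test."""
    return "The capital of France is Paris."

@pytest.mark.parametrize("prompt, must_contain, must_not_contain", [
    ("What is the capital of France?", ["Paris"], ["I cannot", "As an AI"]),
])
def test_prompt_response_validation(prompt, must_contain, must_not_contain):
    response = call_llm(prompt)
    # Content checks act as simple pass/fail quality gates.
    for phrase in must_contain:
        assert phrase in response, f"missing expected phrase: {phrase!r}"
    for phrase in must_not_contain:
        assert phrase not in response, f"unexpected phrase: {phrase!r}"
    # A length bound guards against runaway or truncated generations.
    assert 1 <= len(response.split()) <= 200
```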
Test Automation & Framework Development
• Build and maintain automated test frameworks for:
  • LLM APIs and services
  • Agentic and RAG workflows
  • Data and inference pipelines
• Integrate testing and evaluation into CI/CD pipelines, enforcing quality gates before production release (see the sketch after this list)
• Partner with engineering teams to improve testability and reliability of AI systems
• Perform root-cause analysis of failures related to model behavior, data quality, or orchestration logic
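As a rough illustration of a CI/CD quality gate, the sketch below runs a small evaluation set against the model and exits non-zero when the pass rate drops below a threshold, so a pipeline step could block the release. The evaluation cases, the `call_llm` stub, and the threshold are assumed placeholders, not an actual configuration.

```python
# Hypothetical CI quality gate: evaluate the model on a fixed set of cases and
# fail the pipeline step when the aggregate pass rate falls below a threshold.
import sys

EVAL_CASES = [
    {"prompt": "What is the capital of France?", "keywords": ["Paris"]},
    {"prompt": "What is 2 + 2?", "keywords": ["4"]},
]
PASS_RATE_THRESHOLD = 0.95  # assumed release gate

def call_llm(prompt: str) -> str:
    """Stand-in for the deployed LLM service."""
    answers = {"What is the capital of France?": "Paris.", "What is 2 + 2?": "4."}
    return answers.get(prompt, "")

def score_response(response: str, keywords: list[str]) -> bool:
    # Pass only if every expected keyword appears in the response.
    return all(keyword in response for keyword in keywords)

def main() -> int:
    passed = sum(score_response(call_llm(case["prompt"]), case["keywords"])
                 for case in EVAL_CASES)
    pass_rate = passed / len(EVAL_CASES)
    print(f"eval pass rate: {pass_rate:.2%}")
    return 0 if pass_rate >= PASS_RATE_THRESHOLD else 1

if __name__ == "__main__":
    sys.exit(main())
```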
Observability & Monitoring
• Instrument LLM applications with Datadog LLM Observability to monitor (see the sketch after this list):
  • Latency, token usage, errors, and cost
  • Quality regressions and performance anomalies
• Build dashboards and alerts focused on LLM quality, reliability, and drift
• Use production telemetry to continuously refine test coverage and evaluation strategies
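To make the observability bullets concrete, here is a hypothetical sketch of wrapping an LLM call so that latency, token usage, and errors are emitted as metrics. In practice this instrumentation would go through Datadog's LLM Observability tooling; the `emit_metric` sink and `call_llm` stub below are placeholders so the example stays self-contained.

```python
# Illustrative sketch only: wrap an LLM call and emit latency, token usage,
# and error metrics to a placeholder sink.
import time

def emit_metric(name: str, value: float, tags: dict[str, str]) -> None:
    """Hypothetical metrics sink; a real setup would report to Datadog."""
    print(f"{name}={value} tags={tags}")

def call_llm(prompt: str) -> dict:
    """Stand-in for the model client; returns text plus token usage."""
    return {"text": "stub answer", "prompt_tokens": 12, "completion_tokens": 5}

def observed_llm_call(prompt: str, model: str = "example-model") -> str:
    tags = {"model": model}
    start = time.perf_counter()
    try:
        result = call_llm(prompt)
    except Exception:
        emit_metric("llm.errors", 1, tags)
        raise
    latency_ms = (time.perf_counter() - start) * 1000
    emit_metric("llm.latency_ms", latency_ms, tags)
    emit_metric("llm.tokens.prompt", result["prompt_tokens"], tags)
    emit_metric("llm.tokens.completion", result["completion_tokens"], tags)
    return result["text"]

if __name__ == "__main__":
    print(observed_llm_call("What is the capital of France?"))
```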
Shared Services & Collaboration
• Act as a consultative partner to product, platform, and data teams adopting LLM technologies
• Provide guidance on:
  • Test strategies for generative AI
  • Prompt and workflow validation
  • Release readiness and risk assessment
• Contribute to organization-wide standards and best practices for explaining, testing, and monitoring AI systems
• Participate in design and architecture reviews from a quality-first perspective
Engineering Excellence
• Advocate for automation-first testing, infrastructure as code, and continuous monitoring
• Drive adoption of Agile, DevOps, and CI/CD best practices within the AI quality space
• Conduct code reviews and promote secure, maintainable test frameworks
• Continuously improve internal tooling and frameworks used by the QA Center of Excellence
Required Skills & Experience
Core SDET Experience
• 5+ years of experience in SDET, test automation, or quality engineering roles
• Strong Python development skills
• Experience testing backend systems, APIs, or distributed platforms
• Proven experience building and maintaining automation frameworks
• Comfort working with ambiguous, non-deterministic systems
AI / LLM Experience
• Hands-on experience testing or validating ML- or LLM-based systems
• Familiarity with LLM orchestration and evaluation tools such as:
  • Langflow, LangChain
  • DeepEval, MLflow
• Understanding of challenges unique to testing generative AI systems
Nice to Have
• Experience with Datadog (especially LLM Observability)
• Exposure to Hugging Face, PyTorch, or TensorFlow (usage-level)
• Experience testing RAG pipelines, VectorDBs, or data-driven platforms
• Background working in platform, shared services, or Center of Excellence teams
• Experience collaborating closely with data engineering or ML platform teams
What This Role Is Not
• ❌ Not a pure ML research or model training role
• ❌ Not a feature-focused backend engineering role
• ❌ Not manual QA
Why This Role Is Unique
• You will define how AI quality is measured across the organization
• You will build LLM-powered testing systems, not just test scripts
• You will influence multiple teams and products, not just one codebase
• You will work at the intersection of AI, automation, and reliability
#Remote