STEM Sync AI

Agentic Workflow Evaluation Consultant | Remote

⭐ - Featured Role | Apply directly with Data Freelance Hub

This role is for an "Agentic Workflow Evaluation Consultant" on a W2 contract, remote, at 30+ hours/week, with a day rate of $880 and a referral bonus of up to $1,920. It is open to current or retired professors and PhD students or candidates in STEM or quantitative fields. Hands-on Python skills are required, and model evaluation experience is a strong plus.

🌎 - Country

United States

💱 - Currency

$ USD

💰 - Day rate

880

🗓️ - Date

May 28, 2026

🕒 - Duration

Unknown

🏝️ - Location

Remote

📄 - Contract

W2 Contractor

🔒 - Security

Unknown

📍 - Location detailed

United States

🧠 - Skills detailed

#Compliance #Python #Data Science #Model Evaluation #Mathematics #Statistics #GitHub #ML (Machine Learning)

Role description

Frontier Model Evaluator (Academic & Domain Expert)

Remote | W2 Contract | Up to $1,920 Referral Bonus | 30+ hrs/week

Quick Snapshot

• Embedded within a leading frontier-model lab's GenAI team, working directly on benchmark design and model evaluation for cutting-edge LLM development
• Design and validate real-world, domain-specific agentic tasks with executable Python test suites to surface reasoning and problem-solving failures in target models
• Analyze model and agent behavior to classify failure types, distinguishing logical reasoning gaps from other performance issues
• Open to professors, retired academics, and PhD candidates across STEM, finance, law, economics, business, and quantitative disciplines
• W2 employment through an established enterprise staffing partner: a structured role with payroll, benefits, and compliance support
• Minimum commitment of 30 hours/week on weekdays; work is remote and task-driven, suited to researchers with flexible schedules
• Referral program available: earn up to $1,920 per successful referral, with no cap on the number of referrals

Requirements

• Current or retired professor, or PhD student (or candidate), in a STEM field (ML, CS, mathematics, physics, engineering, statistics, biology, chemistry, data science) or a quantitative/professional domain (finance, economics, law, accounting, business)
• Degree, or PhD in progress, from a top-tier university in your field
• Hands-on Python proficiency demonstrated through research, industry work, GitHub projects, or coursework; theoretical familiarity alone does not qualify
• Ability to design rigorous, real-world domain problems targeting specific capability gaps in large language models or agentic systems
• Ability to build complete task specifications, including golden solutions and executable test cases, within an agentic development environment
• Ability to evaluate model outputs systematically and classify failure modes with precision
• Prior experience in model evaluation, data annotation, or LLM/agent training is a strong plus

Easy apply to proceed.
