STEM Sync AI

Agentic Workflow Evaluation Consultant | Remote

⭐ - Featured Role | Apply direct with Data Freelance Hub
This role is for an "Agentic Workflow Evaluation Consultant" on a W2 contract, remote, for 30+ hours/week, with a referral bonus of up to $1,920. Requires a current or retired professor, or a PhD student or candidate, in a STEM or quantitative field, with hands-on Python skills; prior model evaluation experience is a strong plus.
🌎 - Country
United States
💱 - Currency
$ USD
-
💰 - Day rate
880
-
🗓️ - Date
May 28, 2026
🕒 - Duration
Unknown
-
🏝️ - Location
Remote
-
📄 - Contract
W2 Contractor
-
🔒 - Security
Unknown
-
📍 - Location detailed
United States
-
🧠 - Skills detailed
#Compliance #Python #Data Science #Model Evaluation #Mathematics #Statistics #GitHub #ML (Machine Learning)
Role description
Frontier Model Evaluator (Academic & Domain Expert)
Remote | W2 Contract | Up to $1,920 Referral Bonus | 30+ hrs/week

Quick Snapshot
• Embedded within a leading frontier-model lab's GenAI team, working directly on benchmark design and model evaluation for cutting-edge LLM development
• Design and validate real-world, domain-specific agentic tasks with executable Python test suites to surface reasoning and problem-solving failures in target models
• Analyze model and agent behavior to classify failure types, distinguishing logical reasoning gaps from other performance issues
• Open to professors, retired academics, and PhD candidates across STEM, finance, law, economics, business, and quantitative disciplines
• W2 employment through an established enterprise staffing partner: a structured role with payroll, benefits, and compliance support
• Minimum 30 hours/week commitment during weekdays; work is remote and task-driven, suited to researchers with flexible schedules
• Referral program available: earn up to $1,920 per successful referral, with no cap on referrals

Requirements
• Current or retired professor, or PhD student (or candidate) in a STEM field (ML, CS, mathematics, physics, engineering, statistics, biology, chemistry, data science) or quantitative/professional domain (finance, economics, law, accounting, business)
• Degree or PhD in progress from a top-tier university in your field
• Hands-on Python proficiency demonstrated through research, industry work, GitHub projects, or coursework; theoretical familiarity alone does not qualify
• Ability to design rigorous, real-world domain problems targeting specific capability gaps in large language models or agentic systems
• Build complete task specifications including golden solutions and executable test cases within an agentic development environment
• Evaluate model outputs systematically and classify failure modes with precision
• Prior experience in model evaluation, data annotation, or LLM/agent training is a strong plus

Easy apply to proceed.