

Russell Tobin
AI EVAL Engineering
⭐ - Featured Role | Apply directly with Data Freelance Hub
This role is for an AI EVAL Engineer on a remote contract based out of Bellevue, WA, at $46/hr. Key skills include Azure OpenAI, LLMs, evaluation methodologies, Python, and hands-on experience with evaluation tools. A strong understanding of AI safety and benchmarking is required.
🌎 - Country
United States
💱 - Currency
$ USD
-
💰 - Day rate
368
-
🗓️ - Date
December 16, 2025
🕒 - Duration
Unknown
-
🏝️ - Location
Remote
-
📄 - Contract
Unknown
-
🔒 - Security
Unknown
-
📍 - Location detailed
Bellevue, WA
-
🧠 - Skills detailed
#Azure #Batch #Programming #API (Application Programming Interface) #Data Analysis #AI (Artificial Intelligence) #Datasets #Python #Security #Automation
Role description
Position- AI EVAL Engineering
Location- Bellevue, WA (Remote)
Rate- $46/hr
Job description:
Required Skills- Azure OpenAI; EVAL; Benchmarking
- Strong understanding of LLMs and generative AI concepts, including model behavior and output evaluation
- Experience with AI evaluation and benchmarking methodologies, including baseline creation and model comparison
- Hands-on expertise in Eval testing, creating structured test suites to measure accuracy, relevance, safety, and performance
- Ability to define and apply evaluation metrics (precision/recall, BLEU/ROUGE, F1, hallucination rate, latency, cost per output); a minimal sketch follows this list
- Prompt engineering and prompt testing experience across zero-shot, few-shot, and system prompt scenarios
- Python or other programming languages for automation, data analysis, batch evaluation execution, and API integration
- Experience with evaluation tools/frameworks (OpenAI Evals, HuggingFace evals, Promptfoo, Ragas, DeepEval, LM Eval Harness)
- Ability to create datasets, test cases, benchmarks, and ground truth references for consistent scoring
- Test design and test automation experience, including reproducible evaluation pipelines
- Knowledge of AI safety, bias, security testing, and hallucination analysis
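To make the eval-suite and metrics requirements above concrete, here is a minimal, hypothetical sketch of a structured test suite in Python. The call_model stub, the tiny TEST_CASES dataset, and the exact-match scoring are illustrative assumptions, not the client's actual harness; in practice the stub would be replaced with a real client call (e.g., an Azure OpenAI chat completion) and the scoring extended with relevance, safety, and hallucination checks.

```python
import time

# Hypothetical ground-truth dataset: each case pairs a prompt with a reference answer.
TEST_CASES = [
    {"prompt": "What does the acronym LLM stand for?", "reference": "large language model"},
    {"prompt": "What is 2 + 2?", "reference": "4"},
]

def call_model(prompt: str) -> str:
    # Stub standing in for a real model call (e.g., an Azure OpenAI chat completion).
    return "large language model"

def run_suite(cases):
    results = []
    for case in cases:
        start = time.perf_counter()
        output = call_model(case["prompt"])
        latency = time.perf_counter() - start
        # Exact-match scoring against the ground-truth reference: the simplest
        # baseline metric for consistent, reproducible comparison across models.
        correct = case["reference"].lower() in output.lower()
        results.append({"prompt": case["prompt"], "correct": correct, "latency_s": latency})
    accuracy = sum(r["correct"] for r in results) / len(results)
    return accuracy, results

if __name__ == "__main__":
    accuracy, results = run_suite(TEST_CASES)
    print(f"accuracy={accuracy:.2f}")
    for r in results:
        print(r)
```

Keeping the dataset, the model call, and the scoring as separate pieces is what makes the pipeline reproducible: the same cases and metrics can be rerun against a new model or prompt to produce a like-for-like baseline comparison.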
Nice-to-Have
- RAG evaluation experience
- Azure OpenAI
- OpenAI
- Anthropic
- Google AI platforms
- Performance benchmarking (speed, throughput, cost); see the timing sketch after this list
- Domain knowledge: Office apps, enterprise systems, networking
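For the performance-benchmarking item above, a rough sketch of how speed, throughput, and cost per output might be measured. The pricing constant and whitespace token estimate are stated assumptions for illustration, not real Azure OpenAI rates; a real harness would use the provider's reported usage counts or a proper tokenizer.

```python
import time

PRICE_PER_1K_TOKENS = 0.002  # assumed illustrative rate, not a real price sheet

def benchmark(call_fn, prompts):
    start = time.perf_counter()
    total_tokens = 0
    for prompt in prompts:
        text = call_fn(prompt)
        # Crude token estimate via whitespace split; swap in a real tokenizer
        # (or the API's reported usage) for production numbers.
        total_tokens += len(text.split())
    elapsed = time.perf_counter() - start
    return {
        "throughput_rps": len(prompts) / elapsed,
        "avg_latency_s": elapsed / len(prompts),
        "est_cost_usd": total_tokens / 1000 * PRICE_PER_1K_TOKENS,
    }

if __name__ == "__main__":
    # Echo stub stands in for a model call so the sketch runs end to end.
    stats = benchmark(lambda p: p, ["prompt one", "prompt two", "prompt three"])
    print(stats)
```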






