

Russell Tobin
AI EVAL Engineering
⭐ - Featured Role | Apply directly with Data Freelance Hub
This role is for an AI EVAL Engineer on a remote contract based out of Bellevue, WA, at $46/hr. Key skills include Azure OpenAI, LLMs, evaluation methodologies, Python, and hands-on experience with evaluation tools. A strong understanding of AI safety and benchmarking is required.
🌎 - Country
United States
💱 - Currency
$ USD
-
💰 - Day rate
368
-
🗓️ - Date
December 16, 2025
🕒 - Duration
Unknown
-
🏝️ - Location
Remote
-
📄 - Contract
Unknown
-
🔒 - Security
Unknown
-
📍 - Location detailed
Bellevue, WA
-
🧠 - Skills detailed
#Azure #Batch #Programming #API (Application Programming Interface) #Data Analysis #AI (Artificial Intelligence) #Datasets #Python #Security #Automation
Role description
Position- AI EVAL Engineering
Location- Bellevue, WA (Remote)
Rate- $46/hr
Job description:
Required Skills- Azure OpenAI; EVAL; Benchmarking
- Strong understanding of LLMs and generative AI concepts, including model behavior and output evaluation
- Experience with AI evaluation and benchmarking methodologies, including baseline creation and model comparison
- Hands-on expertise in Eval testing, creating structured test suites to measure accuracy, relevance, safety, and performance
- Ability to define and apply evaluation metrics (precision/recall, BLEU/ROUGE, F1, hallucination rate, latency, cost per output); a minimal sketch follows this list
- Prompt engineering and prompt testing experience across zero-shot, few-shot, and system prompt scenarios
- Python or other programming languages for automation, data analysis, batch evaluation execution, and API integration
- Experience with evaluation tools/frameworks (OpenAI Evals, HuggingFace evals, Promptfoo, Ragas, DeepEval, LM Eval Harness)
- Ability to create datasets, test cases, benchmarks, and ground truth references for consistent scoring
- Test design and test automation experience, including reproducible evaluation pipelines
- Knowledge of AI safety, bias, security testing, and hallucination analysis
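To make the eval-suite and metrics requirements above concrete, here is a minimal, hypothetical sketch of a structured test suite in Python. The call_model stub, the tiny TEST_CASES dataset, and the exact-match scoring are illustrative assumptions, not the client's actual harness; in practice the stub would be replaced with a real client call (e.g., an Azure OpenAI chat completion) and the scoring extended with relevance, safety, and hallucination checks.

```python
import time

# Hypothetical ground-truth dataset: each case pairs a prompt with a reference answer.
TEST_CASES = [
    {"prompt": "What does the acronym LLM stand for?", "reference": "large language model"},
    {"prompt": "What is 2 + 2?", "reference": "4"},
]

def call_model(prompt: str) -> str:
    # Stub standing in for a real model call (e.g., an Azure OpenAI chat completion).
    return "large language model"

def run_suite(cases):
    results = []
    for case in cases:
        start = time.perf_counter()
        output = call_model(case["prompt"])
        latency = time.perf_counter() - start
        # Exact-match scoring against the ground-truth reference: the simplest
        # baseline metric for consistent, reproducible comparison across models.
        correct = case["reference"].lower() in output.lower()
        results.append({"prompt": case["prompt"], "correct": correct, "latency_s": latency})
    accuracy = sum(r["correct"] for r in results) / len(results)
    return accuracy, results

if __name__ == "__main__":
    accuracy, results = run_suite(TEST_CASES)
    print(f"accuracy={accuracy:.2f}")
    for r in results:
        print(r)
```

Keeping the dataset, the model call, and the scoring as separate pieces is what makes the pipeline reproducible: the same cases and metrics can be rerun against a new model or prompt to produce a like-for-like baseline comparison.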
Nice-to-Have
- RAG evaluation experience
- Azure OpenAI
- OpenAI
- Anthropic
- Google AI platforms
- Performance benchmarking (speed, throughput, cost); see the timing sketch after this list
- Domain knowledge: Office apps, enterprise systems, networking
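For the performance-benchmarking item above, a rough sketch of how speed, throughput, and cost per output might be measured. The pricing constant and whitespace token estimate are stated assumptions for illustration, not real Azure OpenAI rates; a real harness would use the provider's reported usage counts or a proper tokenizer.

```python
import time

PRICE_PER_1K_TOKENS = 0.002  # assumed illustrative rate, not a real price sheet

def benchmark(call_fn, prompts):
    start = time.perf_counter()
    total_tokens = 0
    for prompt in prompts:
        text = call_fn(prompt)
        # Crude token estimate via whitespace split; swap in a real tokenizer
        # (or the API's reported usage) for production numbers.
        total_tokens += len(text.split())
    elapsed = time.perf_counter() - start
    return {
        "throughput_rps": len(prompts) / elapsed,
        "avg_latency_s": elapsed / len(prompts),
        "est_cost_usd": total_tokens / 1000 * PRICE_PER_1K_TOKENS,
    }

if __name__ == "__main__":
    # Echo stub stands in for a model call so the sketch runs end to end.
    stats = benchmark(lambda p: p, ["prompt one", "prompt two", "prompt three"])
    print(stats)
```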






