Teamware Solutions

LLM Evaluation Engineer

⭐ - Featured Role | Apply directly with Data Freelance Hub
This role is for an LLM Evaluation Engineer, remote for 12+ months, offering competitive pay. Key requirements include a strong understanding of LLMs, AI evaluation methodologies, Python, and hands-on experience with evaluation tools. A strong grasp of AI safety and bias testing is essential.
🌎 - Country
United States
💱 - Currency
€ EUR
💰 - Day rate
Unknown
🗓️ - Date
December 17, 2025
🕒 - Duration
More than 6 months
🏝️ - Location
Remote
📄 - Contract
Unknown
🔒 - Security
Unknown
📍 - Location detailed
United States
🧠 - Skills detailed
#Data Analysis #AI (Artificial Intelligence) #API (Application Programming Interface) #Programming #Automation #Security #Python #Datasets #Batch
Role description
LLM Evaluation Engineer
Location: Remote
Duration: 12+ Months
Required Skills
• Strong understanding of LLMs and generative AI concepts, including model behavior and output evaluation
• Experience with AI evaluation and benchmarking methodologies, including baseline creation and model comparison
• Hands-on expertise in eval testing, creating structured test suites to measure accuracy, relevance, safety, and performance
• Ability to define and apply evaluation metrics (precision/recall, BLEU/ROUGE, F1, hallucination rate, latency, cost per output); see the sketch after this list
• Prompt engineering and prompt testing experience across zero-shot, few-shot, and system prompt scenarios
• Python or other programming languages for automation, data analysis, batch evaluation execution, and API integration
• Experience with evaluation tools/frameworks (OpenAI Evals, HuggingFace evals, Promptfoo, Ragas, DeepEval, LM Eval Harness)
• Ability to create datasets, test cases, benchmarks, and ground truth references for consistent scoring
• Test design and test automation experience, including reproducible evaluation pipelines
• Knowledge of AI safety, bias, security testing, and hallucination analysis
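To make the metrics bullet concrete, here is a minimal, illustrative Python sketch of a structured test suite scored with accuracy, precision/recall, F1, and hallucination rate. It is not this employer's actual harness; every test case, label, and helper name in it is hypothetical.

```python
# Illustrative sketch only: a toy eval suite scored with accuracy,
# precision/recall/F1, and hallucination rate. All data is hypothetical.
from dataclasses import dataclass

@dataclass
class EvalCase:
    prompt: str
    expected: str       # ground-truth label ("yes" / "no")
    predicted: str      # model's answer, already normalized
    hallucinated: bool  # judge/human flag: output asserted unsupported facts

def score(cases: list[EvalCase], positive: str = "yes") -> dict[str, float]:
    # Standard definitions: precision = TP/(TP+FP), recall = TP/(TP+FN).
    tp = sum(c.expected == positive and c.predicted == positive for c in cases)
    fp = sum(c.expected != positive and c.predicted == positive for c in cases)
    fn = sum(c.expected == positive and c.predicted != positive for c in cases)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {
        "accuracy": sum(c.expected == c.predicted for c in cases) / len(cases),
        "precision": precision,
        "recall": recall,
        "f1": f1,
        # Hallucination rate: fraction of outputs flagged as hallucinated.
        "hallucination_rate": sum(c.hallucinated for c in cases) / len(cases),
    }

if __name__ == "__main__":
    suite = [
        EvalCase("Is Paris the capital of France?", "yes", "yes", False),
        EvalCase("Is the Moon larger than Earth?", "no", "no", False),
        EvalCase("Did Einstein win a Nobel Prize?", "yes", "no", True),
        EvalCase("Is Python a compiled language?", "no", "yes", True),
    ]
    for metric, value in score(suite).items():
        print(f"{metric}: {value:.2f}")
```

In practice a harness like this would load cases from a versioned dataset, call the model through an API, and log per-case results so runs are reproducible, which is the kind of work that frameworks such as Promptfoo or DeepEval automate.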