

DeWinter Group
AI Safety & Evaluation Engineer
⭐ - Featured Role | Apply directly with Data Freelance Hub
This role is for an AI Safety & Evaluation Engineer on a 12-month contract, offering $50/hr – $175/hr, remote work. Requires 3+ years in AI Research or Quality Engineering, expertise in model evaluation techniques, and proficiency in Python and NLP metrics.
🌎 - Country
United States
💱 - Currency
$ USD
-
💰 - Day rate
400
-
🗓️ - Date
May 6, 2026
🕒 - Duration
More than 6 months
-
🏝️ - Location
Remote
-
📄 - Contract
Unknown
-
🔒 - Security
Unknown
-
📍 - Location detailed
Campbell, CA
-
🧠 - Skills detailed
#Monitoring #NLP (Natural Language Processing) #Python #AI (Artificial Intelligence) #Datasets #Data Analysis #Deployment #Model Evaluation
Role description
Title: AI Safety and Evaluations Engineer
Job Type: Contract
Contract Length: 12 Months
Pay Range: $50/hr – $175/hr
Start Date: ASAP
Location: Remote
About The Opportunity
Our client, a leader in AI testing and Generative AI solutions, is looking for a skilled AI Safety and Evaluations Engineer to join their team for a 12-month engagement. This project involves designing and building rigorous evaluation frameworks to measure model bias, hallucinations, and toxicity, ensuring models are safe and compliant before deployment. This is a high-impact role that requires a self-motivated professional who can hit the ground running and deliver results quickly.
Key Responsibilities & Deliverables
This role is focused on the successful completion of specific tasks and deliverables. Your responsibilities will include:
• Designing and building rigorous evaluation frameworks to measure model bias, hallucinations, and toxicity.
• Creating automated "Eval" datasets to benchmark new models before they are promoted to production.
• Developing metrics for "Grounding" and "Faithfulness" in RAG-based systems.
• Building monitoring tools that flag harmful or non-compliant AI outputs in real time.
• Partnering with legal and ethics teams to translate policy into technical safety constraints.
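To give a flavor of the "Grounding" and "Faithfulness" work above: a common starting point is a lexical support score that measures how much of a model's answer is backed by the retrieved context. The sketch below is a deliberately simple toy proxy (production evaluation would use entailment models or LLM judges); the function name and example strings are illustrative, not from any specific framework.

```python
def grounding_score(answer: str, context: str) -> float:
    """Fraction of answer tokens that also appear in the retrieved context.

    A crude lexical proxy for grounding in a RAG system:
    1.0 means every answer token is supported by the context,
    0.0 means none are (or the answer is empty).
    """
    answer_tokens = answer.lower().split()
    context_tokens = set(context.lower().split())
    if not answer_tokens:
        return 0.0
    supported = sum(1 for tok in answer_tokens if tok in context_tokens)
    return supported / len(answer_tokens)


context = "the eiffel tower is 330 metres tall and located in paris"
print(grounding_score("the eiffel tower is 330 metres tall", context))  # 1.0
print(grounding_score("the eiffel tower is in berlin", context))        # ~0.83
```

In practice this kind of score is one signal among several; an evaluation framework would combine it with semantic checks before flagging an output as unfaithful.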
Required Skills & Experience:
We are looking for someone with a proven track record of successful contract engagements. The ideal candidate will have:
• 3+ years of experience in AI Research or Quality Engineering.
• Deep expertise in model evaluation techniques and NLP metrics (ROUGE, BLEU, BERTScore). This is not a learning role; you must already be a subject-matter expert.
• Demonstrated ability to work autonomously and manage your own time effectively to meet project goals.
• Experience with Python, data analysis tools, and LLM-as-a-Judge frameworks.
• Strong communication skills to provide clear and concise status updates to the project team.
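As a concrete illustration of the NLP metrics named above, here is ROUGE-1 F1 computed from scratch: the harmonic mean of unigram precision and recall between a candidate and a reference text. This is a minimal sketch of the standard formula (in practice one would use an established library such as Hugging Face's `evaluate`); the example sentences are made up.

```python
from collections import Counter


def rouge_1_f1(candidate: str, reference: str) -> float:
    """ROUGE-1 F1: harmonic mean of unigram precision and recall."""
    cand = Counter(candidate.lower().split())
    ref = Counter(reference.lower().split())
    # Counter & Counter keeps the minimum count per shared token
    overlap = sum((cand & ref).values())
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)


print(rouge_1_f1("the cat sat on the mat", "the cat lay on the mat"))  # ~0.83
```

BLEU and BERTScore follow the same pattern of comparing a candidate against references, but use n-gram precision with a brevity penalty and contextual-embedding similarity, respectively.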