

DeWinter Group
AI Safety & Evaluation Engineer
⭐ - Featured Role | Apply directly with Data Freelance Hub
This role is for an AI Safety & Evaluation Engineer on a 12-month contract, offering $50/hr – $175/hr, remote work. Requires 3+ years in AI Research or Quality Engineering, expertise in model evaluation techniques, and proficiency in Python and NLP metrics.
🌎 - Country
United States
💱 - Currency
$ USD
-
💰 - Day rate
400
-
🗓️ - Date
May 6, 2026
🕒 - Duration
More than 6 months
-
🏝️ - Location
Remote
-
📄 - Contract
Unknown
-
🔒 - Security
Unknown
-
📍 - Location detailed
Campbell, CA
-
🧠 - Skills detailed
#Monitoring #NLP (Natural Language Processing) #Python #AI (Artificial Intelligence) #Datasets #Data Analysis #Deployment #Model Evaluation
Role description
Title: AI Safety and Evaluations Engineer
Job Type: Contract
Contract Length: 12 Months
Pay Range: $50/hr – $175/hr
Start Date: ASAP
Location: Remote
About The Opportunity
Our client, a leader in AI testing and Generative AI solutions, is looking for a skilled AI Safety and Evaluations Engineer to join their team for a 12-month engagement. This project involves designing and building rigorous evaluation frameworks to measure model bias, hallucinations, and toxicity, ensuring models are safe and compliant before deployment. This is a high-impact role that requires a self-motivated professional who can hit the ground running and deliver results quickly.
Key Responsibilities & Deliverables
This role is focused on the successful completion of specific tasks and deliverables. Your responsibilities will include:
• Designing and building rigorous evaluation frameworks to measure model bias, hallucinations, and toxicity.
• Creating automated "Eval" datasets to benchmark new models before they are promoted to production.
• Developing metrics for "Grounding" and "Faithfulness" in RAG-based systems.
• Building monitoring tools that flag harmful or non-compliant AI outputs in real time.
• Partnering with legal and ethics teams to translate policy into technical safety constraints.
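To give a flavor of the "Grounding" and "Faithfulness" work above: a common starting point is a lexical support score that measures how much of a model's answer is backed by the retrieved context. The sketch below is a deliberately simple toy proxy (production evaluation would use entailment models or LLM judges); the function name and example strings are illustrative, not from any specific framework.

```python
def grounding_score(answer: str, context: str) -> float:
    """Fraction of answer tokens that also appear in the retrieved context.

    A crude lexical proxy for grounding in a RAG system:
    1.0 means every answer token is supported by the context,
    0.0 means none are (or the answer is empty).
    """
    answer_tokens = answer.lower().split()
    context_tokens = set(context.lower().split())
    if not answer_tokens:
        return 0.0
    supported = sum(1 for tok in answer_tokens if tok in context_tokens)
    return supported / len(answer_tokens)


context = "the eiffel tower is 330 metres tall and located in paris"
print(grounding_score("the eiffel tower is 330 metres tall", context))  # 1.0
print(grounding_score("the eiffel tower is in berlin", context))        # ~0.83
```

In practice this kind of score is one signal among several; an evaluation framework would combine it with semantic checks before flagging an output as unfaithful.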
Required Skills & Experience:
We are looking for someone with a proven track record of successful contract engagements. The ideal candidate will have:
• 3+ years of experience in AI Research or Quality Engineering.
• Deep expertise in model evaluation techniques and NLP metrics (ROUGE, BLEU, BERTScore). This is not a learning role; you must already be a subject-matter expert.
• Demonstrated ability to work autonomously and manage your own time effectively to meet project goals.
• Experience with Python, data analysis tools, and LLM-as-a-Judge frameworks.
• Strong communication skills to provide clear and concise status updates to the project team.
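As a concrete illustration of the NLP metrics named above, here is ROUGE-1 F1 computed from scratch: the harmonic mean of unigram precision and recall between a candidate and a reference text. This is a minimal sketch of the standard formula (in practice one would use an established library such as Hugging Face's `evaluate`); the example sentences are made up.

```python
from collections import Counter


def rouge_1_f1(candidate: str, reference: str) -> float:
    """ROUGE-1 F1: harmonic mean of unigram precision and recall."""
    cand = Counter(candidate.lower().split())
    ref = Counter(reference.lower().split())
    # Counter & Counter keeps the minimum count per shared token
    overlap = sum((cand & ref).values())
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)


print(rouge_1_f1("the cat sat on the mat", "the cat lay on the mat"))  # ~0.83
```

BLEU and BERTScore follow the same pattern of comparing a candidate against references, but use n-gram precision with a brevity penalty and contextual-embedding similarity, respectively.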