

Alignerr
Python Infrastructure Engineer - Model Evaluation
⭐ - Featured Role | Apply directly with Data Freelance Hub
This role is for a Python Infrastructure Engineer - Model Evaluation, offering a flexible remote contract for 20–40 hours/week at an hourly pay rate. Requires 3–5+ years of Python experience, ML model evaluation expertise, and solid observability skills.
🌎 - Country
United States
💱 - Currency
$ USD
-
💰 - Day rate
600
-
🗓️ - Date
April 13, 2026
🕒 - Duration
Unknown
-
🏝️ - Location
Remote
-
📄 - Contract
Unknown
-
🔒 - Security
Unknown
-
📍 - Location detailed
Denver, CO
-
🧠 - Skills detailed
#Data Pipeline #Python #Scala #Monitoring #Model Evaluation #Programming #Observability #Data Quality #AI (Artificial Intelligence) #ML (Machine Learning)
Role description
Python Infrastructure Engineer — Model Evaluation (AI Training)
About The Role
What if your Python expertise could directly shape how the world's most advanced AI models are built, tested, and improved? We're looking for a senior Python engineer to design and build the data pipelines, evaluation harnesses, and annotation tooling that sit at the heart of cutting-edge AI development.
This is a fully remote, flexible contract role working alongside leading AI research labs on real production systems. If you're a strong Python engineer who wants to do meaningful, high-impact work at the frontier of AI — this is the role for you.
• Organization: Alignerr
• Type: Hourly Contract
• Location: Remote
• Commitment: 20–40 hours/week
What You'll Do
• Design, build, and optimize high-performance Python systems supporting AI data pipelines and model evaluation workflows
• Develop full-stack tooling and backend services for large-scale data annotation, validation, and quality control
• Build and maintain evaluation harnesses that integrate with ML inference frameworks
• Improve reliability, performance, and safety across existing Python codebases
• Instrument systems with observability and metrics collection to monitor reliability and model performance
• Identify bottlenecks and edge cases in data and system behavior, and implement scalable fixes
• Collaborate with data, research, and engineering teams to support model training and evaluation workflows
• Participate in synchronous design reviews to iterate on architecture and implementation decisions
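To make the responsibilities above concrete, here is a minimal sketch of what an evaluation harness with basic metrics collection might look like. This is purely illustrative: the names (`run_harness`, `EvalReport`) and the toy uppercase "model" are invented for this example and are not part of Alignerr's actual stack.

```python
import time
from dataclasses import dataclass, field

@dataclass
class EvalReport:
    """Aggregated metrics from one evaluation run."""
    total: int = 0
    correct: int = 0
    latencies_ms: list = field(default_factory=list)

    @property
    def accuracy(self) -> float:
        return self.correct / self.total if self.total else 0.0

def run_harness(predict, dataset):
    """Run `predict` over (input, expected) pairs, timing each call
    and tallying exact-match correctness."""
    report = EvalReport()
    for inputs, expected in dataset:
        start = time.perf_counter()
        output = predict(inputs)
        report.latencies_ms.append((time.perf_counter() - start) * 1000)
        report.total += 1
        if output == expected:
            report.correct += 1
    return report

# Toy stand-in for a model: uppercases its input. One label is
# deliberately wrong so the harness reports imperfect accuracy.
dataset = [("a", "A"), ("b", "B"), ("c", "D")]
report = run_harness(str.upper, dataset)
print(f"accuracy={report.accuracy:.2f} over {report.total} examples")
```

In a real system, `predict` would wrap an ML inference call and the per-example latencies would feed a metrics backend rather than an in-memory list.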
Who You Are
• Native or fluent English speaker with clear written and verbal communication skills
• Full-stack developer with a strong systems programming background
• 3–5+ years of professional experience writing production-grade Python
• Experienced building evaluation harnesses for ML models and integrating with inference frameworks
• Solid background in observability, metrics collection, and monitoring for production systems
• Self-motivated and reliable — able to commit 20–40 hours per week
Nice to Have
• Prior experience with data annotation, data quality, or evaluation systems
• Familiarity with AI/ML workflows, model training, or benchmarking pipelines
• Experience with distributed systems or developer tooling
• Background in MLOps or AI infrastructure
Why Join Us
• Work directly on cutting-edge AI projects alongside leading research labs
• Fully remote and flexible — structure your work week around your life
• Freelance autonomy with the depth and consistency of meaningful, long-term technical work
• Make a tangible impact on how next-generation AI models are evaluated and improved
• Potential for ongoing work and contract extension as new projects launch


