Alignerr

Python Infrastructure Engineer - Model Evaluation

⭐ - Featured Role | Apply directly with Data Freelance Hub
This role is for a Python Infrastructure Engineer - Model Evaluation, offering a flexible remote contract for 20–40 hours/week at an hourly pay rate. Requires 3–5+ years of Python experience, ML model evaluation expertise, and solid observability skills.
🌎 - Country
United States
💱 - Currency
$ USD
-
💰 - Day rate
600
-
🗓️ - Date
April 13, 2026
🕒 - Duration
Unknown
-
🏝️ - Location
Remote
-
📄 - Contract
Unknown
-
🔒 - Security
Unknown
-
📍 - Location detailed
Denver, CO
-
🧠 - Skills detailed
#Data Pipeline #Python #Scala #Monitoring #Model Evaluation #Programming #Observability #Data Quality #AI (Artificial Intelligence) #ML (Machine Learning)
Role description
Python Infrastructure Engineer — Model Evaluation (AI Training)

About The Role

What if your Python expertise could directly shape how the world's most advanced AI models are built, tested, and improved? We're looking for a senior Python engineer to design and build the data pipelines, evaluation harnesses, and annotation tooling that sit at the heart of cutting-edge AI development. This is a fully remote, flexible contract role working alongside leading AI research labs on real production systems. If you're a strong Python engineer who wants to do meaningful, high-impact work at the frontier of AI — this is the role for you.

• Organization: Alignerr
• Type: Hourly Contract
• Location: Remote
• Commitment: 20–40 hours/week

What You'll Do

• Design, build, and optimize high-performance Python systems supporting AI data pipelines and model evaluation workflows
• Develop full-stack tooling and backend services for large-scale data annotation, validation, and quality control
• Build and maintain evaluation harnesses that integrate with ML inference frameworks
• Improve reliability, performance, and safety across existing Python codebases
• Instrument systems with observability and metrics collection to monitor reliability and model performance
• Identify bottlenecks and edge cases in data and system behavior, and implement scalable fixes
• Collaborate with data, research, and engineering teams to support model training and evaluation workflows
• Participate in synchronous design reviews to iterate on architecture and implementation decisions

Who You Are

• Native or fluent English speaker with clear written and verbal communication skills
• Full-stack developer with a strong systems programming background
• 3–5+ years of professional experience writing production-grade Python
• Experienced building evaluation harnesses for ML models and integrating with inference frameworks
• Solid background in observability, metrics collection, and monitoring for production systems
• Self-motivated and reliable — able to commit 20–40 hours per week

Nice to Have

• Prior experience with data annotation, data quality, or evaluation systems
• Familiarity with AI/ML workflows, model training, or benchmarking pipelines
• Experience with distributed systems or developer tooling
• Background in MLOps or AI infrastructure

Why Join Us

• Work directly on cutting-edge AI projects alongside leading research labs
• Fully remote and flexible — structure your work week around your life
• Freelance autonomy with the depth and consistency of meaningful, long-term technical work
• Make a tangible impact on how next-generation AI models are evaluated and improved
• Potential for ongoing work and contract extension as new projects launch