Alignerr

Python Infrastructure Engineer - Model Evaluation

⭐ - Featured Role | Apply directly with Data Freelance Hub
This role is for a Senior Python Full-Stack Engineer focusing on AI Data & Infrastructure, offering a remote contract of 20–40 hours/week at a competitive hourly rate. Candidates should have 3–5+ years of Python experience and ML model evaluation expertise.
🌎 - Country
United States
💱 - Currency
$ USD
-
💰 - Day rate
600
-
🗓️ - Date
December 18, 2025
🕒 - Duration
Unknown
-
🏝️ - Location
Remote
-
📄 - Contract
Unknown
-
🔒 - Security
Unknown
-
📍 - Location detailed
Los Angeles, CA
-
🧠 - Skills detailed
#Scala #Data Pipeline #ML (Machine Learning) #Observability #Model Evaluation #AI (Artificial Intelligence) #Programming #Data Quality #Python
Role description
About The Job
Alignerr connects top technical experts with leading AI labs to build, evaluate, and improve next-generation models. We work on real production systems and high-impact research workflows across data, tooling, and infrastructure.
Position
Senior Python Full-Stack Engineer — AI Data & Infrastructure
Type: Contract, Remote
Commitment: 20–40 hours/week
Compensation: Competitive, hourly (based on experience)
Responsibilities
• Design, build, and optimize high-performance systems in Python supporting AI data pipelines and evaluation workflows
• Develop full-stack tooling and backend services for large-scale data annotation, validation, and quality control
• Improve reliability, performance, and safety across existing Python codebases
• Collaborate with data, research, and engineering teams to support model training and evaluation workflows
• Identify bottlenecks and edge cases in data and system behavior, and implement scalable fixes
• Participate in synchronous reviews to iterate on system design and implementation decisions
Qualifications
Must-Have
• Native or fluent English speaker
• Full-stack development experience with a strong systems programming background
• 3–5+ years of professional experience writing production Python
• Experience building evaluation harnesses for ML models and integrating with inference frameworks
• Strong background in observability and metrics collection to monitor system reliability and model performance
• Clear written and verbal communication skills
• Ability to commit 20–40 hours per week
Preferred
• Prior experience with data annotation, data quality, or evaluation systems
• Familiarity with AI/ML workflows, model training, or benchmarking pipelines
• Experience with distributed systems or developer tooling
Application Process
• Submit your resume
• Complete a short technical screening
• Project matching and onboarding