Kuddo Health

Machine Learning Engineer

⭐ - Featured Role | Apply direct with Data Freelance Hub
This role is for a Machine Learning Engineer on a 3-month contract at $60/hr, with potential full-time conversion. Key skills include ML/DL experience, data pipeline management, and familiarity with healthcare AI. Remote work is acceptable; New York or SF Bay Area preferred.
🌎 - Country
United States
💱 - Currency
$ USD
-
💰 - Day rate
480
-
🗓️ - Date
May 20, 2026
🕒 - Duration
3 to 6 months
-
🏝️ - Location
Hybrid
-
📄 - Contract
Unknown
-
🔒 - Security
Unknown
-
📍 - Location detailed
United States
-
🧠 - Skills detailed
#Security #AI (Artificial Intelligence) #Deployment #Data Pipeline #dbt (data build tool) #ML (Machine Learning) #API (Application Programming Interface) #NLP (Natural Language Processing) #Documentation #Cybersecurity #ETL (Extract, Transform, Load) #Data Design
Role description
About Kuddo
Kuddo is the quality measurement layer for behavioral healthcare. Our AI surfaces whether clinicians are delivering evidence-based protocols (CBT, DBT, MI, FBT, and others) with fidelity, identifies where they're drifting, and closes the gap with targeted feedback. Think of it as FICO for behavioral health quality - the credibility layer the field has been missing.

Why we built Kuddo
Our mission is to give people the toolkit for a life well-lived. We started in consumer mental health and shifted to the clinicians and supervisors who deliver behavioral health care, because supporting the people in the room is the highest-leverage way to improve patient outcomes at scale. Two beliefs shape how we build:
• Therapeutic alliance - the trust between clinician and patient - is the foundation of clinical efficacy in behavioral health. AI's job is to augment and enrich that alliance, not bypass it. Every Kuddo workflow keeps a human in the loop. We surface objective evidence; clinicians make the judgment calls.
• Clinicians are not the problem; they're under-supported. Our trainee experience is built to reduce burnout and isolation, not to grade or police. Our supervisor-view gives humans better information, not a replacement for their judgment.
If you believe AI should make experts better rather than replace them, you'll feel at home here.

The Role
We're hiring our first ML engineer to lead the modeling work with us and Stanford Medicine researchers. You'll own the ML end-to-end, translating clinical fidelity rubrics into a well-posed computational problem, building and curating the labeled data pipeline with clinical reviewers, designing rigorous evaluation against expert raters, fine-tuning open-weight models as the dataset grows, and shipping the pipeline into an on-prem research environment. You'll work directly with our CTO (applied ML @ Harvard/MGH, Cleveland Clinic) and Stanford Medicine researchers. Reports to the CEO.
This is a contract role for the first three months, paid out of the research project budget, with full-time conversion after we close our seed round this summer.

What You'll Do

Translating clinical fidelity into a computational problem
• Own the architecture of the fidelity scoring engine - how a clinical rubric becomes a well-posed extraction-and-scoring task
• Reason about model behavior: what the system is supposed to learn, where it will fail, what evidence will tell you it's working

Data and ground truth
• Curate and pre-process expert-scored therapy session data; design labeling workflows with clinical reviewers
• Handle messy real-world inputs - variance between clean Zoom recordings and noisier archival audio across project phases
• Build the data infrastructure that makes the model and the eval reproducible; contribute to the data onboarding and ETL/ELT processes

Evaluation against expert clinicians
• Develop and document test plans to assess model behavior against expert clinicians
• Design and run the eval - kappa, percent agreement, calibration, failure-mode analysis on disagreement cases
• Use disagreements as diagnostic signal: what the model is getting wrong, why, and how that changes the next training run; create robust documentation

Fine-tuning and model iteration
• Fine-tune open-weight models as expert-scored sessions accumulate (LoRA, instruction tuning, full fine-tuning - your call on what fits the data and the constraint)
• Develop test plans to document assumptions, experiment parameters, results, and ambiguities, as well as recommendations for changes and/or improvements
• Run experiments, track results, decide what ships; create robust documentation

On-prem environment
• Ship the pipeline into an air-gapped research cluster with either rationed GPU compute or bare metal infrastructure and no internet access or public-facing networks; design for those constraints from the start
• Partner with our fractional CTO on cybersecurity and IT security decisions
• Develop test plans to identify the appropriate architecture; define recommended models and architectures appropriate to the hardware and the task

Working with the team
• Work directly with Stanford Medicine researchers on study design and technical questions; communicate clearly and serve as an intermediary between the Stanford and Kuddo teams
• Partner with our CTO on architecture decisions

Who You Are
• You can translate a real-world question into a well-posed computational problem and have shipped one before. You think clearly about what the model is supposed to learn, what data it needs, and how you'll know it works.
• Real ML/DL chops: training and fine-tuning models (LoRA, instruction tuning, full fine-tuning), reasoning about when each is appropriate, and handling everything around the model - data labeling, pre-processing, quality control, drift, calibration, failure-mode analysis.
• Strong critical thinking and evaluation methodology. You've built systems where ground truth was contested or expensive, and you can talk fluently about inter-rater agreement as a ceiling.
• Familiar with open-weight LLMs (Llama, Qwen, Mistral, or similar) and the tradeoffs vs. closed-API models.
• You ship and iterate, not just prototype. Comfortable in early-stage chaos.
• You communicate well (both in writing and orally) with clinicians and researchers as well as engineers. The highest-leverage moments in this role happen at that interface.

Years of experience: we don't have a minimum or a maximum. A sharp ML researcher two years out of a strong program is as welcome as a ten-year senior MLE. We care about what you've shipped, what you've learned doing it, and how you reason about ML problems.

Nice to Have
• Clinical NLP, healthcare AI, or other regulated-environment ML experience
• On-prem, air-gapped, or edge AI deployment experience
• Speaker diarization or audio pipeline work
• Experience working with academic medical research teams

Compensation
$60/hr contract rate for the first three months (paid out of the research project budget) + Kuddo equity. Full-time offer with cash + equity after we close our seed round this summer. Remote OK; New York or SF Bay Area preferred for occasional in-person.

Apply
Email wenyi@kuddo.club with a sentence on why this work and a link to the most relevant thing you've shipped.