ShineBask Technologies LLC

AI Prompt Engineer/ML Engineer - Full Time

This role is for an AI Prompt Engineer/ML Engineer in San Francisco, CA, with a contract length of more than 6 months; the pay rate is not listed. Candidates must have 2+ years in prompt engineering and strong analytical skills, and must work on-site 5 days a week.
🌎 - Country
United States
💱 - Currency
$ USD
💰 - Day rate
Unknown
🗓️ - Date
May 22, 2026
🕒 - Duration
More than 6 months
🏝️ - Location
On-site
📄 - Contract
Fixed Term
🔒 - Security
Unknown
📍 - Location detailed
San Francisco, CA
🧠 - Skills detailed
#Data Pipeline #Deployment #ETL (Extract, Transform, Load) #AI (Artificial Intelligence) #React #Classification #ML (Machine Learning) #Computer Science #Automation #NLP (Natural Language Processing) #Automatic Speech Recognition (ASR) #TypeScript #Python #SaaS (Software as a Service)
Role description
Full Time Role & On-Site Interview: AI Prompt Engineer/ML Engineer, San Francisco, CA

Openings: 4 Engineers
Face-to-face interview location: San Francisco (222 Columbus Ave, San Francisco, CA 94133)
In-office: minimum 5 days a week
Relocation assistance: Yes
TECH STACK: Python, TypeScript, React
ACCEPTABLE TECH: Python, AI, ML

IMPORTANT: Must Answer Pre-screen Questions for Submission
• Have you worked on an AI system used by 1,000+ people?
• Has the candidate experienced the pain and urgency of understanding and adapting prompts in response to real user feedback?
• Projects in prompt/context engineering, data curation, evals in production
• Vibe coding experience - Public URL?

Must Have:
• 2+ years in prompt engineering, data curation, evals in production
• Strong analytical and problem-solving mindset; comfort with ambiguity

Must NOT Have:
• Heavy ML focus rather than prompt engineering experience

Strongly Preferred (Positives):
• If non-native speaker: experience as a translator or a linguistics background, and familiarity with American cultural norms (e.g., date formats: month/day/year)
• Can vibe code - Python chops beyond reading: APIs, data pipelines, testing frameworks
• Prior work with voice AI, TTS, ASR, or telephony platforms (Twilio, etc.)
• Contact center, SaaS, or customer-facing tech background
• Healthcare or medical operations experience: you know what an NPI is, you've worked a front desk, you understand the weird chaos of dental scheduling
• Automated prompt optimization experience (DSPy, GEPA, MIPROv2)
• Fine-tuning experience
• Bachelor's degree and/or extensive experience in one or more of: Computer Science, Engineering, Math, Philosophy, Linguistics, Cognitive Science, English, Medicine, or a related field

Unlikely to Hire (Negatives):
• Non-native English speakers who lack the required American English dialect familiarity and sufficient US-based work experience

INSIDER SCOOP 2026-05-18
• Expect 60+ hours/week for now
• SF Office: 222 Columbus Ave, San Francisco, CA 94133
• In office in SF 5 days/week
• Only US Citizens or Green Card holders
• Tech Skills: Prompt or Context Engineering, Data Curation, Evals (evaluation)
• Prompt engineering with ownership of prompt quality - impact and outcome
• Customer-facing AI interaction responsibility preferred
• If non-native speaker, experience as a translator or a linguistics background is helpful
• Strong English language skills required

JOB DESCRIPTION
We're hiring an AI Prompt & Agent Developer to own behavioral slice(s) of our voice agents. That behavior splits into two categories: behavior shared across every deployment, and behavior specific to a subset of deployments. You're someone who enjoys looking at the data because the data informs everything else. You'll write prompts, design subagent architectures, build evals, and push automation rates up one small, measurable win at a time.

Responsibilities
• Write and maintain the prompts that run in production. Intent classification, information extraction, availability negotiation, closing phrases, insurance verification flows, objection handling, edge-case recovery. You own behavior that touches every customer call.
• Ship iteratively against real call data. Every morning, you'll listen to failed calls from yesterday. Every afternoon, you'll deploy a fix. You'll be using and helping to develop dashboards, call review tooling, and automated agents to accelerate the work.
• Build evaluation harnesses. You'll develop offline eval sets, run automated prompt optimization (we use GEPA-style approaches), and establish the test suites that let us ship changes without breaking live deployments.
• Human-in-the-loop onboarding. New customers come online constantly. You'll work with and iterate on our internal AI agents that translate a practice's intake form, their scheduling rules, and their quirks into an agent configuration. Every week, you'll be designing new evaluation metrics for these customers and helping to improve existing ones.
• QA and continuous improvement. You'll simulate real-world customer scenarios, measure outcomes, and monitor production agent performance so you can catch drift early and fix it fast.

What we're looking for
• You've shipped prompts that broke production. Doesn't matter if it was at OpenAI, a chatbot startup, a research lab, or your own project. What matters is that you've felt the specific pain of a prompt that worked beautifully in dev and broke the second it hit real users.
• You're meticulous and careful. Looking at data for long stretches energizes you, as long as there's a signal. You stay organized when five things are in flight. We deploy multiple times a day, and we also run healthcare workflows where a bad change costs real money for real practices. You know the difference between moving fast and breaking things.
• Writing sensibility. The best prompt engineers are good writers. You notice register, rhythm, and word choice. You can tell why "Hello, cornerside dental? This is Ava, how can I help you out today?" sounds warmer than "Hello, Cornerside Dental, this is Ava. How can I help you out today" out of a TTS.
• Analytical and empirical. You are relentlessly data-driven. Before you make changes, you proactively run experiments and measure. You don't ship because "I think this is better." You justify a change with "this moved booking rate from 78.2% to 81.4% on n=412 calls."
• Comfort with code. You don't need to be a senior engineer, but you should read Python fluently and TypeScript comfortably, and you can get almost any coding task done by pairing with modern AI coding tools.

Requirements
• 2+ years of experience with AI/ML, NLP, or prompt engineering in production
• Strong analytical and problem-solving mindset; comfort with ambiguity
• Excellent written and verbal communication skills
• Bachelor's degree and/or extensive experience in one or more of: Computer Science, Engineering, Math, Philosophy, Linguistics, Cognitive Science, English, Medicine, or a related field

Preferred Qualifications
• Python chops beyond reading: APIs, data pipelines, testing frameworks
• Prior work with voice AI, TTS, ASR, or telephony platforms (Twilio, etc.)
• Contact center, SaaS, or customer-facing tech background
• Healthcare or medical operations experience: you know what an NPI is, you've worked a front desk, you understand the weird chaos of dental scheduling
• Automated prompt optimization experience (DSPy, GEPA, MIPROv2)
• Fine-tuning experience

INTERVIEW PROCESS:
General:
Stage 1: 30min coding (medium/hard easy) with founder - 30% pass rate
Common failures: can't think through the algorithm before coding, poor communicator, doesn't describe their plan, doesn't have any questions regarding the business
Stage 2: 45min deeper coding with Founder/CTO - 50% pass rate
Success factor: strong communication throughout
Final: 2.5hr onsite mini project with Founder/CTO
End-to-end system design, any tools allowed.