

ShineBask Technologies LLC
AI Prompt Engineer/ML Engineer - Full Time
Featured Role | Apply direct with Data Freelance Hub
This role is for an AI Prompt Engineer/ML Engineer in San Francisco, CA. The contract length is more than 6 months and the pay rate is not listed. Candidates must have 2+ years in prompt engineering and strong analytical skills, and must work on-site 5 days a week.
Country: United States
Currency: $ USD
Day rate: Unknown
Date: May 22, 2026
Duration: More than 6 months
Location: On-site
Contract: Fixed Term
Security: Unknown
Location detailed: San Francisco, CA
Skills detailed:
#Data Pipeline #Deployment #ETL (Extract, Transform, Load) #AI (Artificial Intelligence) #React #Classification #ML (Machine Learning) #Computer Science #Automation #NLP (Natural Language Processing) #Automatic Speech Recognition (ASR) #TypeScript #Python #SaaS (Software as a Service)
Role description
Full Time Role & On-Site Interview: AI Prompt Engineer/ML Engineer - San Francisco, CA
Openings: 4 Engineers
Face to Face Interview
Locations: San Francisco (222 Columbus Ave, San Francisco, CA 94133)
In-office min 5 days a week
Relocation assistance: Yes
TECH STACK: Python, TypeScript, React
ACCEPTABLE TECH: Python, AI, ML
IMPORTANT: Must Answer Pre-screen Questions for Submission
Have you worked on an AI system used by 1,000+ people? - Has the candidate experienced the pain and urgency of understanding and adapting prompts in response to real user feedback?
Projects in prompt/context engineering, data curation, evals in production
Vibe coding experience - Public URL?
Must Have:
• 2+ years in prompt engineering, data curation, evals in production
• Strong analytical and problem-solving mindset; comfort with ambiguity
Must NOT Have:
Heavy ML focus without prompt engineering experience
Strongly Preferred (Positives):
If a non-native speaker: experience as a translator or a linguistics background, plus familiarity with American cultural norms (e.g., date formats: month/day/year)
Can Vibe code - Python chops beyond reading: APIs, data pipelines, testing frameworks
Prior work with voice AI, TTS, ASR, or telephony platforms (Twilio, etc.)
Contact center, SaaS, or customer-facing tech background
Healthcare or medical operations experience: you know what an NPI is, you've worked a front desk, you understand the weird chaos of dental scheduling
Automated prompt optimization experience (DSPy, GEPA, MIPROv2)
Fine-tuning experience
Bachelor's degree and/or extensive experience in one or more of: Computer Science, Engineering, Math, Philosophy, Linguistics, Cognitive Science, English, Medicine, or a related field
Unlikely to Hire (Negatives)
Non-native English speakers who lack the required familiarity with American English and sufficient US-based work experience
INSIDER SCOOP
2026-05-18
Expect 60+ hours/week for now
SF Office: 222 Columbus Ave, San Francisco, CA 94133
In office in SF 5 days/week
Only US Citizens or Green Card holders
Tech Skills: Prompt or Context Engineering, Data Curation, Evals (evaluation)
Prompt engineering with ownership of prompt quality - impact and outcome
Customer-facing AI interaction responsibility preferred
If a non-native speaker, experience as a translator or a linguistics background is helpful
Strong English language skills required
JOB DESCRIPTION
We're hiring an AI Prompt & Agent Developer to own behavioral slice(s) of our voice agents. That behavior splits into two categories: behavior shared across every deployment, and behavior specific to a subset of deployments. You're someone who enjoys looking at the data because the data informs everything else. You'll write prompts, design subagent architectures, build evals, and push automation rates up one small, measurable win at a time.
Responsibilities
• Write and maintain the prompts that run in production. Intent classification, information extraction, availability negotiation, closing phrases, insurance verification flows, objection handling, edge-case recovery. You own behavior that touches every customer call.
• Ship iteratively against real call data. Every morning, you'll listen to failed calls from yesterday. Every afternoon, you'll deploy a fix. You'll be using and helping to develop dashboards, call review tooling, and automated agents to accelerate the work.
• Build evaluation harnesses. You'll develop offline eval sets, run automated prompt optimization (we use GEPA-style approaches), and establish the test suites that let us ship changes without breaking live deployments.
• Human-in-the-loop onboarding. New customers come online constantly. You'll work with and iterate on our internal AI agents that translate a practice's intake form, their scheduling rules, and their quirks into an agent configuration. Every week, you'll be designing new evaluation metrics for these customers and helping to improve existing ones.
• QA and continuous improvement. You'll simulate real-world customer scenarios, measure outcomes, and monitor production agent performance so you can catch drift early and fix it fast.
What we're looking for
• You've shipped prompts that broke production. Doesn't matter if it was at OpenAI, a chatbot startup, a research lab, or your own project. What matters is that you've felt the specific pain of a prompt that worked beautifully in dev and broke the second it hit real users.
• You're meticulous and careful. Looking at data for long stretches energizes you, as long as there's a signal. You stay organized when five things are in flight. We deploy multiple times a day, and we also run healthcare workflows where a bad change costs real money for real practices. You know the difference between moving fast and breaking things.
• Writing sensibility. The best prompt engineers are good writers. You notice register, rhythm, and word choice. You can tell why "Hello, cornerside dental? This is Ava, how can I help you out today?" sounds warmer than "Hello, Cornerside Dental, this is Ava. How can I help you out today" out of a TTS.
• Analytical and empirical. You are relentlessly data-driven. Before you make changes, you proactively run experiments and measure. You don't ship because "I think this is better." You justify a change with "this moved booking rate from 78.2% to 81.4% on n=412 calls."
• Comfort with code. You don't need to be a senior engineer, but you should read Python fluently and TypeScript comfortably, and you can get almost any coding task done by pairing with modern AI coding tools.
Requirements
• 2+ years of experience with AI/ML, NLP, or prompt engineering in production
• Strong analytical and problem-solving mindset; comfort with ambiguity
• Excellent written and verbal communication skills
• Bachelor's degree and/or extensive experience in one or more of: Computer Science, Engineering, Math, Philosophy, Linguistics, Cognitive Science, English, Medicine, or a related field
Preferred Qualifications
• Python chops beyond reading: APIs, data pipelines, testing frameworks
• Prior work with voice AI, TTS, ASR, or telephony platforms (Twilio, etc.)
• Contact center, SaaS, or customer-facing tech background
• Healthcare or medical operations experience: you know what an NPI is, you've worked a front desk, you understand the weird chaos of dental scheduling
• Automated prompt optimization experience (DSPy, GEPA, MIPROv2)
• Fine-tuning experience
INTERVIEW PROCESS:
General:
Stage 1: 30min coding (medium/hard easy) with founder - 30% pass rate
Common failures: can't think through the algorithm before coding, communicates poorly, doesn't describe their plan, doesn't ask any questions about the business
Stage 2: 45min deeper coding with Founder/CTO - 50% pass rate
Success factor: strong communication throughout
Final: 2.5hr onsite mini project with Founder/CTO
End-to-end system design, any tools allowed.






