

Derma Made
AI / Machine Learning Engineer
⭐ - Featured Role | Apply direct with Data Freelance Hub
This role is for an AI/Machine Learning Engineer on a contract basis; the contract length and pay rate are not specified. The position requires expertise in Python, applied ML, and speech/vision processing, plus experience with ASR and LLM fine-tuning. The work is remote.
🌎 - Country
United States
💱 - Currency
$ USD
-
💰 - Day rate
Unknown
-
🗓️ - Date
October 29, 2025
🕒 - Duration
Unknown
-
🏝️ - Location
Unknown
-
📄 - Contract
Unknown
-
🔒 - Security
Unknown
-
📍 - Location detailed
United States
-
🧠 - Skills detailed
#Data Pipeline #PyTorch #OpenCV (Open Source Computer Vision Library) #Hugging Face #Automatic Speech Recognition (ASR) #Transformers #GCP (Google Cloud Platform) #API (Application Programming Interface) #Azure #ML (Machine Learning) #Datasets #Python #Version Control #Langchain #Docker #ETL (Extract, Transform, Load) #AI (Artificial Intelligence) #AWS (Amazon Web Services) #Model Evaluation
Role description
Build the foundation of an AI Sales Coaching Platform that analyzes real sales calls (Zoom and phone), detects performance cues (speech, tone, emotion, engagement), and delivers live sales coaching using fine-tuned large language models.
You’ll turn raw Zoom recordings into structured multimodal data (transcripts plus acoustic and visual cues) and help train a private model that understands what great selling sounds and looks like.
Key Responsibilities
• Design and implement end-to-end audio/video data pipelines (a minimal audio-side sketch follows this list):
• Transcription (WhisperX)
• Diarization (pyannote / SpeechBrain)
• Acoustic feature extraction (librosa / OpenSMILE)
• Visual feature extraction (MediaPipe / DeepFace / YOLOv8)
• Develop segmentation and labeling tools (Label Studio or custom interface).
• Fine-tune and evaluate LLMs (OpenAI, Hugging Face, LoRA adapters).
• Build and maintain training datasets (JSONL format with multimodal features; an example record is sketched after this list).
• Work with sales domain experts to encode critique → improvement → ideal phrasing logic into model prompts or fine-tuning sets.
• Prototype real-time inference pipelines for live coaching (see the chunked streaming loop after this list):
• Streaming ASR + feature extraction
• Latency optimization (<2s E2E)
• Collaborate on simple web or desktop demo UI for feedback playback.
• Prepare model evaluation metrics and dashboards.
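As a concrete reference for the pipeline bullet above, here is a minimal sketch of the audio side, assuming WhisperX for transcription and alignment, its bundled pyannote-based diarization pipeline for speaker labels, and librosa for simple acoustic features. File names, model choices, and exact APIs (which shift between library versions) are assumptions rather than a prescribed implementation; the visual stage (MediaPipe / DeepFace / YOLOv8) would run alongside it on the video track.

```python
import librosa
import numpy as np
import whisperx

AUDIO_PATH = "call_recording.wav"  # hypothetical input file
DEVICE = "cuda"
HF_TOKEN = "hf_..."                # pyannote diarization models require Hugging Face auth

# 1. Transcribe and word-align the call with WhisperX
model = whisperx.load_model("large-v2", DEVICE)
audio = whisperx.load_audio(AUDIO_PATH)
result = model.transcribe(audio, batch_size=16)
align_model, align_meta = whisperx.load_align_model(language_code=result["language"], device=DEVICE)
result = whisperx.align(result["segments"], align_model, align_meta, audio, DEVICE)

# 2. Diarize and attach speaker labels to the transcript
diarizer = whisperx.DiarizationPipeline(use_auth_token=HF_TOKEN, device=DEVICE)
diarize_segments = diarizer(audio)
result = whisperx.assign_word_speakers(diarize_segments, result)

# 3. Add simple per-segment acoustic features with librosa
y, sr = librosa.load(AUDIO_PATH, sr=16000)
for seg in result["segments"]:
    clip = y[int(seg["start"] * sr):int(seg["end"] * sr)]
    if clip.size == 0:
        continue
    f0 = librosa.yin(clip, fmin=65, fmax=400, sr=sr)   # rough pitch track
    rms = librosa.feature.rms(y=clip)                   # loudness proxy
    seg["acoustic"] = {
        "pitch_mean_hz": float(np.nanmean(f0)),
        "rms_mean": float(rms.mean()),
    }
    print(seg.get("speaker", "UNK"), round(seg["start"], 1), seg["text"][:60], seg["acoustic"])
```

The per-segment dictionaries produced here are the kind of intermediate output that would feed the JSONL training records sketched next.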
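Below is one hypothetical JSONL training record, showing how a transcript segment, its acoustic and visual cues, and the critique → improvement → ideal-phrasing target could sit on a single line of the dataset. Every field name and value is illustrative; the real schema would be agreed with the sales domain experts.

```python
# Write one illustrative JSONL training record combining transcript, features, and labels.
import json

record = {
    "call_id": "zoom_2025_10_29_001",
    "segment": {
        "speaker": "REP",
        "start": 312.4,
        "end": 327.9,
        "text": "So yeah, the price is the price, we can't really move on that.",
    },
    "features": {
        "acoustic": {"pitch_mean_hz": 118.0, "rms_mean": 0.041, "speech_rate_wps": 3.4},
        "visual": {"gaze_on_camera_ratio": 0.35, "smile_ratio": 0.05},
    },
    "label": {
        "critique": "Dismissive framing of the pricing objection; flat tone, low engagement.",
        "improvement": "Acknowledge the concern, then reframe around value before discussing price.",
        "ideal_phrasing": "I hear you on budget. Can I walk you through what that figure covers, "
                          "so we can see where it fits your goals?",
    },
}

# Append the record as one line of the training set
with open("train.jsonl", "a", encoding="utf-8") as f:
    f.write(json.dumps(record, ensure_ascii=False) + "\n")
```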
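And a rough sketch of the live-coaching loop: audio arrives in short chunks, each chunk passes through ASR and lightweight feature extraction, and a coaching cue is requested from the model, with a check against the sub-2-second end-to-end budget. Every function here is a placeholder for a real component (streaming ASR service, feature extractor, fine-tuned LLM endpoint).

```python
# Skeleton of a chunked real-time coaching loop; all components are stubs.
import time
from collections import deque

CHUNK_SECONDS = 1.0
LATENCY_BUDGET_S = 2.0

def get_next_chunk():
    """Placeholder for audio capture from the Zoom/phone client; returns raw audio or None."""
    ...

def run_asr(chunk):
    """Placeholder for a streaming ASR call on one chunk."""
    ...

def extract_features(chunk):
    """Placeholder for lightweight acoustic features computed on one chunk."""
    ...

def coach(transcript_window, features):
    """Placeholder for the fine-tuned LLM call that returns a short coaching cue."""
    ...

transcript_window = deque(maxlen=30)   # rolling context of recent segments

while True:
    chunk = get_next_chunk()
    if chunk is None:
        break
    t0 = time.monotonic()
    text = run_asr(chunk)
    feats = extract_features(chunk)
    transcript_window.append(text)
    cue = coach(list(transcript_window), feats)
    elapsed = time.monotonic() - t0
    if elapsed > LATENCY_BUDGET_S:
        print(f"warning: pipeline took {elapsed:.2f}s, over the {LATENCY_BUDGET_S}s budget")
    if cue:
        print("coach:", cue)
```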
Tech Stack You’ll Use
• Python, PyTorch, Hugging Face Transformers
• WhisperX, pyannote.audio, librosa, OpenSMILE, SpeechBrain
• MediaPipe, DeepFace, YOLOv8, OpenCV
• OpenAI API / fine-tuning, LangChain / LangGraph
• AWS / GCP / Azure for GPU compute
• Label Studio, Weights & Biases, Docker
Ideal Background
• 3–5+ years in applied ML or speech/vision processing.
• Experience with speech diarization, ASR, or emotion recognition.
• Familiar with prompt engineering, RAG, LLM fine-tuning and RLHF workflows.
• Comfortable handling large unstructured datasets (audio/video).
• Strong software engineering habits (version control, reproducible pipelines).