

Derma Made
AI / Machine Learning Engineer
⭐ - Featured Role | Apply direct with Data Freelance Hub
This role is for an AI/Machine Learning Engineer on a contract basis; the contract length and pay rate are not specified. The position requires expertise in Python, applied ML, and speech/vision processing, plus experience with ASR and LLM fine-tuning. The work is remote.
🌎 - Country
United States
💱 - Currency
$ USD
-
💰 - Day rate
Unknown
-
🗓️ - Date
October 29, 2025
🕒 - Duration
Unknown
-
🏝️ - Location
Unknown
-
📄 - Contract
Unknown
-
🔒 - Security
Unknown
-
📍 - Location detailed
United States
-
🧠 - Skills detailed
#Data Pipeline #PyTorch #OpenCV (Open Source Computer Vision Library) #Hugging Face #Automatic Speech Recognition (ASR) #Transformers #GCP (Google Cloud Platform) #API (Application Programming Interface) #Azure #ML (Machine Learning) #Datasets #Python #Version Control #Langchain #Docker #ETL (Extract, Transform, Load) #AI (Artificial Intelligence) #AWS (Amazon Web Services) #Model Evaluation
Role description
Build the foundation of an AI Sales Coaching Platform that analyzes real sales calls (Zoom and phone), detects performance cues (speech, tone, emotion, engagement), and delivers live sales coaching using fine-tuned large language models.
You’ll turn raw Zoom recordings into structured multimodal data (transcripts plus acoustic and visual cues) and help train a private model that understands what great selling sounds and looks like.
Key Responsibilities
• Design and implement end-to-end audio/video data pipelines (a minimal audio-side sketch follows this list):
• Transcription (WhisperX)
• Diarization (pyannote / SpeechBrain)
• Acoustic feature extraction (librosa / OpenSMILE)
• Visual feature extraction (MediaPipe / DeepFace / YOLOv8)
• Develop segmentation and labeling tools (Label Studio or custom interface).
• Fine-tune and evaluate LLMs (OpenAI, Hugging Face, LoRA adapters).
• Build and maintain training datasets (JSONL format with multimodal features; an example record is sketched after this list).
• Work with sales domain experts to encode critique → improvement → ideal phrasing logic into model prompts or fine-tuning sets.
• Prototype real-time inference pipelines for live coaching (see the chunked streaming loop after this list):
• Streaming ASR + feature extraction
• Latency optimization (<2s E2E)
• Collaborate on simple web or desktop demo UI for feedback playback.
• Prepare model evaluation metrics and dashboards.
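As a concrete reference for the pipeline bullet above, here is a minimal sketch of the audio side, assuming WhisperX for transcription and alignment, its bundled pyannote-based diarization pipeline for speaker labels, and librosa for simple acoustic features. File names, model choices, and exact APIs (which shift between library versions) are assumptions rather than a prescribed implementation; the visual stage (MediaPipe / DeepFace / YOLOv8) would run alongside it on the video track.

```python
import librosa
import numpy as np
import whisperx

AUDIO_PATH = "call_recording.wav"  # hypothetical input file
DEVICE = "cuda"
HF_TOKEN = "hf_..."                # pyannote diarization models require Hugging Face auth

# 1. Transcribe and word-align the call with WhisperX
model = whisperx.load_model("large-v2", DEVICE)
audio = whisperx.load_audio(AUDIO_PATH)
result = model.transcribe(audio, batch_size=16)
align_model, align_meta = whisperx.load_align_model(language_code=result["language"], device=DEVICE)
result = whisperx.align(result["segments"], align_model, align_meta, audio, DEVICE)

# 2. Diarize and attach speaker labels to the transcript
diarizer = whisperx.DiarizationPipeline(use_auth_token=HF_TOKEN, device=DEVICE)
diarize_segments = diarizer(audio)
result = whisperx.assign_word_speakers(diarize_segments, result)

# 3. Add simple per-segment acoustic features with librosa
y, sr = librosa.load(AUDIO_PATH, sr=16000)
for seg in result["segments"]:
    clip = y[int(seg["start"] * sr):int(seg["end"] * sr)]
    if clip.size == 0:
        continue
    f0 = librosa.yin(clip, fmin=65, fmax=400, sr=sr)   # rough pitch track
    rms = librosa.feature.rms(y=clip)                   # loudness proxy
    seg["acoustic"] = {
        "pitch_mean_hz": float(np.nanmean(f0)),
        "rms_mean": float(rms.mean()),
    }
    print(seg.get("speaker", "UNK"), round(seg["start"], 1), seg["text"][:60], seg["acoustic"])
```

The per-segment dictionaries produced here are the kind of intermediate output that would feed the JSONL training records sketched next.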
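Below is one hypothetical JSONL training record, showing how a transcript segment, its acoustic and visual cues, and the critique → improvement → ideal-phrasing target could sit on a single line of the dataset. Every field name and value is illustrative; the real schema would be agreed with the sales domain experts.

```python
# Write one illustrative JSONL training record combining transcript, features, and labels.
import json

record = {
    "call_id": "zoom_2025_10_29_001",
    "segment": {
        "speaker": "REP",
        "start": 312.4,
        "end": 327.9,
        "text": "So yeah, the price is the price, we can't really move on that.",
    },
    "features": {
        "acoustic": {"pitch_mean_hz": 118.0, "rms_mean": 0.041, "speech_rate_wps": 3.4},
        "visual": {"gaze_on_camera_ratio": 0.35, "smile_ratio": 0.05},
    },
    "label": {
        "critique": "Dismissive framing of the pricing objection; flat tone, low engagement.",
        "improvement": "Acknowledge the concern, then reframe around value before discussing price.",
        "ideal_phrasing": "I hear you on budget. Can I walk you through what that figure covers, "
                          "so we can see where it fits your goals?",
    },
}

# Append the record as one line of the training set
with open("train.jsonl", "a", encoding="utf-8") as f:
    f.write(json.dumps(record, ensure_ascii=False) + "\n")
```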
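And a rough sketch of the live-coaching loop: audio arrives in short chunks, each chunk passes through ASR and lightweight feature extraction, and a coaching cue is requested from the model, with a check against the sub-2-second end-to-end budget. Every function here is a placeholder for a real component (streaming ASR service, feature extractor, fine-tuned LLM endpoint).

```python
# Skeleton of a chunked real-time coaching loop; all components are stubs.
import time
from collections import deque

CHUNK_SECONDS = 1.0
LATENCY_BUDGET_S = 2.0

def get_next_chunk():
    """Placeholder for audio capture from the Zoom/phone client; returns raw audio or None."""
    ...

def run_asr(chunk):
    """Placeholder for a streaming ASR call on one chunk."""
    ...

def extract_features(chunk):
    """Placeholder for lightweight acoustic features computed on one chunk."""
    ...

def coach(transcript_window, features):
    """Placeholder for the fine-tuned LLM call that returns a short coaching cue."""
    ...

transcript_window = deque(maxlen=30)   # rolling context of recent segments

while True:
    chunk = get_next_chunk()
    if chunk is None:
        break
    t0 = time.monotonic()
    text = run_asr(chunk)
    feats = extract_features(chunk)
    transcript_window.append(text)
    cue = coach(list(transcript_window), feats)
    elapsed = time.monotonic() - t0
    if elapsed > LATENCY_BUDGET_S:
        print(f"warning: pipeline took {elapsed:.2f}s, over the {LATENCY_BUDGET_S}s budget")
    if cue:
        print("coach:", cue)
```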
Tech Stack You’ll Use
• Python, PyTorch, Hugging Face Transformers
• WhisperX, pyannote.audio, librosa, OpenSMILE, SpeechBrain
• MediaPipe, DeepFace, YOLOv8, OpenCV
• OpenAI API / fine-tuning, LangChain / LangGraph
• AWS / GCP / Azure for GPU compute
• Label Studio, Weights & Biases, Docker
Ideal Background
• 3–5+ years in applied ML or speech/vision processing.
• Experience with speech diarization, ASR, or emotion recognition.
• Familiar with prompt engineering, RAG, LLM fine-tuning and RLHF workflows.
• Comfortable handling large unstructured datasets (audio/video).
• Strong software engineering habits (version control, reproducible pipelines).