

Intelliswift Software
Senior AI Data Engineer
⭐ - Featured Role | Apply direct with Data Freelance Hub
This role is for a Senior AI Data Engineer in Menlo Park, CA, for 7 months at a competitive pay rate. Key skills include advanced SQL, ML integration, and experience with large-scale pipelines. Familiarity with embeddings and generative AI is preferred.
🌎 - Country
United States
💱 - Currency
$ USD
-
💰 - Day rate
720
-
🗓️ - Date
May 2, 2026
🕒 - Duration
More than 6 months
-
🏝️ - Location
On-site
-
📄 - Contract
Unknown
-
🔒 - Security
Unknown
-
📍 - Location detailed
Menlo Park, CA
-
🧠 - Skills detailed
#ML (Machine Learning) #Storage #Code Reviews #Scala #AI (Artificial Intelligence) #Indexing #Data Engineering #Capacity Management #Data Pipeline #Compliance #SQL (Structured Query Language) #Data Quality #Datasets #Data Cleaning #Airflow #Classification #Batch #ETL (Extract, Transform, Load) #Debugging #Data Lifecycle
Role description
Job Title: Senior AI Data Engineer (Contract)
Location: Menlo Park, CA
Duration: 7 months (with potential for extension)
As a Senior AI Data Engineer, you will design and operate end‑to‑end pipelines that not only move and transform data, but enrich it using ML models such as classifiers, embedding models, and large language models. The role sits at the intersection of data engineering and ML systems, requiring strong systems thinking around throughput, retries, async execution, and capacity management.
You will work closely with engineers and researchers to support image generation and evaluation workflows, contributing directly to data quality, model performance, and scalability.
Required Skills & Experience
• Strong data engineering expertise, including advanced SQL, complex query optimization, and production pipeline orchestration (e.g., Airflow or equivalent)
• Hands‑on experience integrating ML inference into data pipelines, including calling inference endpoints, managing batching and throughput, and handling failures and retries at scale
• Experience operating large-scale production pipelines with high reliability and performance requirements.
• Proficiency using AI‑assisted coding tools to accelerate development, debugging, and code reviews.
• Strong communication skills and ability to collaborate with engineers, researchers, and cross‑functional teams.
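
The batching-and-retry pattern called out above can be sketched minimally in Python. The endpoint call here is a stub, and all names, batch sizes, and retry limits are illustrative rather than specific to this role:

```python
import time

def call_endpoint(batch):
    # Stand-in for a real inference endpoint call; it simply echoes
    # a placeholder label per record so the sketch is runnable.
    return [{"id": r["id"], "label": "ok"} for r in batch]

def infer_with_retries(records, batch_size=64, max_retries=3):
    """Run records through an inference endpoint in batches,
    retrying each failed batch with exponential backoff."""
    results = []
    for start in range(0, len(records), batch_size):
        batch = records[start:start + batch_size]
        for attempt in range(max_retries):
            try:
                results.extend(call_endpoint(batch))
                break
            except Exception:
                if attempt == max_retries - 1:
                    raise  # batch kept failing; surface the error
                time.sleep(2 ** attempt)  # back off before retrying
    return results
```

In a production pipeline the batch size and retry budget would be tuned against endpoint throughput and rate limits, which is the kind of capacity-management judgment this role involves.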
Preferred Qualifications
• Experience working with embeddings and vector search, including storage, indexing, and similarity queries.
• Familiarity with content understanding models, such as image classification, OCR, and safety or quality scoring.
• Experience using LLMs for data workflows, including automated annotation, data cleaning, or evaluation tasks.
• Knowledge of generative AI systems, particularly image generation and corresponding evaluation metrics.
• Background working in data engineering, ML engineering, or hybrid roles that support model training or evaluation.
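
As a rough illustration of the embedding and similarity-query work mentioned above, here is a minimal brute-force nearest-neighbor sketch; real systems would use an approximate-nearest-neighbor index, and the vectors and IDs here are hypothetical:

```python
import math

def cosine_similarity(a, b):
    # Cosine similarity between two equal-length vectors.
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def top_k(query, index, k=2):
    """Brute-force similarity search over an in-memory list of
    (doc_id, vector) pairs, returning the k most similar documents."""
    scored = [(doc_id, cosine_similarity(query, vec)) for doc_id, vec in index]
    return sorted(scored, key=lambda s: s[1], reverse=True)[:k]
```

At the billions-of-records scale this posting describes, the same query shape would run against a dedicated vector index rather than a linear scan.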
Responsibilities
• AI‑Augmented Data Pipelines: Design and maintain large‑scale data pipelines (up to billions of records/images) that combine SQL-based transformations with ML model inference for data cleaning, labeling, and enrichment.
• Remote Inference Orchestration: Build and own systems that orchestrate remote model inference within pipelines, including batching, async execution, retries, fallback logic, and graceful degradation under load.
• Feature & Embedding Pipelines: Develop scalable pipelines to generate, store, validate, and serve vector embeddings. Manage nearest‑neighbor indexes and ensure data quality at scale.
• Data Curation at Scale: Source, filter, and curate training datasets using both structured queries and model‑derived signals (e.g., visual quality scores, content classification, safety filters). Own the end‑to‑end data lifecycle with a focus on quality, governance, and compliance.
• LLM‑Assisted Annotation: Design pipelines that use large language models and vision models for automated data annotation. Create auditing workflows to evaluate and improve annotation quality.
• Shared Tooling & Frameworks: Contribute reusable components and frameworks that simplify AI‑augmented data pipelines, such as standardized model‑invocation operators and async job orchestration patterns.
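
The async execution and graceful-degradation responsibilities above can be sketched with a concurrency cap, so that a pipeline fans work out to a remote model without overwhelming it. The annotator here is a stub and every name is illustrative:

```python
import asyncio

async def annotate(record):
    # Stand-in for a remote model call (e.g., an LLM labeling a record).
    await asyncio.sleep(0)
    return {**record, "annotation": "stub"}

async def run_pipeline(records, max_concurrency=8):
    """Fan records out to a remote annotator under a concurrency cap,
    so load on the model endpoint stays bounded as input volume grows."""
    sem = asyncio.Semaphore(max_concurrency)

    async def guarded(record):
        async with sem:
            return await annotate(record)

    # gather preserves input order in its results
    return await asyncio.gather(*(guarded(r) for r in records))
```

A reusable operator along these lines, with retries and fallback logic layered on top, is the sort of shared tooling the last responsibility describes.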






