

Intelliswift Software
Senior AI Data Engineer
This role is for a Senior AI Data Engineer in Menlo Park, CA (hybrid), for 6 months with potential extensions, at a listed day rate of $640 USD. It requires advanced SQL, data pipeline expertise, experience integrating ML models, and 5+ years of relevant experience. A Bachelor's degree in a related field is mandatory.
🌎 - Country
United States
💱 - Currency
$ USD
💰 - Day rate
640
🗓️ - Date
June 3, 2026
🕒 - Duration
More than 6 months
🏝️ - Location
Hybrid
📄 - Contract
Unknown
🔒 - Security
Unknown
📍 - Location detailed
Menlo Park, CA
🧠 - Skills detailed
#Object Detection #Indexing #SQL (Structured Query Language) #Data Transformations #ETL (Extract, Transform, Load) #Airflow #Debugging #Data Pipeline #Scala #Data Quality #Computer Science #Data Cleaning #Datasets #Complex Queries #ML (Machine Learning) #Data Engineering #Data Enrichment #Batch #AI (Artificial Intelligence)
Role description
Job Title: Senior AI Data Engineer
Location: Menlo Park, CA (Hybrid)
Duration: 6 months (potential extensions to long term)
Our client is looking for a Senior AI Data Engineer to build and scale next-generation data pipelines powering image generation systems. This role sits at the intersection of data engineering and ML systems, where pipelines not only process data but also invoke and orchestrate machine learning models at scale.
You’ll work on large-scale datasets (billions of records/images), enabling high-quality training data across dimensions like visual quality, prompt adherence, and content understanding.
Must-Have Skills
• Advanced SQL & data pipeline expertise. Complex queries, query optimization, pipeline orchestration frameworks (Airflow or equivalent).
• Experience integrating ML models into data pipelines. Calling inference endpoints, managing model versions, batching requests, handling inference failures at scale.
• Demonstrated track record of building and operating production data pipelines that invoke ML models at scale.
• Proficiency with AI-assisted coding agents (e.g., Copilot, Cursor, Codex). Expected to leverage AI tools as a force multiplier for writing, debugging, and reviewing code, building pipelines faster, and accelerating day-to-day engineering workflows.
• Strong verbal and written communication skills, problem-solving ability, and cross-functional collaboration.
Nice-to-Have Skills
• Working knowledge of embeddings and vector representations, including generating, storing, indexing, and querying embeddings.
• Familiarity with content-understanding models such as image classifiers, object detection, OCR, NSFW detection, and aesthetic scoring.
• Experience using LLMs for data tasks, such as prompt engineering for annotation, data cleaning, or evaluation via LLM APIs.
• Knowledge of generative AI, including diffusion models, image generation, and evaluation metrics (FID, CLIP score, etc.).
Education / Experience
• Bachelor's degree or higher in Computer Science, Data Engineering, Machine Learning, or a related STEM field.
• 5+ years of industry experience in data engineering, ML engineering, or a hybrid role involving both data pipelines and model serving/inference.
Key Responsibilities
AI-Augmented Data Pipelines:
• Design and maintain large-scale pipelines that combine data transformations with ML model inference
• Integrate classifiers, embeddings, and LLM-based processing into data workflows
Inference Orchestration:
• Manage remote model execution within pipelines, including batching, retries, and async processing
• Optimize performance, scalability, and reliability of inference systems
Embedding & Feature Engineering:
• Build and maintain pipelines for generating and managing vector embeddings
• Support similarity search and indexing use cases
Data Curation at Scale:
• Source, clean, and curate datasets using a combination of SQL logic and model-derived signals
• Ensure data quality, governance, and consistency
LLM-Based Workflows:
• Develop pipelines using LLMs for annotation, evaluation, and data enrichment
• Implement quality checks and audit mechanisms for model-driven outputs
Tooling & Frameworks:
• Contribute to reusable tools and frameworks that simplify AI-powered data pipeline development