

RICEFW Technologies Inc
AI Data Engineer
⭐ - Featured Role | Apply directly with Data Freelance Hub
This role is for a Senior AI Data Engineer in Lansing, Michigan, on a hybrid contract. It requires 7+ years in Data Engineering, including 3+ years in AI/ML workflows, along with expertise in Python, SQL, and cloud AI services and strong knowledge of data governance and MLOps.
🌎 - Country
United States
💱 - Currency
$ USD
💰 - Day rate
Unknown
🗓️ - Date
November 13, 2025
🕒 - Duration
Unknown
🏝️ - Location
Hybrid
📄 - Contract
Unknown
🔒 - Security
Unknown
📍 - Location detailed
Lansing, MI
🧠 - Skills detailed
#Deployment #Cloud #Compliance #ML (Machine Learning) #SageMaker #Data Lifecycle #Dataflow #Data Engineering #Data Privacy #Data Orchestration #Databases #PySpark #GDPR (General Data Protection Regulation) #Unsupervised Learning #LangChain #Data Quality #Kubernetes #AWS SageMaker #Data Lake #MLflow #Data Ingestion #Airflow #FastAPI #GCP (Google Cloud Platform) #Pandas #Transformers #SQL (Structured Query Language) #Supervised Learning #AWS (Amazon Web Services) #Azure #Azure Machine Learning #ETL (Extract, Transform, Load) #Terraform #Python #Datasets #Docker #AI (Artificial Intelligence) #Data Pipeline #GitHub #Security #Monitoring #Data Science #Spark (Apache Spark) #API (Application Programming Interface) #Databricks #Hugging Face #Model Deployment
Role description
AI Data Engineer
Mode of work: Hybrid
Location: Lansing, Michigan
Job Summary:
We are looking for a Senior AI Data Engineer with strong expertise in AI/ML data pipelines, model data preparation, vector databases, and cloud-based AI infrastructure. You will architect, build, and optimize the data ecosystem that powers machine learning, generative AI, and LLM applications, collaborating closely with Data Scientists, ML Engineers, and Cloud Architects to ensure high-quality, production-ready data for AI models.
Core Responsibilities:
• Design and build end-to-end AI data pipelines for training, fine-tuning, and inference of ML/Gen-AI models.
• Create data ingestion, transformation, and feature-engineering workflows optimized for large-scale model training.
• Develop and manage data lakes, feature stores, and vector databases (e.g., Pinecone, Weaviate, FAISS); see the embedding-ingestion sketch after this list.
• Collaborate with ML Engineers to operationalize model deployment pipelines (using MLflow, Kubeflow, or SageMaker Pipelines).
• Integrate unstructured data (text, image, audio, sensor data) for AI-ready datasets.
• Implement data labeling, versioning, and lineage tracking for reproducible AI experiments.
• Optimize performance for large-scale distributed training on Spark, Databricks, or Ray.
• Ensure data quality, compliance, and governance for AI systems under SOC 2 / GDPR / HIPAA frameworks.
• Partner with architects to design cloud-native AI infrastructure (AWS SageMaker, Azure AI, GCP Vertex AI).
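For illustration only, a minimal sketch of the embedding-ingestion pattern behind the vector-database responsibility above, assuming FAISS and sentence-transformers; the model name, document texts, and query are hypothetical, not part of the role.

```python
# Minimal embedding-ingestion sketch. Assumes the faiss-cpu and
# sentence-transformers packages; model name and texts are illustrative.
import faiss
import numpy as np
from sentence_transformers import SentenceTransformer

documents = [
    "Quarterly maintenance report for pump station 4.",
    "Sensor calibration log for line 2, March batch.",
    "Incident summary: telemetry dropout on gateway 7.",
]

# Encode documents into normalized dense vectors.
model = SentenceTransformer("all-MiniLM-L6-v2")  # hypothetical model choice
embeddings = np.asarray(
    model.encode(documents, normalize_embeddings=True), dtype="float32"
)

# Flat inner-product index: cosine similarity on normalized vectors.
index = faiss.IndexFlatIP(embeddings.shape[1])
index.add(embeddings)

# Embed the query the same way, then retrieve the nearest documents.
query = np.asarray(
    model.encode(["Which sensors were recalibrated?"], normalize_embeddings=True),
    dtype="float32",
)
scores, ids = index.search(query, 2)
for score, i in zip(scores[0], ids[0]):
    print(f"{score:.3f}  {documents[i]}")
```

The same pattern carries over to managed stores such as Pinecone or Weaviate, where the add/search calls become client upserts and queries.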
Required Skills & Experience:
• 7+ years in Data Engineering, with 3+ years focused on AI/ML data workflows.
• Expert in Python (pandas, PySpark, FastAPI) and SQL.
• Hands-on with data orchestration (Airflow, Prefect, Dagster) and ETL tools (Databricks, Glue, Dataflow); see the Airflow sketch after this list.
• Proficient in cloud AI services – AWS SageMaker, Azure Machine Learning, or GCP Vertex AI.
• Experience with vector databases (FAISS, Pinecone, ChromaDB) and embedding pipelines (OpenAI API, LangChain).
• Knowledge of model data lifecycle: training → evaluation → deployment → monitoring.
• Solid understanding of ML concepts: supervised/unsupervised learning, transformers, and LLM fine-tuning.
• Experience in MLOps and CI/CD for AI models (Docker, Kubernetes, GitHub Actions, Terraform).
• Strong command of data privacy, security, and governance for AI datasets.
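As a concrete reference for the orchestration requirement, a minimal Airflow DAG sketch; the DAG id, schedule, and task bodies are placeholders, not a prescribed design, and Airflow 2.4+ syntax is assumed.

```python
# Minimal Airflow DAG sketch: daily ingest -> transform -> validate.
# Assumes Airflow 2.4+ (the `schedule` argument); task bodies are placeholders.
from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator

def ingest():
    print("pull raw files into the landing zone")

def transform():
    print("clean, join, and feature-engineer the batch")

def validate():
    print("run data-quality checks before publishing")

with DAG(
    dag_id="ai_training_data_daily",  # hypothetical pipeline name
    start_date=datetime(2025, 1, 1),
    schedule="@daily",
    catchup=False,
):
    t1 = PythonOperator(task_id="ingest", python_callable=ingest)
    t2 = PythonOperator(task_id="transform", python_callable=transform)
    t3 = PythonOperator(task_id="validate", python_callable=validate)
    t1 >> t2 >> t3  # linear dependency chain
```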
Preferred Skills:
• Familiarity with Gen-AI tools (LangChain, LlamaIndex, OpenAI API, Anthropic Claude, Hugging Face).
• Knowledge of retrieval-augmented generation (RAG) pipelines combining data retrieval and prompt engineering; see the sketch after this list.
• Experience integrating LLM applications with production APIs.
• Cloud certifications in AI/ML or Data Engineering.
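For the RAG item above, a minimal sketch of the retrieve-then-prompt pattern; retrieve() and call_llm() are stubs standing in for the vector stores and provider APIs named in this posting.

```python
# Minimal retrieve-then-prompt (RAG) sketch. retrieve() and call_llm()
# are stubs standing in for a vector store and a provider API.

def retrieve(question: str, k: int = 3) -> list[str]:
    # In practice: embed the question and query FAISS/Pinecone/ChromaDB.
    corpus = {
        "sensors": "Line 2 sensors were recalibrated in March.",
        "pumps": "Pump station 4 passed quarterly maintenance.",
    }
    return [text for key, text in corpus.items() if key in question.lower()][:k]

def call_llm(prompt: str) -> str:
    # In practice: a chat-completion call to the chosen provider API.
    return "[stubbed LLM answer grounded in the supplied context]"

def answer(question: str) -> str:
    context = "\n---\n".join(retrieve(question))
    prompt = (
        "Answer using only the context below.\n"
        f"Context:\n{context}\n"
        f"Question: {question}"
    )
    return call_llm(prompt)

print(answer("Which sensors were recalibrated?"))
```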