

MAK Technologies LLC
Data Engineer AI Systems
⭐ - Featured Role | Apply directly with Data Freelance Hub
This role is for a Data Engineer - AI Systems, a 6-month onsite position in St. Louis, MO, offering a competitive pay rate. Key skills include Databricks, Python, and PySpark, along with experience in ETL development and unstructured data processing.
🌎 - Country
United States
💱 - Currency
$ USD
💰 - Day rate
Unknown
🗓️ - Date
December 12, 2025
🕒 - Duration
More than 6 months
🏝️ - Location
On-site
📄 - Contract
Unknown
🔒 - Security
Unknown
📍 - Location detailed
St Louis, MO
🧠 - Skills detailed
#JSON (JavaScript Object Notation) #Data Modeling #Data Lineage #Scala #REST (Representational State Transfer) #Data Pipeline #Data Governance #API (Application Programming Interface) #Automation #Fivetran #Observability #REST API #Spark (Apache Spark) #Data Integration #Datasets #Data Accuracy #Data Quality #Python #Databases #Databricks #ML (Machine Learning) #ETL (Extract, Transform, Load) #SQL (Structured Query Language) #LangChain #AI (Artificial Intelligence) #PySpark #Delta Lake #Version Control #Data Engineering #Data Processing
Role description
Job Title: Data Engineer - AI Systems
Duration: 6 Months
Location: St. Louis, MO (Day 1 onsite)
Data Engineer – AI Systems (Databricks)
Primary Skills: Data Engineer, Databricks, Python, PySpark, AI/ML
We’re building intelligent, Databricks-powered AI systems that structure and activate information from diverse enterprise sources (Confluence, OneDrive, PDFs, and more). As a Data Engineer, you’ll design and optimize the data pipelines that transform raw and unstructured content into clean, AI-ready datasets for machine learning and generative AI agents.
You’ll collaborate with a cross-functional team of Machine Learning Engineers, Software Developers, and domain experts to create high-quality data foundations that power Databricks-native AI agents and retrieval systems.
Key Responsibilities
• Develop Scalable Pipelines: Design, build, and maintain high-performance ETL and ELT workflows using Databricks, PySpark, and Delta Lake (a minimal sketch follows this list).
• Data Integration: Build APIs and connectors to ingest data from collaboration platforms such as Confluence, OneDrive, and other enterprise systems.
• Unstructured Data Handling: Implement extraction and transformation pipelines for text, PDFs, and scanned documents using Databricks OCR and related tools.
• Data Modeling: Design Delta Lake and Unity Catalog data models for both structured and vectorized (embedding-based) data stores.
• Data Quality & Observability: Apply validation, version control, and quality checks to ensure pipeline reliability and data accuracy.
• Collaboration: Work closely with ML Engineers to prepare datasets for LLM fine-tuning and vector database creation, and with Software Engineers to deliver end-to-end data services.
• Performance & Automation: Optimize workflows for scale and automation, leveraging Databricks Jobs, Workflows, and CI/CD best practices.
What You Bring
• Experience with data engineering, ETL development, or data pipeline automation.
• Proficiency in Python, SQL, and PySpark.
• Hands-on experience with Databricks, Spark, and Delta Lake.
• Familiarity with data APIs, JSON, and unstructured data processing (OCR, text extraction).
• Understanding of data versioning, schema evolution, and data lineage concepts (illustrated in the sketch after this list).
• Interest in AI/ML data pipelines, vector databases, and intelligent data systems.
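As a hedged illustration of the versioning and schema-evolution bullets: Delta Lake can evolve a table's schema on write and read back earlier versions via time travel. The table and column names below are hypothetical placeholders.

# Illustrative sketch only: Delta Lake schema evolution and time travel.
# "bronze.documents" and its columns are hypothetical placeholders.
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# A new batch arrives with an extra "language" column; mergeSchema evolves
# the table's schema instead of failing on the mismatch.
new_batch = spark.createDataFrame(
    [("doc-42", "hello world", "en")],
    ["doc_id", "body", "language"],
)
(new_batch.write.format("delta")
    .option("mergeSchema", "true")
    .mode("append")
    .saveAsTable("bronze.documents"))

# Data versioning: query the table as it looked at an earlier version.
v0 = spark.read.format("delta").option("versionAsOf", 0).table("bronze.documents")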
Bonus Skills
• Experience with vector databases (e.g., Pinecone, Chroma, FAISS) or Databricks’ Vector Search (see the toy FAISS sketch after this list).
• Exposure to LLM-based architectures, LangChain, or Databricks Mosaic AI.
• Knowledge of data governance frameworks, Unity Catalog, or access control best practices.
• Familiarity with REST API development or data synchronization services (e.g., Airbyte, Fivetran, custom connectors).
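For the vector-database bullet, here is a tiny FAISS sketch of the build-and-query loop behind retrieval systems; random vectors stand in for real document embeddings, and the dimension is just a typical embedding size.

# Illustrative sketch only: the core FAISS build/search loop.
# Random vectors stand in for real document embeddings.
import faiss
import numpy as np

dim = 384                                            # typical sentence-embedding size
docs = np.random.rand(1000, dim).astype("float32")   # fake document embeddings

index = faiss.IndexFlatL2(dim)                       # exact L2 search; fine at this scale
index.add(docs)

query = np.random.rand(1, dim).astype("float32")     # fake query embedding
distances, ids = index.search(query, 5)              # top-5 nearest documents
print(ids[0], distances[0])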