MAK Technologies LLC

Data Engineer AI Systems

⭐ - Featured Role | Apply directly with Data Freelance Hub
This role is for a Data Engineer - AI Systems, a 6-month onsite position in St. Louis, MO, offering a competitive pay rate. Key skills include Databricks, Python, PySpark, and experience with ETL development and unstructured data processing.
🌎 - Country
United States
💱 - Currency
$ USD
-
💰 - Day rate
Unknown
-
🗓️ - Date
December 12, 2025
-
🕒 - Duration
More than 6 months
-
🏝️ - Location
On-site
-
📄 - Contract
Unknown
-
🔒 - Security
Unknown
-
📍 - Location detailed
St. Louis, MO
-
🧠 - Skills detailed
#JSON (JavaScript Object Notation) #Data Modeling #Data Lineage #Scala #REST (Representational State Transfer) #Data Pipeline #Data Governance #API (Application Programming Interface) #Automation #Fivetran #Observability #REST API #Spark (Apache Spark) #Data Integration #Datasets #Data Accuracy #Data Quality #Python #Databases #Databricks #ML (Machine Learning) #ETL (Extract, Transform, Load) #SQL (Structured Query Language) #Langchain #AI (Artificial Intelligence) #PySpark #Delta Lake #Version Control #Data Engineering #Data Processing
Role description
Job Title: Data Engineer – AI Systems (Databricks)
Duration: 6 Months
Location: St. Louis, MO (Day 1 onsite role)
Primary Skills: Data Engineer, Databricks, Python, PySpark, AI/ML

We’re building intelligent, Databricks-powered AI systems that structure and activate information from diverse enterprise sources (Confluence, OneDrive, PDFs, and more). As a Data Engineer, you’ll design and optimize the data pipelines that transform raw and unstructured content into clean, AI-ready datasets for machine learning and generative AI agents.

You’ll collaborate with a cross-functional team of Machine Learning Engineers, Software Developers, and domain experts to create high-quality data foundations that power Databricks-native AI agents and retrieval systems.

Key Responsibilities
• Develop Scalable Pipelines: Design, build, and maintain high-performance ETL and ELT workflows using Databricks, PySpark, and Delta Lake (a pipeline sketch appears after this posting).
• Data Integration: Build APIs and connectors to ingest data from collaboration platforms such as Confluence, OneDrive, and other enterprise systems (see the connector sketch below).
• Unstructured Data Handling: Implement extraction and transformation pipelines for text, PDFs, and scanned documents using Databricks OCR and related tools.
• Data Modeling: Design Delta Lake and Unity Catalog data models for both structured and vectorized (embedding-based) data stores (see the chunking sketch below).
• Data Quality & Observability: Apply validation, version control, and quality checks to ensure pipeline reliability and data accuracy (see the validation sketch below).
• Collaboration: Work closely with ML Engineers to prepare datasets for LLM fine-tuning and vector database creation, and with Software Engineers to deliver end-to-end data services.
• Performance & Automation: Optimize workflows for scale and automation, leveraging Databricks Jobs, Workflows, and CI/CD best practices.

What You Bring
• Experience with data engineering, ETL development, or data pipeline automation.
• Proficiency in Python, SQL, and PySpark.
• Hands-on experience with Databricks, Spark, and Delta Lake.
• Familiarity with data APIs, JSON, and unstructured data processing (OCR, text extraction).
• Understanding of data versioning, schema evolution, and data lineage concepts.
• Interest in AI/ML data pipelines, vector databases, and intelligent data systems.

Bonus Skills
• Experience with vector databases (e.g., Pinecone, Chroma, FAISS) or Databricks’ Vector Search.
• Exposure to LLM-based architectures, LangChain, or Databricks Mosaic AI.
• Knowledge of data governance frameworks, Unity Catalog, or access control best practices.
• Familiarity with REST API development or data synchronization services (e.g., Airbyte, Fivetran, custom connectors).
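To make the pipeline responsibility concrete, here is a minimal PySpark sketch of a bronze-to-silver Delta Lake step of the kind the posting describes. The paths, table names, and cleaning rules are illustrative assumptions, not details from the role.

```python
# Minimal bronze-to-silver ETL sketch on Databricks (PySpark + Delta Lake).
# All paths, table names, and cleaning rules below are illustrative assumptions.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.getOrCreate()  # preconfigured in Databricks notebooks

# Bronze: land raw JSON exports (e.g., Confluence page dumps) as-is.
raw = spark.read.json("/Volumes/raw/confluence/pages/")  # hypothetical path
raw.write.format("delta").mode("append").saveAsTable("bronze.confluence_pages")

# Silver: deduplicate, normalize timestamps, and drop empty documents.
silver = (
    spark.table("bronze.confluence_pages")
    .dropDuplicates(["page_id"])                          # assumed unique key
    .withColumn("updated_at", F.to_timestamp("updated_at"))
    .filter(F.col("body").isNotNull() & (F.length("body") > 0))
)
silver.write.format("delta").mode("overwrite").saveAsTable("silver.confluence_pages")
```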
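For the data-integration bullet, a hedged sketch of one such connector, paging through Confluence’s standard /rest/api/content endpoint. The base URL, credentials, and space key are placeholders, not values from this role.

```python
# Minimal Confluence ingestion connector sketch; BASE_URL, AUTH, and the
# space key are placeholders. /rest/api/content is Confluence's content API.
import requests

BASE_URL = "https://example.atlassian.net/wiki"  # placeholder site
AUTH = ("user@example.com", "api-token")         # placeholder credentials

def fetch_pages(space_key: str, limit: int = 50):
    """Yield page records from one Confluence space, following pagination."""
    start = 0
    while True:
        resp = requests.get(
            f"{BASE_URL}/rest/api/content",
            params={"spaceKey": space_key, "type": "page",
                    "expand": "body.storage,version",
                    "start": start, "limit": limit},
            auth=AUTH, timeout=30,
        )
        resp.raise_for_status()
        data = resp.json()
        yield from data["results"]
        # Confluence signals remaining pages via _links.next.
        if "next" not in data.get("_links", {}):
            break
        start += limit

# Usage: rows = list(fetch_pages("ENG"))
```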
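For the data-modeling bullet’s vectorized stores, one plausible preparation step is splitting cleaned documents into overlapping chunks stored in a Delta table that an embedding or Vector Search job could then index. Chunk size, overlap, and table names are assumptions; the embedding step itself is omitted.

```python
# Sketch: turn cleaned documents into overlapping text chunks for embedding.
# Table names and chunking parameters are illustrative assumptions.
from pyspark.sql import SparkSession, functions as F, types as T

spark = SparkSession.builder.getOrCreate()

def chunk_text(body, size=1000, overlap=200):
    # Naive fixed-window chunking; production pipelines often split on
    # document structure (headings, paragraphs) instead.
    if not body:
        return []
    step = size - overlap
    return [body[i:i + size] for i in range(0, len(body), step)]

chunk_udf = F.udf(chunk_text, T.ArrayType(T.StringType()))

chunks = (
    spark.table("silver.confluence_pages")          # table from the ETL sketch
    .withColumn("chunk", F.explode(chunk_udf("body")))
    .withColumn("chunk_id", F.monotonically_increasing_id())
    .select("page_id", "chunk_id", "chunk")
)
chunks.write.format("delta").mode("overwrite").saveAsTable("gold.page_chunks")
```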
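And for the data-quality bullet, a sketch of the kind of validation gate that fails a Databricks Job task before bad data propagates downstream; the table, columns, and thresholds are assumptions.

```python
# Sketch of a data-quality gate; table, columns, and thresholds are assumed.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.getOrCreate()
df = spark.table("silver.confluence_pages")

total = df.count()
null_bodies = df.filter(F.col("body").isNull()).count()
dupes = total - df.dropDuplicates(["page_id"]).count()

# Raising here fails the job task early instead of shipping bad data downstream.
assert total > 0, "silver table is empty"
assert null_bodies == 0, f"{null_bodies} rows have null body"
assert dupes == 0, f"{dupes} duplicate page_id rows"
```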