

Brooksource
Lead Data Engineer – AI Data Products
⭐ - Featured Role | Apply directly with Data Freelance Hub
This role is for a Lead Data Engineer – AI Data Products on a contract-to-permanent basis. It is 100% remote, and the pay rate is not disclosed. Key skills include Databricks, Spark, PySpark, and CI/CD, along with strong data pipeline experience and technical leadership.
🌎 - Country
United States
💱 - Currency
$ USD
-
💰 - Day rate
Unknown
-
🗓️ - Date
May 14, 2026
🕒 - Duration
Unknown
-
🏝️ - Location
Remote
-
📄 - Contract
Unknown
-
🔒 - Security
Unknown
-
📍 - Location detailed
St Louis, MO
-
🧠 - Skills detailed
#Code Reviews #GitHub #AI (Artificial Intelligence) #Data Warehouse #DataOps #Data Pipeline #Spark (Apache Spark) #Leadership #Scripting #Linux #GIT #PySpark #Automation #Distributed Computing #Python #ETL (Extract, Transform, Load) #SQL (Structured Query Language) #ML (Machine Learning) #Data Analysis #GitLab #Scala #Databricks #Data Processing #Hadoop #DevOps #Data Engineering #Batch
Role description
Lead Data Engineer – AI Data Products
Contract-to-Permanent Hire
100% Remote (8AM-5PM CST)
Our Fortune 50 healthcare client’s AI/ML platforms group is seeking a modern Lead Data Engineer to provide technical leadership and delivery oversight across multiple AI data products within their enterprise AI Hub. This role is primarily focused on technical direction, architectural guidance, and team leadership (~75%), while remaining hands-on (~25%) in building scalable data pipelines, CI/CD automation, and AI-enabling data assets across multiple concurrent initiatives.
Responsibilities:
• Provide technical leadership across multiple AI Data Product initiatives and engineering workstreams.
• Understand and clarify technical requirements, recommend architecture/design elements, and set overall technical direction across projects.
• Design, implement, and maintain scalable ETL/ELT pipelines and distributed data workflows using Databricks/Spark technologies (see the batch ETL sketch after this list).
• Implement and optimize CI/CD pipelines, data operations workflows, and cost management strategies across the data platform.
• Build and support AI-enabling data assets such as vector stores, feature tables, Genie Rooms, and semantic AI context assets, while ensuring integration into model development workflows.
• Partner with AI/ML, analytics, platform, and business teams to deliver production-grade data solutions.
• Support platform visibility by delivering operational insights into platform utilization, cost trends, and financial operations.
• Oversee and support junior-to-senior engineers via proofs of concept (POCs), technical guidance, troubleshooting, and code reviews.
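As a rough illustration of the batch pipeline work described above, here is a minimal PySpark sketch that reads raw Parquet, applies light cleanup, and writes a partitioned Delta table. The paths, table name (ai_hub.claims_cleaned), and column names are hypothetical, not from the posting, and the Delta write assumes a Databricks runtime (or the delta-spark package).
```python
from pyspark.sql import SparkSession, functions as F

# Minimal batch ETL sketch: raw Parquet in, cleaned Delta table out.
# All paths, table names, and columns here are illustrative only.
spark = SparkSession.builder.appName("batch-etl-sketch").getOrCreate()

raw = spark.read.parquet("/mnt/raw/claims/")  # hypothetical landing zone

cleaned = (
    raw
    .dropDuplicates(["claim_id"])                 # de-duplicate on a business key
    .filter(F.col("claim_amount").isNotNull())    # basic data-quality gate
    .withColumn("ingest_date", F.current_date())  # audit/partition column
)

(
    cleaned.write
    .format("delta")                              # assumes Databricks / delta-spark
    .mode("overwrite")
    .partitionBy("ingest_date")
    .saveAsTable("ai_hub.claims_cleaned")         # hypothetical catalog table
)
```
Partitioning on an ingest-date column keeps overwrites and downstream incremental reads cheap, which is one common way roles like this manage Databricks cost.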
Requirements:
• Strong hands-on experience with Databricks Data Engineering and Spark distributed computing. Hadoop ecosystem experience is a plus.
• PySpark and Python expertise for large-scale data processing.
• Strong SQL skills and experience with data warehouses and data analysis.
• Hands-on experience building data pipelines, both batch and streaming (see the streaming sketch after this list).
• Experience working with columnar data formats (Parquet, Delta).
• Experience with DevOps practices, CI/CD pipeline development, and Git workflows (GitHub/GitLab).
• Familiarity with Linux scripting fundamentals (for pipeline and CI/CD automation).
• Exposure to emerging AI data infrastructure, such as building vector stores and applying DataOps / MLOps practices.
• Technical leadership across multiple concurrent projects, providing architectural guidance, defining technical work, and setting technical direction.
• US Citizens & Green Card holders only
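As a hedged sketch of the batch-and-streaming experience called out above, the following PySpark Structured Streaming job incrementally lands JSON events into a Delta table. The source path, checkpoint locations, and target table are hypothetical, and the cloudFiles source assumes Databricks Auto Loader; on plain open-source Spark you could substitute a JSON file source with an explicit schema.
```python
from pyspark.sql import SparkSession, functions as F

# Minimal streaming sketch: incrementally land JSON events into Delta.
# Paths, checkpoints, and the target table are illustrative only.
spark = SparkSession.builder.appName("stream-sketch").getOrCreate()

events = (
    spark.readStream
    .format("cloudFiles")                                   # Databricks Auto Loader (assumed)
    .option("cloudFiles.format", "json")
    .option("cloudFiles.schemaLocation", "/mnt/chk/schema") # tracks inferred schema
    .load("/mnt/raw/events/")                               # hypothetical source path
)

query = (
    events
    .withColumn("processed_at", F.current_timestamp())
    .writeStream
    .format("delta")
    .option("checkpointLocation", "/mnt/chk/events")        # exactly-once bookkeeping
    .trigger(availableNow=True)                             # drain available data, then stop
    .toTable("ai_hub.events_bronze")                        # hypothetical bronze table
)
query.awaitTermination()
```
trigger(availableNow=True) processes everything currently available and then stops, so the same job can be run on a batch schedule while keeping streaming checkpoint semantics.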






