ProSearch

AI Data Engineer

⭐ - Featured Role | Apply direct with Data Freelance Hub
This role is for an AI Data Engineer: a fully remote contract position of unspecified duration and unspecified pay rate. Key skills include advanced SQL, Python, and R, and at least four years of Data Engineering experience is required.
🌎 - Country
United States
💱 - Currency
$ USD
-
💰 - Day rate
Unknown
-
πŸ—“οΈ - Date
November 18, 2025
🕒 - Duration
Unknown
-
🏝️ - Location
Remote
-
📄 - Contract
Unknown
-
🔒 - Security
Unknown
-
πŸ“ - Location detailed
United States
-
🧠 - Skills detailed
#PySpark #Programming #Azure #Hadoop #Terraform #Data Science #SQL (Structured Query Language) #Kubernetes #GIT #AWS (Amazon Web Services) #Data Lake #Data Modeling #AI (Artificial Intelligence) #ML Ops (Machine Learning Operations) #Python #ETL (Extract, Transform, Load) #Cloud #Data Lakehouse #NoSQL #Databricks #Delta Lake #Big Data #dbt (data build tool) #Storage #Scala #Monitoring #Docker #Disaster Recovery #Data Architecture #Metadata #Data Governance #PyTorch #Spark (Apache Spark) #Data Quality #Data Engineering #Data Warehouse #Documentation #R #Model Validation #Infrastructure as Code (IaC) #S3 (Amazon Simple Storage Service) #YAML (YAML Ain't Markup Language) #ML (Machine Learning) #Data Pipeline #GitHub #Data Analysis #Databases #Datasets #Snowflake #TensorFlow
Role description
We have partnered with a leading technology research organization to hire an AI Data Engineer. In this role, you will build scalable data pipelines, partner closely with Data Scientists and ML Engineers, and ensure the organization's AI/ML models are fueled by high-quality, well-structured data. This is a fully remote opportunity to contribute to impactful AI initiatives that support scientific innovation, clinical solutions, and operational excellence.

About the Role
As an AI Data Engineer, you will support data science model validation, analytics workloads, and machine learning operations by building high-quality feature tables, analytical datasets, and automated workflows. You'll collaborate with senior data staff, product owners, and AI/ML scientists to deliver reliable data assets that enhance model performance and accelerate R&D innovation. You will work across core data streams, including discovery, imaging, clinical, and operational data, and contribute to the pipelines that power next-generation AI products in veterinary and animal health.

Top Required Skills
• SQL (advanced)
• Python
• R

Nice-to-Have Skills
• dbt Core
• Databricks
• Data analysis experience

Technology Stack
Python • Databricks • dbt Core • Hadoop • TensorFlow • PyTorch • PySpark • Snowflake • AWS

What You'll Do
• Build scalable, reliable, distributed data pipelines to support machine learning operations and analytics workloads.
• Partner with data scientists, ML engineers, analysts, and data product owners to understand requirements and deliver high-quality solutions.
• Work with modern cloud and ML stacks, including Databricks, Snowflake, AWS, and Azure.
• Use Databricks (pipelines, workflows, asset bundles) to streamline engineering processes.
• Apply dbt Core for transformations, documentation, testing, and semantic consistency.
• Maintain code quality using SQL/YAML linters (SQLFluff) and enforce standards through GitHub Actions CI/CD.
• Develop solutions for data quality issues such as missing, duplicate, and inconsistent data.
• Contribute to data warehouse, data lake, data lakehouse, and data mesh architectural patterns.
• Build pipelines in Python to integrate diverse data types: structured tables, text documents, images, and more.
• Implement CI/CD systems and IaC tools such as Terraform or AWS CloudFormation.
• Support data systems across the full lifecycle: exploration, production, monitoring, disaster recovery, and optimization.
• Stay current on advanced data engineering practices, including emerging technologies such as Generative AI.

What You Bring
You have a relevant technical degree and at least four (4) years of Data Engineering experience. You are experienced with:
• Cloud platforms (preferably AWS)
• Big data technologies: Spark, Databricks, Delta Lake
• Git and Git-based workflows
• dbt Core and modern data modeling
• SQL and NoSQL databases
• Cloud object storage (e.g., S3)
• Containerization (Docker, Kubernetes, AWS ECS)
• Building, testing, and maintaining fault-tolerant data pipelines
• Data architecture concepts: warehouse, lake, lakehouse, mesh

You are also eager to deepen your knowledge of AI/ML techniques, and it's a plus if you have:
• Experience developing APIs or web applications
• Certifications in data engineering or AI/ML

Leveling Guide (Intermediate)
• Build metadata and schemas based on logical models
• Write scripts for physical data layout and load test data
• Design and validate schemas
• Use ER modeling tools for intermediate tasks
• Adhere to data governance, naming conventions, and testing principles
• Resolve moderately complex data problems
• Provide SQL and Python scripts for tuning and validation
• Write intermediate-level database programming scripts
• Contribute independently to team projects and semantic layer enhancements
• Suggest improvements to standards and processes
• Take new perspectives on solving moderately complex problems

Why This Role Matters
Your work will directly impact:
• The performance of AI/ML models
• The accuracy, reliability, and timeliness of analytics
• The innovation of new data streams from R&D pipelines
• The quality and discoverability of curated datasets
• How the organization advances clinical AI technologies

Join Us
If you are an analytical, collaborative, and forward-thinking AI Data Engineer looking for a remote opportunity that combines modern data engineering with applied machine learning, we encourage you to apply. Your expertise will help shape the next generation of AI-driven products and scientific innovation.