

Perficient
Lead Data Engineer
⭐ - Featured Role | Apply direct with Data Freelance Hub
This role is for a Lead Data Engineer on a contract basis, focusing on modernizing healthcare data pipelines from SAS to Python/PySpark on Databricks. Required skills include Python, AWS, SQL, and data pipeline management. Experience in healthcare data is preferred.
🌎 - Country
United States
💱 - Currency
$ USD
💰 - Day rate
Unknown
🗓️ - Date
June 18, 2026
🕒 - Duration
Unknown
🏝️ - Location
Unknown
📄 - Contract
Unknown
🔒 - Security
Unknown
📍 - Location detailed
Greater Minneapolis-St. Paul Area
🧠 - Skills detailed
#Scripting #Python #EC2 #Automation #AI (Artificial Intelligence) #SAS #Version Control #Data Extraction #AWS (Amazon Web Services) #Data Pipeline #SQL (Structured Query Language) #Spark (Apache Spark) #Databricks #Datasets #GIT #ML (Machine Learning) #Scala #S3 (Amazon Simple Storage Service) #Data Engineering #ETL (Extract, Transform, Load) #Agile #Data Quality #PySpark #Lambda (AWS Lambda)
Role description
We are seeking a Senior Data Engineer (open to Principal level) to lead the modernization and ownership of a critical data pipeline within a large-scale healthcare analytics environment. This role will focus on transitioning legacy SAS-based pipelines to modern Python/PySpark on Databricks, while driving engineering best practices and scalable data solutions.
This is a hands-on engineering role with a strong emphasis on development capabilities. Candidates with application development experience and exposure to AI/automation technologies will stand out.
Key Responsibilities
• Lead the modernization of data pipelines from SAS to Python/PySpark on Databricks
• Own and evolve a mission-critical HEDIS data pipeline used for performance measurement and reporting
• Design, build, and optimize scalable data pipelines in a distributed environment
• Collaborate with SMEs during an initial knowledge transfer period, with eventual full pipeline ownership
• Develop, schedule, and automate end-to-end data workflows
• Ensure data quality, reliability, and performance across large datasets
• Partner with cross-functional teams and analytics vendors to deliver high-quality data outputs
• Contribute to best practices in version control, CI/CD, and agile development workflows
Required Qualifications
• Strong development/engineering background (core requirement)
• Hands-on experience with Python (scripting and application development)
• Expertise in building and managing data pipelines and ETL workflows
• Experience processing large-scale datasets in distributed environments
• Proficiency with Databricks (notebooks, workflows, cluster management)
• Solid experience with AWS services including S3, Lambda, Glue, and EC2
• Strong SQL skills for complex transformations and data extraction
• Experience with pipeline orchestration and automation
• Familiarity with version control systems (Git) in a collaborative environment
• Experience managing work via issues, epics, and agile tooling
Preferred / Nice-to-Have
• Experience with AI, machine learning, or automation frameworks
• Exposure to healthcare data (e.g., HEDIS)
• Background in transitioning legacy systems to modern data platforms
What We Are Looking For (Priority Order)
1. Strong development/engineering capabilities (must-have)
2. Application development experience, especially Python scripting
3. Expertise in AI or automation (highly desirable bonus)






