

Centraprise
Sr. Data Engineer (ETL, PySpark, AWS)
Featured Role | Apply directly with Data Freelance Hub
This role is for a Sr. Data Engineer (ETL, PySpark, AWS) on a long-term remote contract, offering competitive pay. Key skills include Python, PySpark, AWS Glue, and healthcare data experience. Strong knowledge of ETL/ELT pipeline design and CI/CD is required.
Country: United States
Currency: $ USD
Day rate: Unknown
Date: October 16, 2025
Duration: Unknown
Location: Remote
Contract: Unknown
Security: Unknown
Location detailed: United States
Skills detailed:
#Data Lake #Linux #AWS Glue #Scala #Data Modeling #Datasets #DataOps #Apache Spark #Cloud #Data Pipeline #Spark (Apache Spark) #Data Quality #Delta Lake #Data Lakehouse #Python #Batch #IAM (Identity and Access Management) #Kafka (Apache Kafka) #Version Control #Unix #Terraform #DevOps #Data Engineering #ETL (Extract, Transform, Load) #Infrastructure as Code (IaC) #Lambda (AWS Lambda) #Storage #S3 (Amazon Simple Storage Service) #PySpark #Airflow #SQL (Structured Query Language) #AWS (Amazon Web Services) #Apache Airflow #Data Processing #Automation #GitHub #GIT #Normalization
Role description
Sr. Data Engineer (ETL, PySpark, AWS)
Remote
Long term
Job Description:
• We are looking for a Senior Data Engineer to design, build, and optimize large-scale data processing systems supporting healthcare analytics and operational reporting.
• This role involves working closely with DataOps, DevOps, and QA teams to enable scalable, reliable data pipelines.
Key Responsibilities:
• Design and implement ETL/ELT pipelines using Python and PySpark.
• Develop scalable data workflows using Apache Spark and AWS Glue.
• Collaborate with QA and DevOps to integrate CI/CD and testing automation.
• Manage data lake structures and ensure data quality, lineage, and auditability.
• Optimize and monitor the performance of batch and streaming pipelines.
• Build infrastructure as code (IaC) using tools such as Terraform and GitHub Actions.
• Work across structured, semi-structured, and unstructured healthcare datasets.
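The first responsibilities above describe a classic extract-transform-load flow. As a minimal sketch of that pattern in plain standard-library Python (PySpark, Glue jobs, and S3 access are environment-specific, so in-memory buffers stand in for real sources and sinks; the field names `patient_id` and `name` are hypothetical):

```python
import csv
import io

def extract(source):
    """Extract: read raw records from a CSV source."""
    return list(csv.DictReader(source))

def transform(rows):
    """Transform: drop rows missing a patient_id and normalize names."""
    return [
        {**row, "name": row["name"].strip().title()}
        for row in rows
        if row.get("patient_id")  # data-quality filter
    ]

def load(rows, sink):
    """Load: write cleaned records to the sink; return the row count."""
    writer = csv.DictWriter(sink, fieldnames=["patient_id", "name"])
    writer.writeheader()
    writer.writerows(rows)
    return len(rows)

# Toy run: in-memory buffers standing in for S3 objects.
raw = io.StringIO("patient_id,name\n101,  alice smith \n,orphan row\n102,BOB JONES\n")
out = io.StringIO()
count = load(transform(extract(raw)), out)
```

In a PySpark/Glue job the same three stages would map onto DataFrame reads, transformations, and writes, but the shape of the pipeline is identical.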
Required Technical Skills:
Core & Deep Knowledge Assessment:
• Python
• PySpark
• SQL (including window functions and CASE)
• AWS Glue, S3, Lambda
• Apache Spark
• Apache Airflow
• Delta Lake / data lakehouse architecture
• CI/CD (Terraform, GitHub Actions)
• ETL/ELT pipeline design and optimization
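The SQL requirement calls out window functions and CASE specifically. A minimal illustration of both, runnable via the standard-library `sqlite3` module (the `claims` table and its columns are hypothetical examples, not from the posting):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE claims (claim_id INTEGER, member_id INTEGER, amount REAL);
    INSERT INTO claims VALUES
        (1, 100, 250.0),
        (2, 100, 900.0),
        (3, 200, 120.0);
""")

# Window function: running total of claim amounts per member.
# CASE expression: band each claim as 'high' or 'normal'.
rows = conn.execute("""
    SELECT claim_id,
           SUM(amount) OVER (PARTITION BY member_id ORDER BY claim_id)
               AS running_total,
           CASE WHEN amount > 500 THEN 'high' ELSE 'normal' END AS cost_band
    FROM claims
    ORDER BY claim_id
""").fetchall()
# rows -> [(1, 250.0, 'normal'), (2, 1150.0, 'high'), (3, 120.0, 'normal')]
```

The same `OVER (PARTITION BY … ORDER BY …)` and `CASE WHEN … THEN … END` syntax carries over to Spark SQL.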
Basic Overall Knowledge Assessment:
• Kafka
• Data modeling and normalization
• Unix/Linux
• Infrastructure as Code (IaC)
• Cloud storage, IAM, and networking fundamentals (AWS)
• Git version control
• Healthcare data domain knowledge