Centraprise

Sr. Data Engineer (ETL, PySpark, AWS)

This role is for a Sr. Data Engineer (ETL, PySpark, AWS) on a long-term remote contract, offering competitive pay. Key skills include Python, PySpark, and AWS Glue, along with healthcare data experience. Strong knowledge of ETL/ELT pipeline design and CI/CD is required.
🌎 - Country
United States
πŸ’± - Currency
$ USD
πŸ’° - Day rate
Unknown
πŸ—“οΈ - Date
October 16, 2025
πŸ•’ - Duration
Unknown
🏝️ - Location
Remote
πŸ“„ - Contract
Unknown
πŸ”’ - Security
Unknown
πŸ“ - Location detailed
United States
🧠 - Skills detailed
#Data Lake #Linux #AWS Glue #Scala #Data Modeling #Datasets #DataOps #Apache Spark #Cloud #Data Pipeline #Spark (Apache Spark) #Data Quality #Delta Lake #Data Lakehouse #Python #Batch #IAM (Identity and Access Management) #Kafka (Apache Kafka) #Version Control #Unix #Terraform #DevOps #Data Engineering #ETL (Extract, Transform, Load) #Infrastructure as Code (IaC) #Lambda (AWS Lambda) #Storage #S3 (Amazon Simple Storage Service) #PySpark #Airflow #SQL (Structured Query Language) #AWS (Amazon Web Services) #Apache Airflow #Data Processing #Automation #GitHub #GIT #Normalization
Role description
Sr. Data Engineer (ETL, PySpark, AWS) | Remote | Long term

Job Description:
• We are looking for a Senior Data Engineer to design, build, and optimize large-scale data processing systems supporting healthcare analytics and operational reporting.
• The role involves working closely with DataOps, DevOps, and QA teams to enable scalable and reliable data pipelines.

Key Responsibilities:
• Design and implement ETL/ELT pipelines using Python and PySpark (a minimal sketch follows this description).
• Develop scalable data workflows using Apache Spark and AWS Glue.
• Collaborate with QA and DevOps to integrate CI/CD and testing automation.
• Manage data lake structures and ensure data quality, lineage, and auditability.
• Optimize and monitor the performance of batch and streaming pipelines.
• Build infrastructure as code (IaC) using tools such as Terraform and GitHub Actions.
• Work across structured, semi-structured, and unstructured healthcare datasets.

Required Technical Skills:

Core & Deep Knowledge Assessment:
• Python
• PySpark
• SQL, including window functions and CASE expressions (see the second example below)
• AWS Glue, S3, Lambda
• Apache Spark
• Apache Airflow (see the DAG sketch below)
• Delta Lake / Data Lakehouse architecture
• CI/CD (Terraform, GitHub Actions)
• ETL/ELT pipeline design and optimization

Basic Overall Knowledge Assessment:
• Kafka
• Data modeling and normalization
• Unix/Linux
• Infrastructure as Code (IaC)
• Cloud storage, IAM, and networking fundamentals (AWS)
• Git version control
• Healthcare data domain knowledge
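To make the core pipeline responsibility concrete, here is a minimal PySpark ETL sketch of the read-cleanse-write pattern the listing describes. The bucket names, paths, and the claims schema (claim_id, claim_amount) are hypothetical, since the posting specifies no schema, and Delta Lake support is assumed to be installed on the cluster or Glue job.

```python
# Minimal ETL sketch in PySpark. Bucket names, paths, and columns are
# hypothetical; the posting does not specify a schema.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = (
    SparkSession.builder
    .appName("claims-etl-sketch")
    # Assumes the delta-spark package is available on the cluster/Glue job.
    .config("spark.sql.extensions", "io.delta.sql.DeltaSparkSessionExtension")
    .config("spark.sql.catalog.spark_catalog",
            "org.apache.spark.sql.delta.catalog.DeltaCatalog")
    .getOrCreate()
)

# Extract: raw healthcare claims landed as JSON in S3 (hypothetical path).
raw = spark.read.json("s3://example-raw-bucket/claims/2025/10/")

# Transform: basic data-quality cleansing plus an audit column.
clean = (
    raw.dropDuplicates(["claim_id"])
       .filter(F.col("claim_amount").isNotNull())
       .withColumn("ingested_at", F.current_timestamp())
)

# Load: append to a Delta table in the curated zone of the data lake.
(clean.write
      .format("delta")
      .mode("append")
      .save("s3://example-curated-bucket/claims_delta/"))
```

The same transform logic would run unchanged as an AWS Glue job, since Glue 3.0+ executes standard PySpark.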
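The deep-knowledge list calls out SQL window functions and CASE expressions specifically. The example below shows both through Spark SQL, staying in the same PySpark context; the tiny in-memory claims sample and its column names are invented for illustration.

```python
# Window function + CASE expression via Spark SQL. The claims data here is
# a small in-memory sample; real tables and columns would differ.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("sql-window-example").getOrCreate()

spark.createDataFrame(
    [("c1", "m1", 12000.0), ("c2", "m1", 800.0), ("c3", "m2", 2500.0)],
    ["claim_id", "member_id", "claim_amount"],
).createOrReplaceTempView("claims")

ranked = spark.sql("""
    SELECT
        claim_id,
        member_id,
        claim_amount,
        -- Window function: rank each member's claims, largest first.
        ROW_NUMBER() OVER (
            PARTITION BY member_id
            ORDER BY claim_amount DESC
        ) AS claim_rank,
        -- CASE expression: bucket claims by size.
        CASE
            WHEN claim_amount >= 10000 THEN 'high'
            WHEN claim_amount >= 1000  THEN 'medium'
            ELSE 'low'
        END AS claim_tier
    FROM claims
""")
ranked.show()
```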
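For the Apache Airflow skill, a minimal DAG sketch of how a daily run of the pipeline above might be scheduled. The DAG id, schedule, and callable are hypothetical, and the `schedule` parameter assumes Airflow 2.4 or later.

```python
# Minimal Airflow DAG sketch; names and schedule are hypothetical.
from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator


def run_claims_etl():
    # Placeholder for triggering the PySpark/Glue job sketched above.
    print("triggering claims ETL")


with DAG(
    dag_id="claims_etl_daily",
    start_date=datetime(2025, 1, 1),
    schedule="@daily",  # Airflow 2.4+ keyword; earlier versions use schedule_interval.
    catchup=False,
) as dag:
    PythonOperator(task_id="run_claims_etl", python_callable=run_claims_etl)
```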