

Sigmaways Inc
Senior Data Engineer - Spark, Airflow
Featured Role | Apply direct with Data Freelance Hub
This role is for a Senior Data Engineer with a contract length of "unknown," offering a pay rate of "unknown." Key skills required include Apache Spark, Airflow, Python, and experience in big data development. A bachelor's degree and 7 years of relevant experience are mandatory.
Country: United States
Currency: $ USD
Day rate: Unknown
Date: November 14, 2025
Duration: Unknown
Location: Unknown
Contract: Unknown
Security: Unknown
Location detailed: San Francisco Bay Area
Skills detailed: #Monitoring #Shell Scripting #Databricks #Big Data #Data Quality #Scripting #Data Lineage #Programming #Data Processing #Data Governance #Deployment #Airflow #Spark (Apache Spark) #Kubernetes #Docker #PySpark #Data Engineering #Apache Spark #Data Pipeline #Automation #ETL (Extract, Transform, Load) #Computer Science #Debugging #Python #AWS Glue #DevOps #AWS (Amazon Web Services) #Scala
Role description
We are seeking an experienced Data Engineer to design and optimize scalable data pipelines that drive our global data and analytics initiatives.
In this role, you will leverage technologies such as Apache Spark, Airflow, and Python to build high-performance data processing systems and ensure data quality, reliability, and lineage across Mastercard's data ecosystem.
The ideal candidate combines strong technical expertise with hands-on experience in distributed data systems, workflow automation, and performance tuning to deliver impactful, data-driven solutions at enterprise scale.
Responsibilities:
• Design and optimize Spark-based ETL pipelines for large-scale data processing.
• Build and manage Airflow DAGs for scheduling, orchestration, and checkpointing (see the sketch after this list).
• Implement partitioning and shuffling strategies to improve Spark performance.
• Ensure data lineage, quality, and traceability across systems.
• Develop Python scripts for data transformation, aggregation, and validation.
• Execute and tune Spark jobs using spark-submit.
• Perform DataFrame joins and aggregations for analytical insights.
• Automate multi-step processes through shell scripting and variable management.
• Collaborate with data, DevOps, and analytics teams to deliver scalable data solutions.
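To make the Airflow orchestration responsibilities above more concrete, here is a minimal sketch of a DAG that schedules a nightly Spark ETL job. It assumes Airflow 2.4+ with the apache-airflow-providers-apache-spark package installed; the DAG id, script path, connection id, schedule, and Spark settings are illustrative placeholders, not details from the role description.

```python
# Minimal sketch: an Airflow DAG that submits a nightly PySpark ETL job.
# Assumes Airflow 2.4+ and the apache-airflow-providers-apache-spark package;
# all ids, paths, and tuning values below are hypothetical placeholders.
from datetime import datetime, timedelta

from airflow import DAG
from airflow.providers.apache.spark.operators.spark_submit import SparkSubmitOperator

default_args = {
    "owner": "data-engineering",
    "retries": 2,
    "retry_delay": timedelta(minutes=5),
}

with DAG(
    dag_id="daily_transactions_etl",      # hypothetical DAG id
    schedule="0 3 * * *",                 # run once a day at 03:00
    start_date=datetime(2025, 1, 1),
    catchup=False,
    default_args=default_args,
) as dag:
    # Submit a PySpark application; executor sizing here is illustrative only.
    transform = SparkSubmitOperator(
        task_id="transform_transactions",
        application="/opt/jobs/transform_transactions.py",  # hypothetical script path
        conn_id="spark_default",
        conf={
            "spark.sql.shuffle.partitions": "400",
            "spark.executor.memory": "8g",
        },
    )
```

In practice the shuffle-partition and executor settings would be tuned per job; the point of the sketch is the DAG-level scheduling, retry, and spark-submit wiring the role calls for.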
Qualifications:
• Bachelor's degree in Computer Science, Data Engineering, or a related field (or equivalent experience).
• At least 7 years of experience in data engineering or big data development.
• Strong expertise in Apache Spark architecture, optimization, and job configuration.
• Proven experience authoring, scheduling, checkpointing, and monitoring Airflow DAGs.
• Skilled in data shuffling, partitioning strategies, and performance tuning in distributed systems.
• Expertise in Python programming, including data structures and algorithmic problem-solving.
• Hands-on experience with Spark DataFrames and PySpark transformations (joins, aggregations, filters); see the PySpark sketch after this list.
• Proficient in shell scripting, including managing and passing variables between scripts.
• Experienced with spark-submit for deployment and tuning.
• Solid understanding of ETL design, workflow automation, and distributed data systems.
• Excellent debugging and problem-solving skills in large-scale environments.
• Experience with AWS Glue, EMR, Databricks, or similar Spark platforms.
• Knowledge of data lineage and data quality frameworks such as Apache Atlas.
• Familiarity with CI/CD pipelines, Docker/Kubernetes, and data governance tools.
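As a hedged illustration of the DataFrame, join, aggregation, and partitioning skills listed above, the sketch below joins a transactions table to a small merchant dimension, aggregates daily totals, and writes a partitioned result. The table paths, column names, partition count, and broadcast choice are hypothetical examples, not details taken from the role.

```python
# Minimal PySpark sketch: joins, aggregations, and repartitioning of the kind
# described in the qualifications. All paths and column names are hypothetical.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("transactions_rollup").getOrCreate()

transactions = spark.read.parquet("s3://example-bucket/transactions/")  # hypothetical path
merchants = spark.read.parquet("s3://example-bucket/merchants/")        # hypothetical path

daily_totals = (
    transactions
    # Repartition on the join key to spread shuffle load on the large input.
    .repartition(200, "merchant_id")
    # Broadcast the small dimension table to avoid a full shuffle join.
    .join(F.broadcast(merchants), on="merchant_id", how="left")
    .filter(F.col("status") == "SETTLED")
    .groupBy("merchant_id", "transaction_date")
    .agg(
        F.sum("amount").alias("total_amount"),
        F.countDistinct("transaction_id").alias("txn_count"),
    )
)

# Write the rollup partitioned by date so downstream jobs can prune partitions.
daily_totals.write.mode("overwrite").partitionBy("transaction_date").parquet(
    "s3://example-bucket/daily_totals/"  # hypothetical output path
)
```

The repartition count of 200 and the broadcast join are illustrative tuning levers; real values would depend on data volume, skew, and cluster sizing.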





