

Zeus Solutions Inc
PySpark Developer & Big Data Engineer
⭐ - Featured Role | Apply directly with Data Freelance Hub
This role is for a PySpark Developer & Big Data Engineer with a contract length of over 6 months, offering a hybrid work location in Houston, TX. Key skills include Python, PySpark, and experience with cloud platforms. A degree in Computer Science or related field and 3–7 years in data engineering are required.
🌎 - Country
United States
💱 - Currency
Unknown
💰 - Day rate
Unknown
🗓️ - Date
November 5, 2025
🕒 - Duration
More than 6 months
🏝️ - Location
Hybrid
📄 - Contract
Unknown
🔒 - Security
Unknown
📍 - Location detailed
Houston, TX 77027
🧠 - Skills detailed
#Data Lake #ML (Machine Learning) #Data Governance #Batch #Snowflake #Databricks #Scala #Big Data #HDFS (Hadoop Distributed File System) #Spark SQL #SQL (Structured Query Language) #Datasets #Kafka (Apache Kafka) #Data Architecture #GCP (Google Cloud Platform) #Airflow #ETL (Extract, Transform, Load) #PySpark #BigQuery #Python #Data Transformations #Redshift #Data Science #Apache Spark #Synapse #Code Reviews #Spark (Apache Spark) #Debugging #Data Pipeline #Logging #AWS EMR (Amazon Elastic MapReduce) #Luigi #Security #AWS (Amazon Web Services) #Docker #Compliance #Computer Science #Hadoop #Azure #Data Engineering #Data Warehouse #Kubernetes #Cloud #Data Framework #Data Processing
Role description
Job Title: PySpark Developer & Big Data Engineer
About the Role
We are seeking a skilled PySpark Developer & Big Data Engineer to join our data engineering team. The ideal candidate will be responsible for designing, developing, and optimizing large-scale data processing solutions using Apache Spark, Python, and related Big Data technologies. You’ll work closely with data architects, analysts, and business stakeholders to build robust data pipelines that support analytics, reporting, and machine learning workloads.
Key Responsibilities
Design, develop, and maintain ETL/ELT pipelines using PySpark and Spark SQL.
Integrate data from various structured and unstructured sources into data lakes or warehouses.
Optimize Spark jobs for performance and scalability across large datasets.
Collaborate with data scientists and analysts to prepare clean, reliable data for analytics and ML models.
Implement data validation, quality checks, CDC (change data capture), SCD (slowly changing dimension) handling, error handling, and logging mechanisms (see the sketch after this list).
Work with cloud platforms (e.g., Azure, GCP, AWS) to deploy and manage data processing jobs using both real-time streaming and batch processing.
Implement serverless compute, virtual machines, and job clusters for big data processing and heavy workloads.
Evaluate cost optimization options in depth, review Spark internals, and weigh the pros and cons of serverless compute versus fixed cluster options.
Participate in code reviews, testing, and performance tuning of data pipelines.
Document processes, workflows, and data transformations.
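To illustrate the kind of pipeline work described above, here is a minimal PySpark sketch covering extraction, a simple transformation, a data quality check, and a Spark SQL aggregation. All paths, column names, and table names are hypothetical placeholders, not details from this role.
```python
# Minimal PySpark ETL sketch. Paths, columns, and validation rules are
# hypothetical placeholders used only for illustration.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("orders_etl_example").getOrCreate()

# Extract: read raw data from a (hypothetical) data lake location.
orders = spark.read.parquet("s3://example-bucket/raw/orders/")

# Transform: basic cleansing and derived columns via the DataFrame API.
cleaned = (
    orders
    .dropDuplicates(["order_id"])
    .withColumn("order_date", F.to_date("order_timestamp"))
    .withColumn("net_amount", F.col("gross_amount") - F.col("discount"))
)

# Data quality check: log and quarantine rows missing required fields.
bad_rows = cleaned.filter(F.col("order_id").isNull() | F.col("net_amount").isNull())
bad_count = bad_rows.count()
if bad_count > 0:
    print(f"Quality check flagged {bad_count} rows; writing them to quarantine.")
    bad_rows.write.mode("append").parquet("s3://example-bucket/quarantine/orders/")

# Spark SQL step: aggregate over a temporary view.
cleaned.createOrReplaceTempView("orders_clean")
daily = spark.sql("""
    SELECT order_date, SUM(net_amount) AS daily_revenue
    FROM orders_clean
    WHERE order_id IS NOT NULL
    GROUP BY order_date
""")

# Load: write the curated output to a (hypothetical) curated zone.
daily.write.mode("overwrite").parquet("s3://example-bucket/curated/daily_revenue/")
```
In practice the sources, validation rules, and targets would depend on the team's actual platforms (e.g., Databricks, EMR, or Synapse); this sketch only shows the general shape of an ETL job with an embedded quality gate.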
Required Skills & Qualifications
Bachelor’s or Master’s degree in Computer Science, Information Technology, Data Engineering, or a related field.
3–7 years of experience in data engineering or Big Data development.
Strong proficiency in Python and PySpark.
Solid understanding of Spark architecture, RDDs, DataFrames, and Spark SQL.
Hands-on experience with distributed data systems (e.g., Hadoop, Hive, HDFS).
Experience working with data warehouses (e.g., Snowflake, Redshift, BigQuery) and data lakes.
Familiarity with workflow orchestration tools (e.g., Airflow, Oozie, Luigi).
Experience with cloud services such as AWS EMR, Databricks, or Azure Synapse.
Strong SQL skills and understanding of database concepts.
Excellent problem-solving, debugging, and communication skills.
Preferred Qualifications
Experience with CI/CD pipelines for data workflows.
Exposure to streaming data frameworks (e.g., Kafka, Spark Streaming); see the streaming sketch after this list.
Knowledge of containerization (Docker, Kubernetes).
Understanding of data governance, security, and compliance best practices.
Experience with FinOps and performance optimization.
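Since the preferred qualifications mention Kafka and Spark Streaming, the sketch below shows one common Structured Streaming pattern: reading JSON events from a Kafka topic and landing them in a data lake. The broker address, topic, schema, and output paths are illustrative assumptions only, and the Kafka source additionally requires the spark-sql-kafka connector package on the Spark classpath.
```python
# Minimal Spark Structured Streaming sketch reading from Kafka. Broker
# addresses, topic names, schema, and paths are hypothetical placeholders.
from pyspark.sql import SparkSession, functions as F
from pyspark.sql.types import StructType, StructField, StringType, DoubleType

spark = SparkSession.builder.appName("events_stream_example").getOrCreate()

event_schema = StructType([
    StructField("event_id", StringType()),
    StructField("event_type", StringType()),
    StructField("amount", DoubleType()),
])

# Read a stream of messages from a (hypothetical) Kafka topic.
raw = (
    spark.readStream
    .format("kafka")
    .option("kafka.bootstrap.servers", "broker-1:9092")
    .option("subscribe", "events")
    .load()
)

# Parse the Kafka value column (bytes) into typed fields.
events = (
    raw.select(F.from_json(F.col("value").cast("string"), event_schema).alias("e"))
    .select("e.*")
)

# Write the parsed stream to a (hypothetical) data lake path with checkpointing.
query = (
    events.writeStream
    .format("parquet")
    .option("path", "s3://example-bucket/streaming/events/")
    .option("checkpointLocation", "s3://example-bucket/checkpoints/events/")
    .outputMode("append")
    .start()
)

query.awaitTermination()
```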
Job Types: Full-time, Contract, Temporary, Permanent
Work Location: Hybrid remote in Houston, TX 77027