

IPolarity
Data Engineer (Spark | Hadoop | Apache Ozone)
Featured Role | Apply directly with Data Freelance Hub
This role is for a Data Engineer (Spark | Hadoop | Apache Ozone) with a contract length of "X months", offering a pay rate of "$X per hour". Key skills include Apache Spark, Hadoop, Apache Ozone, and programming in Python, Scala, or Java. A Bachelor's degree in a related field is required.
Country
United States
Currency
$ USD
-
Day rate
Unknown
-
Date
March 3, 2026
Duration
Unknown
-
Location
Unknown
-
Contract
Unknown
-
Security
Unknown
-
Location detailed
Berkeley Heights, NJ
-
Skills detailed
#ETL (Extract, Transform, Load) #Azure #Data Science #Data Processing #AWS (Amazon Web Services) #Apache Ozone #Data Security #Big Data #Storage #Scripting #Python #Compliance #Data Engineering #Scala #Linux #HBase #Data Pipeline #Batch #Java #Security #Data Quality #Cloud #GCP (Google Cloud Platform) #Programming #Computer Science #Spark (Apache Spark) #Automation #Kubernetes #HDFS (Hadoop Distributed File System) #Apache Spark #Hadoop #YARN (Yet Another Resource Negotiator) #Docker #Kafka (Apache Kafka) #Unix #Data Storage #Shell Scripting #SQL (Structured Query Language)
Role description
Responsibilities:
• Design and implement scalable distributed data processing solutions using Apache Spark and the Hadoop ecosystem.
• Build and maintain Spark applications for ETL, aggregation, and large-scale data transformation (see the PySpark sketch after this list).
• Implement and manage enterprise data storage using Apache Ozone and HDFS.
• Develop batch and real-time ingestion pipelines using modern big data technologies.
• Optimize cluster performance, storage efficiency, and resource utilization.
• Ensure data quality, governance, security, and compliance across platforms.
• Troubleshoot performance issues across distributed environments.
• Collaborate with Data Scientists, Analysts, and Application teams to deliver reliable data solutions.
• Automate workflows and operational processes using scripting and orchestration tools.
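The Spark/ETL responsibilities above can be pictured with a minimal PySpark batch sketch. The input path, column names, and the Ozone ofs:// URI below are illustrative assumptions, not details from this posting.

```python
# Minimal PySpark batch ETL sketch: extract raw CSV, transform, and load
# Parquet to Apache Ozone. All paths and column names are hypothetical.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("daily-orders-etl").getOrCreate()

# Extract: raw files landed on HDFS (hypothetical path).
raw = spark.read.option("header", True).csv("hdfs:///landing/orders/2026-03-03/")

# Transform: cast types, drop rows with unparseable amounts, then aggregate.
clean = (
    raw.withColumn("amount", F.col("amount").cast("double"))
       .filter(F.col("amount").isNotNull())
)
daily_totals = clean.groupBy("customer_id").agg(F.sum("amount").alias("total_amount"))

# Load: write columnar output to Ozone via the ofs:// Hadoop-compatible
# filesystem (volume and bucket names are made up for illustration).
daily_totals.write.mode("overwrite").parquet("ofs://ozone1/warehouse/orders/daily/")

spark.stop()
```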
Required Skills:
• Strong experience with Apache Spark (Core, SQL, Streaming).
• Hands-on expertise with the Hadoop ecosystem (HDFS, YARN, MapReduce).
• Experience working with Apache Ozone object storage (a minimal access sketch follows this list).
• Programming skills in Python, Scala, or Java.
• Experience building scalable ETL/data pipelines.
• Knowledge of distributed systems and cluster optimization.
• Strong Linux/Unix and shell scripting experience.
• Understanding of data security, governance, and compliance practices.
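Ozone also exposes an S3-compatible gateway, so object access from Python can look like the following hedged sketch; the endpoint host, credentials, and bucket name are assumptions (Ozone's s3g listens on port 9878 by default).

```python
# Sketch: read/write objects in Apache Ozone through its S3-compatible
# gateway using boto3. Endpoint, credentials, and bucket are placeholders.
import boto3

s3 = boto3.client(
    "s3",
    endpoint_url="http://ozone-s3g.example.com:9878",  # hypothetical gateway host
    aws_access_key_id="testuser",                      # placeholder credentials
    aws_secret_access_key="secret",
)

s3.put_object(Bucket="analytics", Key="reports/2026-03-03.json", Body=b'{"rows": 42}')
obj = s3.get_object(Bucket="analytics", Key="reports/2026-03-03.json")
print(obj["Body"].read())
```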
Preferred Skills:
• Hive, HBase, or Kafka experience (see the streaming ingestion sketch after this list).
• Cloud-based big data platforms (AWS, Azure, or GCP).
• Containerization exposure (Docker, Kubernetes).
• CI/CD and automation for data engineering workflows.
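For the Kafka and real-time ingestion items, a Spark Structured Streaming job is one common shape. The broker, topic, and output paths below are hypothetical, and the spark-sql-kafka connector package must be available on the cluster.

```python
# Sketch: consume a Kafka topic with Spark Structured Streaming and append
# the payloads to HDFS as Parquet. Broker, topic, and paths are made up.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("events-ingest").getOrCreate()

events = (
    spark.readStream.format("kafka")
         .option("kafka.bootstrap.servers", "broker1:9092")  # hypothetical broker
         .option("subscribe", "events")                      # hypothetical topic
         .load()
         .select(F.col("value").cast("string").alias("payload"))
)

query = (
    events.writeStream.format("parquet")
          .option("path", "hdfs:///data/events/")
          .option("checkpointLocation", "hdfs:///checkpoints/events/")
          .start()
)
query.awaitTermination()
```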
Qualifications:
• Bachelor's degree in Computer Science, Software Engineering, or a related field.
• Experience delivering enterprise data platform or product implementations preferred.
• Excellent communication and collaboration skills.
• Strong problem-solving mindset and analytical thinking.