

HMG AMERICA LLC
Big Data Consultant
⭐ - Featured Role | Apply directly with Data Freelance Hub
This role is for a Big Data Consultant with 5+ years of experience in Apache Spark and cloud platforms (Azure/AWS). Responsibilities include migrating data pipelines, refactoring code, and performance optimization. Key skills include HDFS, Hive, and data ingestion tools. Contract length and pay rate are unspecified.
🌎 - Country
United States
💱 - Currency
$ USD
-
💰 - Day rate
480
-
🗓️ - Date
March 26, 2026
🕒 - Duration
Unknown
-
🏝️ - Location
Unknown
-
📄 - Contract
Unknown
-
🔒 - Security
Unknown
-
📍 - Location detailed
San Francisco Bay Area
-
🧠 - Skills detailed
#HDFS (Hadoop Distributed File System) #SQL (Structured Query Language) #S3 (Amazon Simple Storage Service) #Hadoop #ADLS (Azure Data Lake Storage) #Azure #Data Ingestion #Data Migration #Scala #Sqoop (Apache Sqoop) #Data Pipeline #NiFi (Apache NiFi) #Cloud #Apache Spark #HiveQL #Datasets #Storage #Migration #PySpark #Big Data #Synapse #Regression #Kafka (Apache Kafka) #AWS EMR (Amazon Elastic MapReduce) #Databricks #Python #Scripting #Spark SQL #AWS (Amazon Web Services) #HBase #Spark (Apache Spark)
Role description
About the Role
A Spark job migration specialist moves data pipelines, JAR tasks, and analytics workloads from legacy systems (such as Hadoop/CDH or AWS EMR) to modern platforms such as ACOS. The work involves refactoring code (e.g., Hive to PySpark), performance testing, and upgrading jobs from Spark 2.x to 3.x.
Responsibilities
• Workload Migration: Migrate JVM workloads and spark-submit jobs to Databricks JAR tasks or notebook tasks.
• Pipeline Re-engineering: Convert existing HiveQL scripts and Oozie workflows into optimized Spark SQL or PySpark applications.
• Refactoring: Adapt data pipelines from Azure Synapse to the target cloud platform, including updating library dependencies and notebook references.
• Performance Optimization: Enable Adaptive Query Execution (AQE) in Spark 3 to improve shuffle performance and mitigate skewed joins (see the sketch after this list).
• Testing & Validation: Use validation scripts to perform regression testing, ensuring output consistency between the old and new systems.
• Job Customization: Use spark.sparkContext.setJobDescription() to label, monitor, and troubleshoot specific Spark jobs in the Spark UI.
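The items above name concrete Spark 3 mechanisms. Below is a minimal PySpark sketch of how they fit together, assuming a Spark 3.x session with Hive support; the database, table, and job names (legacy_db.sales, migrated_db.daily_sales_rollup, daily_sales_rollup) are hypothetical placeholders, not details from this posting.

```python
# Minimal sketch of the migration patterns above (Spark 3.x).
# All table/path/job names are illustrative, not from the posting.
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .appName("hive-to-pyspark-migration")
    # Performance Optimization: AQE re-plans shuffles at runtime
    # and splits skewed join partitions automatically.
    .config("spark.sql.adaptive.enabled", "true")
    .config("spark.sql.adaptive.skewJoin.enabled", "true")
    .enableHiveSupport()
    .getOrCreate()
)

# Job Customization: label the next action so it is easy to find in the UI.
spark.sparkContext.setJobDescription("migrated: daily_sales_rollup")

# Pipeline Re-engineering: a HiveQL aggregation rewritten as Spark SQL.
daily_sales = spark.sql("""
    SELECT sale_date, SUM(amount) AS total_amount
    FROM legacy_db.sales
    GROUP BY sale_date
""")

# Testing & Validation: a cheap regression check against the legacy output.
legacy = spark.table("legacy_db.daily_sales_rollup")
assert daily_sales.count() == legacy.count(), "row counts diverged"
mismatches = daily_sales.exceptAll(legacy).count()
assert mismatches == 0, f"{mismatches} rows differ from legacy output"

daily_sales.write.mode("overwrite").saveAsTable("migrated_db.daily_sales_rollup")
```

DataFrame.exceptAll compares rows without regard to ordering, which matches the usual semantics for regression-checking a rewritten aggregation; for very large tables, comparing per-partition aggregates or checksums is a cheaper first pass.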
Qualifications
Role: Big Data Migration Engineer (Spark)
Experience: 5+ years with Apache Spark (PySpark/Scala) and cloud platforms (Azure/AWS).
Required Skills
• Strong experience with HDFS, Hadoop ecosystem (Hive, Spark, HBase, MapReduce).
• Experience in data migration to cloud / enterprise data platforms.
• Knowledge of:
  • Data ingestion tools (Sqoop, Kafka, NiFi, etc.)
  • Cloud storage (ADLS, S3, Blob Storage)
  • Distributed processing frameworks
• SQL and performance tuning expertise.
• Experience in scripting (Python, Shell, Scala).
Preferred Skills
• Data Pipelines: Handling schema evolution, verifying data correctness, and testing with golden datasets (see the sketch after this list).
• Job Definitions: Reconfiguring job properties, cluster settings, and Spark configurations.
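A short sketch of the golden-dataset idea, continuing the `spark` session from the sketch above: run the migrated pipeline against a frozen input and diff both schema and rows against the stored golden output. The golden table name (golden_db.daily_sales_golden) is hypothetical.

```python
# Hypothetical golden-dataset check; table names are illustrative only.
golden = spark.table("golden_db.daily_sales_golden")
candidate = spark.table("migrated_db.daily_sales_rollup")

# Schema evolution guard: fail loudly if columns or types drifted.
assert candidate.schema == golden.schema, (
    f"schema drift: {candidate.schema.simpleString()} "
    f"!= {golden.schema.simpleString()}"
)

# Data correctness: the symmetric diff should be empty for an exact match.
extra = candidate.exceptAll(golden).count()
missing = golden.exceptAll(candidate).count()
assert extra == 0 and missing == 0, f"golden mismatch: +{extra} / -{missing} rows"
```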
Pay range and compensation package
Pay range, salary, and other compensation details were not provided.
Equal Opportunity Statement
We are committed to diversity and inclusivity.