

HMG AMERICA LLC
Big Data Consultant
⭐ - Featured Role | Apply directly with Data Freelance Hub
This role is for a Big Data Consultant with 5+ years of experience in Apache Spark and cloud platforms (Azure/AWS). Responsibilities include migrating data pipelines, refactoring code, and performance optimization. Key skills include HDFS, Hive, and data ingestion tools. Contract length and pay rate are unspecified.
🌎 - Country
United States
💱 - Currency
$ USD
-
💰 - Day rate
480
-
🗓️ - Date
March 26, 2026
🕒 - Duration
Unknown
-
🏝️ - Location
Unknown
-
📄 - Contract
Unknown
-
🔒 - Security
Unknown
-
📍 - Location detailed
San Francisco Bay Area
-
🧠 - Skills detailed
#HDFS (Hadoop Distributed File System) #SQL (Structured Query Language) #S3 (Amazon Simple Storage Service) #Hadoop #ADLS (Azure Data Lake Storage) #Azure #Data Ingestion #Data Migration #Scala #Sqoop (Apache Sqoop) #Data Pipeline #NiFi (Apache NiFi) #Cloud #Apache Spark #HiveQL #Datasets #Storage #Migration #PySpark #Big Data #Synapse #Regression #Kafka (Apache Kafka) #AWS EMR (Amazon Elastic MapReduce) #Databricks #Python #Scripting #Spark SQL #AWS (Amazon Web Services) #HBase #Spark (Apache Spark)
Role description
About the Role
A Spark job migration specialist moves data pipelines, JAR tasks, and analytics workloads from legacy systems (such as Hadoop/CDH or AWS EMR) to modern platforms such as ACOS. The work involves refactoring code (e.g., Hive to PySpark), performance testing, and upgrading jobs from Spark 2.x to 3.x.
Responsibilities
• Workload Migration: Migrate JVM workloads and spark-submit jobs to Databricks JAR tasks or notebook tasks.
• Pipeline Re-engineering: Convert existing HiveQL scripts and Oozie workflows into optimized Spark SQL or PySpark applications.
• Refactoring: Adapt data pipelines from Azure Synapse to the target cloud platform, including updating library dependencies and notebook references.
• Performance Optimization: Enable Adaptive Query Execution (AQE) in Spark 3 to improve shuffle performance and mitigate skewed joins (see the sketch after this list).
• Testing & Validation: Use validation scripts to perform regression testing, ensuring output consistency between the old and new systems.
• Job Customization: Use spark.sparkContext.setJobDescription() to label, monitor, and troubleshoot specific Spark jobs in the Spark UI.
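The items above name concrete Spark 3 mechanisms. Below is a minimal PySpark sketch of how they fit together, assuming a Spark 3.x session with Hive support; the database, table, and job names (legacy_db.sales, migrated_db.daily_sales_rollup, daily_sales_rollup) are hypothetical placeholders, not details from this posting.

```python
# Minimal sketch of the migration patterns above (Spark 3.x).
# All table/path/job names are illustrative, not from the posting.
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .appName("hive-to-pyspark-migration")
    # Performance Optimization: AQE re-plans shuffles at runtime
    # and splits skewed join partitions automatically.
    .config("spark.sql.adaptive.enabled", "true")
    .config("spark.sql.adaptive.skewJoin.enabled", "true")
    .enableHiveSupport()
    .getOrCreate()
)

# Job Customization: label the next action so it is easy to find in the UI.
spark.sparkContext.setJobDescription("migrated: daily_sales_rollup")

# Pipeline Re-engineering: a HiveQL aggregation rewritten as Spark SQL.
daily_sales = spark.sql("""
    SELECT sale_date, SUM(amount) AS total_amount
    FROM legacy_db.sales
    GROUP BY sale_date
""")

# Testing & Validation: a cheap regression check against the legacy output.
legacy = spark.table("legacy_db.daily_sales_rollup")
assert daily_sales.count() == legacy.count(), "row counts diverged"
mismatches = daily_sales.exceptAll(legacy).count()
assert mismatches == 0, f"{mismatches} rows differ from legacy output"

daily_sales.write.mode("overwrite").saveAsTable("migrated_db.daily_sales_rollup")
```

DataFrame.exceptAll compares rows without regard to ordering, which matches the usual semantics for regression-checking a rewritten aggregation; for very large tables, comparing per-partition aggregates or checksums is a cheaper first pass.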
Qualifications
Role: Big Data Migration Engineer (Spark)
Experience: 5+ years with Apache Spark (PySpark/Scala) and cloud platforms (Azure/AWS).
Required Skills
• Strong experience with HDFS, Hadoop ecosystem (Hive, Spark, HBase, MapReduce).
• Experience in data migration to cloud / enterprise data platforms.
• Knowledge of:
  • Data ingestion tools (Sqoop, Kafka, NiFi, etc.)
  • Cloud storage (ADLS, S3, Blob Storage)
  • Distributed processing frameworks
• SQL and performance tuning expertise.
• Experience in scripting (Python, Shell, Scala).
Preferred Skills
• Data Pipelines: Handling schema evolution, verifying data correctness, and testing with golden datasets (see the sketch after this list).
• Job Definitions: Reconfiguring job properties, cluster settings, and Spark configurations.
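A short sketch of the golden-dataset idea, continuing the `spark` session from the sketch above: run the migrated pipeline against a frozen input and diff both schema and rows against the stored golden output. The golden table name (golden_db.daily_sales_golden) is hypothetical.

```python
# Hypothetical golden-dataset check; table names are illustrative only.
golden = spark.table("golden_db.daily_sales_golden")
candidate = spark.table("migrated_db.daily_sales_rollup")

# Schema evolution guard: fail loudly if columns or types drifted.
assert candidate.schema == golden.schema, (
    f"schema drift: {candidate.schema.simpleString()} "
    f"!= {golden.schema.simpleString()}"
)

# Data correctness: the symmetric diff should be empty for an exact match.
extra = candidate.exceptAll(golden).count()
missing = golden.exceptAll(candidate).count()
assert extra == 0 and missing == 0, f"golden mismatch: +{extra} / -{missing} rows"
```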
Pay range and compensation package
Pay range, salary, and other compensation details were not provided.
Equal Opportunity Statement
We are committed to diversity and inclusivity.