

Azure Data Lakehouse Engineer with Iceberg
Featured Role | Apply direct with Data Freelance Hub
This role is for an Azure Data Lakehouse Engineer with Apache Iceberg expertise, offered as a hybrid contract in Raritan, NJ. Key skills include Apache Iceberg, Spark, and cloud integration; experience with data governance and automation tools is required. The pay rate is unspecified.
Country: United States
Currency: $ USD
Day rate: Unspecified
Date discovered: August 2, 2025
Project duration: Unknown
Location type: Hybrid
Contract type: Unknown
Security clearance: Unknown
Location detailed: Raritan, NJ
Skills detailed:
#BI (Business Intelligence) #Prometheus #Data Lakehouse #Observability #Data Catalog #Infrastructure as Code (IaC) #Tableau #Azure #S3 (Amazon Simple Storage Service) #Grafana #Data Lake #Alation #Java #REST (Representational State Transfer) #Storage #Presto #GCP (Google Cloud Platform) #Collibra #Terraform #Automation #Airflow #Python #Scala #Data Governance #ACID (Atomicity, Consistency, Isolation, Durability) #Cloud #Data Lineage #Metadata #AWS (Amazon Web Services) #Microsoft Power BI #dbt (data build tool) #ML (Machine Learning) #Trino #Deployment #Spark (Apache Spark) #Apache Airflow #Apache Iceberg #AWS Glue #Snowflake #Batch
Role description
Title: Azure Data Lakehouse Engineer with Iceberg
Location: Raritan, NJ (Hybrid)
Contract
Job Description:
Apache Iceberg Architecture & Engineering
• Design and implement Iceberg table schemas and partitioning strategies optimized for performance, scalability, and query flexibility in petabyte-scale Lakehouse environments
• Architect Iceberg catalog layers (Hive, AWS Glue, Nessie, or REST-based catalogs), enabling ACID-compliant operations and consistent metadata propagation across multiple engines
• Operationalize transactional inserts, upserts, deletes, and merges using Spark, Flink, or native Iceberg APIs, supporting both streaming and batch workloads
• Manage schema evolution, partition evolution, and snapshot lifecycle policies to optimize long-term performance and storage
• Build custom tools or wrappers around Iceberg APIs in Java, Scala, or Python for metadata introspection, compaction automation, and lineage integration (see the sketch after this list)
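A minimal PySpark sketch of this lifecycle follows: it creates a partitioned Iceberg table, applies an ACID MERGE upsert, evolves the schema, and inspects snapshot metadata. The catalog name "lake", the local warehouse path, and the table names are illustrative assumptions; a real deployment would configure a Glue, Nessie, or REST catalog and ship the Iceberg Spark runtime JAR.

from datetime import datetime
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .appName("iceberg-lifecycle-sketch")
    # Assumption: a local Hadoop catalog; swap in Glue/Nessie/REST in practice.
    .config("spark.sql.catalog.lake", "org.apache.iceberg.spark.SparkCatalog")
    .config("spark.sql.catalog.lake.type", "hadoop")
    .config("spark.sql.catalog.lake.warehouse", "/tmp/lake")
    .getOrCreate()
)

# Partitioned table using Iceberg's hidden-partitioning transform days(ts).
spark.sql("""
    CREATE TABLE IF NOT EXISTS lake.analytics.events (
        event_id BIGINT, user_id BIGINT, payload STRING, event_ts TIMESTAMP
    ) USING iceberg
    PARTITIONED BY (days(event_ts))
""")

# Stage one incoming row as a temp view to drive the MERGE.
incoming = spark.createDataFrame(
    [(1, 42, "click", datetime(2025, 8, 1, 12, 0))],
    "event_id BIGINT, user_id BIGINT, payload STRING, event_ts TIMESTAMP",
)
incoming.createOrReplaceTempView("staged_events")

# Transactional upsert: matched rows are updated, new rows inserted, atomically.
spark.sql("""
    MERGE INTO lake.analytics.events t
    USING staged_events s
    ON t.event_id = s.event_id
    WHEN MATCHED THEN UPDATE SET *
    WHEN NOT MATCHED THEN INSERT *
""")

# Schema evolution is a metadata-only change in Iceberg.
spark.sql("ALTER TABLE lake.analytics.events ADD COLUMN region STRING")

# Metadata introspection via the built-in snapshots metadata table.
spark.sql(
    "SELECT snapshot_id, operation, committed_at FROM lake.analytics.events.snapshots"
).show()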
Multi-Engine Compatibility and Optimization
• Enable multi-engine interoperability across Spark, Trino, Dremio, Flink, Snowflake, and Presto, ensuring Iceberg tables are accessible, performant, and consistent regardless of compute layer (see the catalog sketch after this list)
• Collaborate with BI and ML teams to design read-optimized layouts for consumption by Power BI, Tableau, or notebooks while enforcing data-freshness SLAs
• Design table-format version upgrade paths and coordinate with platform teams to validate compatibility across engines, connectors, and query runtimes
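Multi-engine consistency typically hinges on every engine resolving the same catalog. Below is a hedged sketch assuming a shared REST catalog at a placeholder URI; Trino, Flink, or Dremio would point at the same endpoint through their own Iceberg connectors.

from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .appName("iceberg-rest-catalog-sketch")
    # Register a Spark catalog backed by a shared Iceberg REST catalog service.
    .config("spark.sql.catalog.shared", "org.apache.iceberg.spark.SparkCatalog")
    .config("spark.sql.catalog.shared.type", "rest")
    # Assumption: the endpoint URI is environment-specific.
    .config("spark.sql.catalog.shared.uri", "https://catalog.example.com")
    .getOrCreate()
)

# Every engine configured against the same catalog sees identical table
# metadata, so reads and writes stay consistent across compute layers.
spark.sql("SELECT * FROM shared.analytics.events LIMIT 10").show()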
Governance, Metadata & Observability
• Integrate Iceberg tables with data catalogs (e.g., Unity Catalog, Alation, Amundsen) and data lineage platforms (e.g., Marquez, Collibra, DataHub)
• Enforce data governance via column-level lineage, row-level ACLs, and policy-based retention rules, leveraging metadata-rich Iceberg features
• Automate compaction, snapshot expiration, and manifest pruning using orchestration tools (e.g., Apache Airflow, dbt, Dagster) for metadata hygiene (see the sketch after this list)
• Implement Iceberg table telemetry using Prometheus-Grafana, or integrate with enterprise observability tools for query stats, scan efficiency, and access patterns
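One way the maintenance bullet might look in practice: an Airflow DAG that invokes Iceberg's built-in Spark maintenance procedures on a daily schedule. The catalog wiring mirrors the earlier sketches, and the table name is a placeholder; this is a sketch of the pattern, not a production job.

from datetime import datetime, timedelta, timezone

import pendulum
from airflow.decorators import dag, task
from pyspark.sql import SparkSession

def get_iceberg_spark_session():
    # Assumption: same local-catalog wiring as the earlier sketches.
    return (
        SparkSession.builder
        .config("spark.sql.catalog.lake", "org.apache.iceberg.spark.SparkCatalog")
        .config("spark.sql.catalog.lake.type", "hadoop")
        .config("spark.sql.catalog.lake.warehouse", "/tmp/lake")
        .getOrCreate()
    )

@dag(schedule="@daily", start_date=pendulum.datetime(2025, 8, 2, tz="UTC"), catchup=False)
def iceberg_maintenance():

    @task
    def compact_small_files():
        spark = get_iceberg_spark_session()
        # Bin-pack small data files into larger ones (Iceberg Spark procedure).
        spark.sql("CALL lake.system.rewrite_data_files(table => 'analytics.events')")

    @task
    def expire_old_snapshots():
        spark = get_iceberg_spark_session()
        cutoff = (datetime.now(timezone.utc) - timedelta(days=7)).strftime("%Y-%m-%d %H:%M:%S")
        # Drop snapshots older than 7 days to keep metadata and storage bounded.
        spark.sql(
            f"CALL lake.system.expire_snapshots("
            f"table => 'analytics.events', older_than => TIMESTAMP '{cutoff}')"
        )

    compact_small_files() >> expire_old_snapshots()

iceberg_maintenance()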
Automation, Scaling & Cloud Integration
• Build IaC-driven infrastructure using Terraform, CloudFormation, or Pulumi to deploy Iceberg-backed Lakehouse components across AWS, Azure, or GCP
• Automate Iceberg table provisioning pipelines as part of domain-specific data product deployments aligned to a Data Mesh architecture
• Optimize compute-storage configurations in S3, Azure Data Lake Gen2, or GCS to support efficient Iceberg table layouts, object sizing, I/O patterns, and partition pruning
• Enable streaming ingestion, using Spark Structured Streaming to land real-time CDC or log-based feeds (see the sketch below)
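A minimal sketch of that last responsibility, assuming a Kafka source: Spark Structured Streaming appends each micro-batch to an Iceberg table as an atomic commit. The broker address, topic, checkpoint path, and target table are placeholders, and the "lake" catalog is assumed to be configured as in the earlier sketches.

from pyspark.sql import SparkSession
from pyspark.sql.functions import col

spark = (
    SparkSession.builder
    .appName("iceberg-streaming-sketch")
    # Assumption: same "lake" Iceberg catalog wiring as the earlier sketches.
    .config("spark.sql.catalog.lake", "org.apache.iceberg.spark.SparkCatalog")
    .config("spark.sql.catalog.lake.type", "hadoop")
    .config("spark.sql.catalog.lake.warehouse", "/tmp/lake")
    .getOrCreate()
)

# Target table for the raw feed; created up front so the stream can append.
spark.sql("""
    CREATE TABLE IF NOT EXISTS lake.analytics.raw_events (payload STRING)
    USING iceberg
""")

# Read a CDC/log feed from Kafka (placeholder broker and topic).
events = (
    spark.readStream
    .format("kafka")
    .option("kafka.bootstrap.servers", "broker:9092")
    .option("subscribe", "cdc.events")
    .load()
    .select(col("value").cast("string").alias("payload"))
)

# Iceberg's streaming sink commits each micro-batch atomically; the
# checkpoint location lets the job resume without duplicating commits.
query = (
    events.writeStream
    .format("iceberg")
    .outputMode("append")
    .option("checkpointLocation", "/tmp/checkpoints/raw_events")
    .toTable("lake.analytics.raw_events")
)
query.awaitTermination()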