

Azure Data Lakehouse Engineer with Iceberg
Featured Role | Apply direct with Data Freelance Hub
This role is for an Azure Data Lakehouse Engineer with Apache Iceberg expertise, offered as a hybrid contract in Raritan, NJ. Key skills include Apache Iceberg, Spark, and cloud integration; experience with data governance and automation tools is required. The pay rate is unspecified.
Country: United States
Currency: $ USD
Day rate: Unspecified
Date discovered: August 2, 2025
Project duration: Unknown
Location type: Hybrid
Contract type: Unknown
Security clearance: Unknown
Location detailed: Raritan, NJ
Skills detailed:
#BI (Business Intelligence) #Prometheus #Data Lakehouse #Observability #Data Catalog #Infrastructure as Code (IaC) #Tableau #Azure #S3 (Amazon Simple Storage Service) #Grafana #Data Lake #Alation #Java #REST (Representational State Transfer) #Storage #Presto #GCP (Google Cloud Platform) #Collibra #Terraform #Automation #Airflow #Python #Scala #Data Governance #ACID (Atomicity, Consistency, Isolation, Durability) #Cloud #Data Lineage #Metadata #AWS (Amazon Web Services) #Microsoft Power BI #dbt (data build tool) #ML (Machine Learning) #Trino #Deployment #Spark (Apache Spark) #Apache Airflow #Apache Iceberg #AWS Glue #Snowflake #Batch
Role description
Title: Azure Data Lakehouse Engineer with Iceberg
Location: Raritan, NJ (Hybrid)
Contract
Job Description:
Apache Iceberg Architecture & Engineering
• Design and implement Iceberg table schemas and partitioning strategies optimized for performance, scalability, and query flexibility in petabyte-scale Lakehouse environments
• Architect Iceberg catalog layers (Hive, AWS Glue, Nessie, or REST-based catalogs), enabling ACID-compliant operations and consistent metadata propagation across multiple engines
• Operationalize transactional inserts, upserts, deletes, and merges using Spark, Flink, or native Iceberg APIs, supporting both streaming and batch workloads
• Manage schema evolution, partition evolution, and snapshot lifecycle policies to optimize long-term performance and storage
• Build custom tools or wrappers around Iceberg APIs in Java, Scala, or Python for metadata introspection, compaction automation, and lineage integration (see the sketch after this list)
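A minimal PySpark sketch of this lifecycle follows: it creates a partitioned Iceberg table, applies an ACID MERGE upsert, evolves the schema, and inspects snapshot metadata. The catalog name "lake", the local warehouse path, and the table names are illustrative assumptions; a real deployment would configure a Glue, Nessie, or REST catalog and ship the Iceberg Spark runtime JAR.

from datetime import datetime
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .appName("iceberg-lifecycle-sketch")
    # Assumption: a local Hadoop catalog; swap in Glue/Nessie/REST in practice.
    .config("spark.sql.catalog.lake", "org.apache.iceberg.spark.SparkCatalog")
    .config("spark.sql.catalog.lake.type", "hadoop")
    .config("spark.sql.catalog.lake.warehouse", "/tmp/lake")
    .getOrCreate()
)

# Partitioned table using Iceberg's hidden-partitioning transform days(ts).
spark.sql("""
    CREATE TABLE IF NOT EXISTS lake.analytics.events (
        event_id BIGINT, user_id BIGINT, payload STRING, event_ts TIMESTAMP
    ) USING iceberg
    PARTITIONED BY (days(event_ts))
""")

# Stage one incoming row as a temp view to drive the MERGE.
incoming = spark.createDataFrame(
    [(1, 42, "click", datetime(2025, 8, 1, 12, 0))],
    "event_id BIGINT, user_id BIGINT, payload STRING, event_ts TIMESTAMP",
)
incoming.createOrReplaceTempView("staged_events")

# Transactional upsert: matched rows are updated, new rows inserted, atomically.
spark.sql("""
    MERGE INTO lake.analytics.events t
    USING staged_events s
    ON t.event_id = s.event_id
    WHEN MATCHED THEN UPDATE SET *
    WHEN NOT MATCHED THEN INSERT *
""")

# Schema evolution is a metadata-only change in Iceberg.
spark.sql("ALTER TABLE lake.analytics.events ADD COLUMN region STRING")

# Metadata introspection via the built-in snapshots metadata table.
spark.sql(
    "SELECT snapshot_id, operation, committed_at FROM lake.analytics.events.snapshots"
).show()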
Multi-Engine Compatibility and Optimization
• Enable multi-engine interoperability across Spark, Trino, Dremio, Flink, Snowflake, and Presto, ensuring Iceberg tables are accessible, performant, and consistent regardless of compute layer (see the catalog sketch after this list)
• Collaborate with BI and ML teams to design read-optimized layouts for consumption by Power BI, Tableau, or notebooks while enforcing data-freshness SLAs
• Design table-format version upgrade paths and coordinate with platform teams to validate compatibility across engines, connectors, and query runtimes
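Multi-engine consistency typically hinges on every engine resolving the same catalog. Below is a hedged sketch assuming a shared REST catalog at a placeholder URI; Trino, Flink, or Dremio would point at the same endpoint through their own Iceberg connectors.

from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .appName("iceberg-rest-catalog-sketch")
    # Register a Spark catalog backed by a shared Iceberg REST catalog service.
    .config("spark.sql.catalog.shared", "org.apache.iceberg.spark.SparkCatalog")
    .config("spark.sql.catalog.shared.type", "rest")
    # Assumption: the endpoint URI is environment-specific.
    .config("spark.sql.catalog.shared.uri", "https://catalog.example.com")
    .getOrCreate()
)

# Every engine configured against the same catalog sees identical table
# metadata, so reads and writes stay consistent across compute layers.
spark.sql("SELECT * FROM shared.analytics.events LIMIT 10").show()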
Governance, Metadata & Observability
• Integrate Iceberg tables with data catalogs (e.g., Unity Catalog, Alation, Amundsen) and data lineage platforms (e.g., Marquez, Collibra, DataHub)
• Enforce data governance via column-level lineage, row-level ACLs, and policy-based retention rules, leveraging metadata-rich Iceberg features
• Automate compaction, snapshot expiration, and manifest pruning using orchestration tools (e.g., Apache Airflow, dbt, Dagster) for metadata hygiene (see the sketch after this list)
• Implement Iceberg table telemetry using Prometheus-Grafana, or integrate with enterprise observability tools for query stats, scan efficiency, and access patterns
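One way the maintenance bullet might look in practice: an Airflow DAG that invokes Iceberg's built-in Spark maintenance procedures on a daily schedule. The catalog wiring mirrors the earlier sketches, and the table name is a placeholder; this is a sketch of the pattern, not a production job.

from datetime import datetime, timedelta, timezone

import pendulum
from airflow.decorators import dag, task
from pyspark.sql import SparkSession

def get_iceberg_spark_session():
    # Assumption: same local-catalog wiring as the earlier sketches.
    return (
        SparkSession.builder
        .config("spark.sql.catalog.lake", "org.apache.iceberg.spark.SparkCatalog")
        .config("spark.sql.catalog.lake.type", "hadoop")
        .config("spark.sql.catalog.lake.warehouse", "/tmp/lake")
        .getOrCreate()
    )

@dag(schedule="@daily", start_date=pendulum.datetime(2025, 8, 2, tz="UTC"), catchup=False)
def iceberg_maintenance():

    @task
    def compact_small_files():
        spark = get_iceberg_spark_session()
        # Bin-pack small data files into larger ones (Iceberg Spark procedure).
        spark.sql("CALL lake.system.rewrite_data_files(table => 'analytics.events')")

    @task
    def expire_old_snapshots():
        spark = get_iceberg_spark_session()
        cutoff = (datetime.now(timezone.utc) - timedelta(days=7)).strftime("%Y-%m-%d %H:%M:%S")
        # Drop snapshots older than 7 days to keep metadata and storage bounded.
        spark.sql(
            f"CALL lake.system.expire_snapshots("
            f"table => 'analytics.events', older_than => TIMESTAMP '{cutoff}')"
        )

    compact_small_files() >> expire_old_snapshots()

iceberg_maintenance()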
Automation, Scaling & Cloud Integration
• Build IaC-driven infrastructure using Terraform, CloudFormation, or Pulumi to deploy Iceberg-backed Lakehouse components across AWS, Azure, or GCP
• Automate Iceberg table provisioning pipelines as part of domain-specific data product deployments aligned to a Data Mesh architecture
• Optimize compute-storage configurations in S3, Azure Data Lake Gen2, or GCS to support efficient Iceberg table layouts, object sizing, I/O patterns, and partition pruning
• Enable streaming ingestion, using Spark Structured Streaming to land real-time CDC or log-based feeds (see the sketch below)
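A minimal sketch of that last responsibility, assuming a Kafka source: Spark Structured Streaming appends each micro-batch to an Iceberg table as an atomic commit. The broker address, topic, checkpoint path, and target table are placeholders, and the "lake" catalog is assumed to be configured as in the earlier sketches.

from pyspark.sql import SparkSession
from pyspark.sql.functions import col

spark = (
    SparkSession.builder
    .appName("iceberg-streaming-sketch")
    # Assumption: same "lake" Iceberg catalog wiring as the earlier sketches.
    .config("spark.sql.catalog.lake", "org.apache.iceberg.spark.SparkCatalog")
    .config("spark.sql.catalog.lake.type", "hadoop")
    .config("spark.sql.catalog.lake.warehouse", "/tmp/lake")
    .getOrCreate()
)

# Target table for the raw feed; created up front so the stream can append.
spark.sql("""
    CREATE TABLE IF NOT EXISTS lake.analytics.raw_events (payload STRING)
    USING iceberg
""")

# Read a CDC/log feed from Kafka (placeholder broker and topic).
events = (
    spark.readStream
    .format("kafka")
    .option("kafka.bootstrap.servers", "broker:9092")
    .option("subscribe", "cdc.events")
    .load()
    .select(col("value").cast("string").alias("payload"))
)

# Iceberg's streaming sink commits each micro-batch atomically; the
# checkpoint location lets the job resume without duplicating commits.
query = (
    events.writeStream
    .format("iceberg")
    .outputMode("append")
    .option("checkpointLocation", "/tmp/checkpoints/raw_events")
    .toTable("lake.analytics.raw_events")
)
query.awaitTermination()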