

Senior Data Lakehouse Engineer (Apache Iceberg)
Featured Role | Apply direct with Data Freelance Hub
This role is for a Senior Data Lakehouse Engineer specializing in Apache Iceberg, offering a contract length of "X months" at a pay rate of "$X/hour." Remote work is available. Key skills include Apache Iceberg, Spark, and cloud integration.
Country: United States
Currency: $ USD
Day rate: -
Date discovered: July 19, 2025
Project duration: Unknown
Location type: Unknown
Contract type: Unknown
Security clearance: Unknown
Location detailed: Raritan, NJ
Skills detailed: #Observability #Metadata #Azure #AWS Glue #Data Governance #Microsoft Power BI #Prometheus #Terraform #Automation #Python #Scala #Data Lake #Grafana #Java #Presto #Batch #Trino #S3 (Amazon Simple Storage Service) #dbt (data build tool) #Deployment #Data Catalog #Storage #Data Lineage #Snowflake #Apache Iceberg #GCP (Google Cloud Platform) #Apache Airflow #AWS (Amazon Web Services) #BI (Business Intelligence) #Data Lakehouse #Spark (Apache Spark) #Tableau #Collibra #ML (Machine Learning) #Airflow #Cloud #Alation
Role description
Apache Iceberg Architecture & Engineering
- Design and implement Iceberg table schemas and partitioning strategies optimized for performance, scalability, and query flexibility in petabyte-scale Lakehouse environments (see the sketch after this list)
- Architect Iceberg catalog layers (Hive, AWS Glue, Nessie, or REST-based catalogs), enabling ACID-compliant operations and consistent metadata propagation across multiple engines
- Operationalize transactional inserts, upserts, deletes, and merges using Spark, Flink, or native Iceberg APIs, supporting both streaming and batch workloads
- Manage schema evolution, partition evolution, and snapshot lifecycle policies to optimize long-term performance and storage
- Build custom tools or wrappers around Iceberg APIs in Java, Scala, or Python for metadata introspection, compaction automation, and lineage integration
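To make the first two bullets concrete, here is a minimal PySpark sketch of the table design and upsert work involved. It assumes an Iceberg REST catalog registered as `lake`, the iceberg-spark-runtime JAR on the classpath, and illustrative URI, namespace, and table names:

```python
from datetime import datetime
from pyspark.sql import SparkSession

# Minimal sketch: the catalog name ("lake"), REST URI, and table names are
# illustrative assumptions, not specifics from this posting.
spark = (
    SparkSession.builder
    .appName("iceberg-ddl-sketch")
    .config("spark.sql.extensions",
            "org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions")
    .config("spark.sql.catalog.lake", "org.apache.iceberg.spark.SparkCatalog")
    .config("spark.sql.catalog.lake.type", "rest")
    .config("spark.sql.catalog.lake.uri", "https://catalog.example.com")
    .getOrCreate()
)

# Hidden partitioning: days(event_ts) and bucket(16, user_id) let queries
# prune files without exposing physical partition columns to consumers.
spark.sql("""
    CREATE TABLE IF NOT EXISTS lake.events.clicks (
        event_id BIGINT,
        user_id  BIGINT,
        event_ts TIMESTAMP,
        payload  STRING
    )
    USING iceberg
    PARTITIONED BY (days(event_ts), bucket(16, user_id))
""")

# Stand-in staging data; in practice this arrives from an upstream feed.
spark.createDataFrame(
    [(1, 42, datetime(2025, 7, 1, 12, 0), '{"page": "home"}')],
    "event_id BIGINT, user_id BIGINT, event_ts TIMESTAMP, payload STRING",
).createOrReplaceTempView("staged_clicks")

# Transactional upsert: MERGE INTO commits atomically as a new snapshot.
spark.sql("""
    MERGE INTO lake.events.clicks t
    USING staged_clicks s
    ON t.event_id = s.event_id
    WHEN MATCHED THEN UPDATE SET *
    WHEN NOT MATCHED THEN INSERT *
""")
```

The partition transforms are the key design choice here: because Iceberg tracks them as metadata, they can later be evolved (per the fourth bullet) without rewriting existing data.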
Multi-Engine Compatibility and Optimization
- Enable multi-engine interoperability across Spark, Trino, Dremio, Flink, Snowflake, and Presto, ensuring Iceberg tables are accessible, performant, and consistent regardless of compute layer (see the sketch after this list)
- Collaborate with BI and ML teams to design read-optimized layouts for consumption by Power BI, Tableau, or notebooks while enforcing data freshness SLAs
- Design table format version upgrade paths and coordinate with platform teams to validate compatibility across engines, connectors, and query runtimes
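Because table state lives in the catalog and manifest files rather than in any one engine, the table written by Spark above is directly queryable from Trino with no copies. A sketch using the Trino Python client; the host, port, user, and catalog name are illustrative assumptions:

```python
import trino

# Assumes a Trino cluster whose "iceberg" catalog points at the same REST
# catalog Spark writes through; connection details are illustrative.
conn = trino.dbapi.connect(
    host="trino.example.com",
    port=443,
    user="lakehouse-eng",
    catalog="iceberg",
    schema="events",
    http_scheme="https",
)
cur = conn.cursor()

# The $snapshots metadata table shows the commit history every engine
# shares; useful for validating consistency across engines, e.g. after a
# format-version upgrade.
cur.execute(
    'SELECT snapshot_id, committed_at, operation '
    'FROM "clicks$snapshots" ORDER BY committed_at DESC LIMIT 5'
)
for snapshot_id, committed_at, operation in cur.fetchall():
    print(snapshot_id, committed_at, operation)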
Governance, Metadata & Observability
- Integrate Iceberg tables with data catalogs (e.g., Unity Catalog, Alation, Amundsen) and data lineage platforms (e.g., Marquez, Collibra, DataHub)
- Enforce data governance via column-level lineage, row-level ACLs, and policy-based retention rules, leveraging metadata-rich Iceberg features
- Automate compaction, snapshot expiration, and manifest pruning using orchestration tools (e.g., Apache Airflow, dbt, Dagster) for metadata hygiene (see the maintenance sketch after this list)
- Implement Iceberg table telemetry using Prometheus/Grafana, or integrate with enterprise observability tools for query stats, scan efficiency, and access patterns
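The maintenance work in the third bullet maps directly onto Iceberg's built-in Spark procedures. A minimal sketch of the calls an orchestrator such as Airflow would schedule, reusing the same illustrative `lake` catalog and table names as above:

```python
from datetime import datetime, timedelta, timezone
from pyspark.sql import SparkSession

# Assumes the same "lake" catalog wiring and Iceberg SQL extensions as the
# earlier DDL sketch; intended to run as a scheduled (e.g., nightly) job.
spark = SparkSession.builder.appName("iceberg-maintenance-sketch").getOrCreate()

cutoff = (datetime.now(timezone.utc) - timedelta(days=7)).strftime(
    "%Y-%m-%d %H:%M:%S"
)

# Drop snapshots older than 7 days so their data and metadata files can be
# garbage-collected; time travel within the retention window is unaffected.
spark.sql(
    f"CALL lake.system.expire_snapshots("
    f"table => 'events.clicks', older_than => TIMESTAMP '{cutoff}')"
)

# Compact small data files so scan planning stays fast.
spark.sql("CALL lake.system.rewrite_data_files(table => 'events.clicks')")

# Rewrite and prune manifests for metadata hygiene.
spark.sql("CALL lake.system.rewrite_manifests(table => 'events.clicks')")
```

The 7-day retention window is an illustrative policy choice; in practice it would follow the snapshot lifecycle policies described above.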
Automation, Scaling & Cloud Integration
- Build IaC-driven infrastructure using Terraform, CloudFormation, or Pulumi to deploy Iceberg-backed Lakehouse components across AWS, Azure, or GCP
- Automate Iceberg table provisioning pipelines as part of domain-specific data product deployments aligned to Data Mesh architecture
- Optimize compute/storage configurations in S3, Azure Data Lake Gen2, or GCS to support efficient Iceberg table layouts (object sizing, IO patterns, partition pruning)
- Enable streaming ingestion with Spark Structured Streaming to land real-time CDC or log-based feeds (see the sketch after this list)
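For the streaming bullet, a minimal Structured Streaming sketch that lands a Kafka feed into an Iceberg table. The broker, topic, checkpoint path, and target table are illustrative assumptions, and the Kafka source additionally requires the spark-sql-kafka package on the classpath:

```python
from pyspark.sql import SparkSession

# Assumes the "lake" catalog wiring from the DDL sketch and an existing
# target table; broker, topic, and checkpoint path are illustrative.
spark = SparkSession.builder.appName("iceberg-streaming-sketch").getOrCreate()

feed = (
    spark.readStream
    .format("kafka")
    .option("kafka.bootstrap.servers", "broker.example.com:9092")
    .option("subscribe", "clicks-cdc")
    .load()
)

# Iceberg's streaming sink commits one snapshot per micro-batch, so other
# engines only ever see fully committed data.
query = (
    feed.selectExpr("CAST(value AS STRING) AS payload")
    .writeStream
    .format("iceberg")
    .outputMode("append")
    .trigger(processingTime="1 minute")
    .option("checkpointLocation", "s3://example-bucket/checkpoints/clicks")
    .toTable("lake.events.clicks_raw")
)
query.awaitTermination()
```

The per-micro-batch snapshot commit is what makes this pattern safe to combine with the batch MERGE workloads described earlier: both paths go through the same ACID commit protocol.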