

Senior Data Lakehouse Engineer (Apache Iceberg)
Featured Role | Apply direct with Data Freelance Hub
This role is for a Senior Data Lakehouse Engineer specializing in Apache Iceberg, offering a contract length of "X months" at a pay rate of "$X/hour." Remote work is available. Key skills include Apache Iceberg, Spark, and cloud integration.
Country: United States
Currency: $ USD
Day rate: -
Date discovered: July 19, 2025
Project duration: Unknown
Location type: Unknown
Contract type: Unknown
Security clearance: Unknown
Location detailed: Raritan, NJ
Skills detailed: #Observability #Metadata #Azure #AWS Glue #Data Governance #Microsoft Power BI #Prometheus #Terraform #Automation #Python #Scala #Data Lake #Grafana #Java #Presto #Batch #Trino #S3 (Amazon Simple Storage Service) #dbt (data build tool) #Deployment #Data Catalog #Storage #Data Lineage #Snowflake #Apache Iceberg #GCP (Google Cloud Platform) #Apache Airflow #AWS (Amazon Web Services) #BI (Business Intelligence) #Data Lakehouse #Spark (Apache Spark) #Tableau #Collibra #ML (Machine Learning) #Airflow #Cloud #Alation
Role description
Apache Iceberg Architecture & Engineering
- Design and implement Iceberg table schemas and partitioning strategies optimized for performance, scalability, and query flexibility in petabyte-scale Lakehouse environments (see the sketch after this list)
- Architect Iceberg catalog layers (Hive, AWS Glue, Nessie, or REST-based catalogs), enabling ACID-compliant operations and consistent metadata propagation across multiple engines
- Operationalize transactional inserts, upserts, deletes, and merges using Spark, Flink, or native Iceberg APIs, supporting both streaming and batch workloads
- Manage schema evolution, partition evolution, and snapshot lifecycle policies to optimize long-term performance and storage
- Build custom tools or wrappers around Iceberg APIs in Java, Scala, or Python for metadata introspection, compaction automation, and lineage integration
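To make the first two bullets concrete, here is a minimal PySpark sketch of the table design and upsert work involved. It assumes an Iceberg REST catalog registered as `lake`, the iceberg-spark-runtime JAR on the classpath, and illustrative URI, namespace, and table names:

```python
from datetime import datetime
from pyspark.sql import SparkSession

# Minimal sketch: the catalog name ("lake"), REST URI, and table names are
# illustrative assumptions, not specifics from this posting.
spark = (
    SparkSession.builder
    .appName("iceberg-ddl-sketch")
    .config("spark.sql.extensions",
            "org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions")
    .config("spark.sql.catalog.lake", "org.apache.iceberg.spark.SparkCatalog")
    .config("spark.sql.catalog.lake.type", "rest")
    .config("spark.sql.catalog.lake.uri", "https://catalog.example.com")
    .getOrCreate()
)

# Hidden partitioning: days(event_ts) and bucket(16, user_id) let queries
# prune files without exposing physical partition columns to consumers.
spark.sql("""
    CREATE TABLE IF NOT EXISTS lake.events.clicks (
        event_id BIGINT,
        user_id  BIGINT,
        event_ts TIMESTAMP,
        payload  STRING
    )
    USING iceberg
    PARTITIONED BY (days(event_ts), bucket(16, user_id))
""")

# Stand-in staging data; in practice this arrives from an upstream feed.
spark.createDataFrame(
    [(1, 42, datetime(2025, 7, 1, 12, 0), '{"page": "home"}')],
    "event_id BIGINT, user_id BIGINT, event_ts TIMESTAMP, payload STRING",
).createOrReplaceTempView("staged_clicks")

# Transactional upsert: MERGE INTO commits atomically as a new snapshot.
spark.sql("""
    MERGE INTO lake.events.clicks t
    USING staged_clicks s
    ON t.event_id = s.event_id
    WHEN MATCHED THEN UPDATE SET *
    WHEN NOT MATCHED THEN INSERT *
""")
```

The partition transforms are the key design choice here: because Iceberg tracks them as metadata, they can later be evolved (per the fourth bullet) without rewriting existing data.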
Multi-Engine Compatibility and Optimization
- Enable multi-engine interoperability across Spark, Trino, Dremio, Flink, Snowflake, and Presto, ensuring Iceberg tables are accessible, performant, and consistent regardless of compute layer (see the sketch after this list)
- Collaborate with BI and ML teams to design read-optimized layouts for consumption by Power BI, Tableau, or notebooks while enforcing data freshness SLAs
- Design table format version upgrade paths and coordinate with platform teams to validate compatibility across engines, connectors, and query runtimes
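Because table state lives in the catalog and manifest files rather than in any one engine, the table written by Spark above is directly queryable from Trino with no copies. A sketch using the Trino Python client; the host, port, user, and catalog name are illustrative assumptions:

```python
import trino

# Assumes a Trino cluster whose "iceberg" catalog points at the same REST
# catalog Spark writes through; connection details are illustrative.
conn = trino.dbapi.connect(
    host="trino.example.com",
    port=443,
    user="lakehouse-eng",
    catalog="iceberg",
    schema="events",
    http_scheme="https",
)
cur = conn.cursor()

# The $snapshots metadata table shows the commit history every engine
# shares; useful for validating consistency across engines, e.g. after a
# format-version upgrade.
cur.execute(
    'SELECT snapshot_id, committed_at, operation '
    'FROM "clicks$snapshots" ORDER BY committed_at DESC LIMIT 5'
)
for snapshot_id, committed_at, operation in cur.fetchall():
    print(snapshot_id, committed_at, operation)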
Governance, Metadata & Observability
- Integrate Iceberg tables with data catalogs (e.g., Unity Catalog, Alation, Amundsen) and data lineage platforms (e.g., Marquez, Collibra, DataHub)
- Enforce data governance via column-level lineage, row-level ACLs, and policy-based retention rules, leveraging metadata-rich Iceberg features
- Automate compaction, snapshot expiration, and manifest pruning using orchestration tools (e.g., Apache Airflow, dbt, Dagster) for metadata hygiene (see the maintenance sketch after this list)
- Implement Iceberg table telemetry using Prometheus/Grafana, or integrate with enterprise observability tools for query stats, scan efficiency, and access patterns
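The maintenance work in the third bullet maps directly onto Iceberg's built-in Spark procedures. A minimal sketch of the calls an orchestrator such as Airflow would schedule, reusing the same illustrative `lake` catalog and table names as above:

```python
from datetime import datetime, timedelta, timezone
from pyspark.sql import SparkSession

# Assumes the same "lake" catalog wiring and Iceberg SQL extensions as the
# earlier DDL sketch; intended to run as a scheduled (e.g., nightly) job.
spark = SparkSession.builder.appName("iceberg-maintenance-sketch").getOrCreate()

cutoff = (datetime.now(timezone.utc) - timedelta(days=7)).strftime(
    "%Y-%m-%d %H:%M:%S"
)

# Drop snapshots older than 7 days so their data and metadata files can be
# garbage-collected; time travel within the retention window is unaffected.
spark.sql(
    f"CALL lake.system.expire_snapshots("
    f"table => 'events.clicks', older_than => TIMESTAMP '{cutoff}')"
)

# Compact small data files so scan planning stays fast.
spark.sql("CALL lake.system.rewrite_data_files(table => 'events.clicks')")

# Rewrite and prune manifests for metadata hygiene.
spark.sql("CALL lake.system.rewrite_manifests(table => 'events.clicks')")
```

The 7-day retention window is an illustrative policy choice; in practice it would follow the snapshot lifecycle policies described above.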
Automation, Scaling & Cloud Integration
- Build IaC-driven infrastructure using Terraform, CloudFormation, or Pulumi to deploy Iceberg-backed Lakehouse components across AWS, Azure, or GCP
- Automate Iceberg table provisioning pipelines as part of domain-specific data product deployments aligned to Data Mesh architecture
- Optimize compute/storage configurations in S3, Azure Data Lake Gen2, or GCS to support efficient Iceberg table layouts (object sizing, IO patterns, partition pruning)
- Enable streaming ingestion with Spark Structured Streaming to land real-time CDC or log-based feeds (see the sketch after this list)
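For the streaming bullet, a minimal Structured Streaming sketch that lands a Kafka feed into an Iceberg table. The broker, topic, checkpoint path, and target table are illustrative assumptions, and the Kafka source additionally requires the spark-sql-kafka package on the classpath:

```python
from pyspark.sql import SparkSession

# Assumes the "lake" catalog wiring from the DDL sketch and an existing
# target table; broker, topic, and checkpoint path are illustrative.
spark = SparkSession.builder.appName("iceberg-streaming-sketch").getOrCreate()

feed = (
    spark.readStream
    .format("kafka")
    .option("kafka.bootstrap.servers", "broker.example.com:9092")
    .option("subscribe", "clicks-cdc")
    .load()
)

# Iceberg's streaming sink commits one snapshot per micro-batch, so other
# engines only ever see fully committed data.
query = (
    feed.selectExpr("CAST(value AS STRING) AS payload")
    .writeStream
    .format("iceberg")
    .outputMode("append")
    .trigger(processingTime="1 minute")
    .option("checkpointLocation", "s3://example-bucket/checkpoints/clicks")
    .toTable("lake.events.clicks_raw")
)
query.awaitTermination()
```

The per-micro-batch snapshot commit is what makes this pattern safe to combine with the batch MERGE workloads described earlier: both paths go through the same ACID commit protocol.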