

NLB Services
Senior Iceberg DBA
Featured Role | Apply direct with Data Freelance Hub
This role is for a Senior Iceberg DBA with 10+ years in Big Data/Data Engineering, 2+ years with Apache Iceberg, and 6+ years in the Cloudera ecosystem. Contract length, pay rate, and work location are unknown. Key skills include Iceberg table optimization, multi-engine performance tuning, and data migration.
Country
United States
Currency
$ USD
-
Day rate
Unknown
-
Date
April 10, 2026
Duration
Unknown
-
Location
Unknown
-
Contract
Unknown
-
Security
Unknown
-
Location detailed
United States
-
Skills detailed
#Data Accuracy #Cloud #Azure #Trino #Data Ingestion #Spark (Apache Spark) #Data Integrity #Metadata #Data Lake #Scripting #Data Migration #Data Engineering #Data Modeling #Apache Iceberg #Security #Automation #DBA (Database Administrator) #Data Lifecycle #NiFi (Apache NiFi) #Data Governance #Data Management #Storage #Big Data #Spark SQL #Migration #Data Architecture #Normalization #Data Access #Teradata #Impala #Cloudera #Datasets #Clustering #SQL (Structured Query Language) #AWS (Amazon Web Services) #Python
Role description
• 10+ years of experience in Big Data / Data Engineering / DBA / Data Operations roles
• Minimum 2+ years of hands-on experience with Apache Iceberg in production environments
• 6+ years of experience working with the Cloudera ecosystem (CDP)
• Strong expertise in:
  • Iceberg table optimization (compaction, metadata management, partition evolution)
  • Multi-engine performance tuning (Spark, Hive, Impala)
  • Troubleshooting complex data and query performance issues
• Proven experience handling:
  • P1/P2 production incidents
  • Large-scale environments (TB/PB scale)
  • Data migration initiatives (Hive/Teradata → Iceberg)
• Lead enforcement of data modeling and Lakehouse standards across applications
• Guide teams on:
  • Medallion architecture implementation
  • Balancing normalization vs. performance
• Review and resolve complex data modeling and performance trade-offs
• Ensure consistency of data structures across domains and workloads
• Mentor and guide L2 resources in operational best practices and troubleshooting
Required Skills
• Strong hands-on experience with Apache Iceberg and/or Hive-based data lakes
• Understanding of data modeling concepts (normal forms) and modern Lakehouse patterns (Medallion architecture)
• Expertise in:
  • Table-level optimization and performance tuning
  • Large-scale data management (TB/PB scale)
• Experience with:
  • Spark SQL, Hive, Impala, NiFi, Trino
• Strong understanding of:
  • Partitioning strategies
  • File formats (Parquet/ORC)
  • Distributed query processing
Preferred Skills
• Experience with:
  • Hive-to-Iceberg or Teradata-to-Iceberg migration
  • Cloudera CDP (CDE/CDW)
• Familiarity with:
  • Cloud platforms (AWS, Azure)
  • Scripting/automation (Python, Shell)
What You'll Work On
• Enterprise-scale Iceberg Lakehouse platform supporting multiple applications
• Large-scale data modernization initiatives
• Performance optimization and stability of mission-critical analytical workloads
Why This Role Matters
• Ensures data correctness and performance for downstream analytics and business-critical reporting
• Enables successful modernization from legacy platforms to Iceberg
• Maintains high availability and reliability of the enterprise data layer
Job Summary
We are seeking a highly skilled Iceberg DBA / Lakehouse Operations Engineer to own the reliability, performance, and operational integrity of the Iceberg data layer powering enterprise analytics and business-critical applications.
This role operates in a large-scale, multi-engine Lakehouse environment, supporting workloads across Spark, Hive, and Impala, and plays a key role in enterprise data modernization initiatives (Hive and Teradata → Iceberg).
The ideal candidate brings deep expertise in Iceberg table operations, metadata management, and query performance optimization, ensuring consistent, high-performance data access across platforms in a cloud-based environment.
This role is critical to ensuring data accuracy and performance: any degradation directly impacts downstream reporting, analytics, and business-critical decision-making.
Key Responsibilities:
Iceberg Data Layer Ownership & Operations
• Own day-to-day operations of Apache Iceberg tables supporting multiple enterprise applications
• Ensure data reliability, consistency, and availability across all Lakehouse workloads
• Maintain operational integrity for datasets at multi-terabyte to petabyte scale
Advanced Table Management & Optimization
• Execute advanced Iceberg table maintenance and optimization strategies:
  • Compaction (minor/major) and small-file mitigation
  • Snapshot expiration and metadata compaction to control metadata growth
  • Orphan file cleanup (vacuum) to maintain storage efficiency
• Optimize data layout and performance through:
  • File size tuning and distribution strategies
  • Partition evolution and pruning optimization
  • Clustering and ordering techniques (e.g., Z-ordering or similar patterns)
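The maintenance duties above map directly onto Iceberg's built-in Spark procedures. A minimal sketch, assuming a Spark 3.x session with an Iceberg catalog registered as `demo` and a hypothetical table `db.events` (catalog, table, and column names are illustrative, not from the posting):

```sql
-- Compact small files (bin-pack is the default rewrite strategy).
CALL demo.system.rewrite_data_files(table => 'db.events');

-- Z-order rewrite to co-locate frequently filtered columns.
CALL demo.system.rewrite_data_files(
  table      => 'db.events',
  strategy   => 'sort',
  sort_order => 'zorder(customer_id, event_ts)'
);

-- Expire old snapshots to control metadata growth.
CALL demo.system.expire_snapshots(
  table       => 'db.events',
  older_than  => TIMESTAMP '2026-01-01 00:00:00',
  retain_last => 10
);

-- Compact manifest files when metadata itself gets fragmented.
CALL demo.system.rewrite_manifests('db.events');

-- Remove files no longer referenced by any snapshot.
CALL demo.system.remove_orphan_files(table => 'db.events');

-- Partition evolution: change the spec without rewriting existing data.
ALTER TABLE demo.db.events ADD PARTITION FIELD days(event_ts);
```

In practice these calls are scheduled per table based on write patterns; the parameters shown (retention timestamp, sort columns) are placeholders to be tuned per workload.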
Data Modeling Standards & Lakehouse Design Alignment
• Support and enforce data modeling best practices aligned with:
  • Normalized data structures (3NF) for source-aligned datasets
  • Medallion architecture (Bronze / Silver / Gold layers) for curated data flows
• Ensure Iceberg table design aligns with:
  • Data ingestion patterns (raw vs. curated layers)
  • Downstream consumption and performance requirements
• Assist in structuring datasets to balance:
  • Data integrity and normalization
  • Query performance and analytical efficiency
• Work with data engineering teams to ensure consistent implementation of layered data architecture across multiple applications
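The layering described above can be sketched as three Iceberg tables, one per Medallion tier. This is an illustrative pattern only; database names (`bronze`, `silver`, `gold`), the `demo` catalog, and all columns are hypothetical:

```sql
-- Bronze: raw, source-aligned ingest (append-only, minimal transformation).
CREATE TABLE demo.bronze.orders_raw (
  order_id    BIGINT,
  customer_id BIGINT,
  amount      STRING,        -- kept as-landed; typed later
  ingested_at TIMESTAMP,
  source_file STRING
) USING iceberg
PARTITIONED BY (days(ingested_at));

-- Silver: cleaned, deduplicated, conformed types (closer to 3NF).
CREATE TABLE demo.silver.orders USING iceberg AS
SELECT DISTINCT
  order_id,
  customer_id,
  CAST(amount AS DECIMAL(12, 2)) AS amount
FROM demo.bronze.orders_raw
WHERE order_id IS NOT NULL;

-- Gold: denormalized, consumption-optimized aggregate for reporting.
CREATE TABLE demo.gold.customer_revenue USING iceberg AS
SELECT customer_id, SUM(amount) AS total_revenue
FROM demo.silver.orders
GROUP BY customer_id;
```

The normalization-vs.-performance trade-off the posting mentions lives in the Silver-to-Gold step: Silver stays close to source integrity, Gold accepts redundancy for query speed.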
Multi-Engine Query Performance & Consistency
• Ensure consistent and performant query behavior across:
  • Spark (CDE)
  • Hive / Impala (CDW)
• Troubleshoot and resolve:
  • Query performance bottlenecks
  • Metadata inconsistencies across engines
  • Inefficient execution plans and scan patterns
Hive & Teradata Modernization Support
• Play a key role in enterprise data platform modernization (Hive and Teradata → Iceberg)
• Support:
  • Schema alignment and data type mapping
  • Data validation and reconciliation
• Troubleshoot migration-related issues and ensure post-migration stability and performance
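The validation-and-reconciliation duty above usually boils down to comparing row counts and row contents between the legacy source and the migrated Iceberg table. A minimal, engine-agnostic sketch in Python (one of the posting's listed scripting skills); in a real migration the counts and hashes would be computed inside the engines rather than over locally fetched rows, and the function name `reconcile` is my own, not from the posting:

```python
import hashlib


def reconcile(source_rows, target_rows, key_columns):
    """Compare two result sets (lists of dicts) after a migration.

    Returns row-count totals plus keys that are missing, extra, or
    changed in the target, keyed by `key_columns`.
    """
    def row_key(row):
        return tuple(row[c] for c in key_columns)

    def row_hash(row):
        # Stable digest over sorted column=value pairs.
        payload = "|".join(f"{k}={row[k]}" for k in sorted(row))
        return hashlib.sha256(payload.encode()).hexdigest()

    src = {row_key(r): row_hash(r) for r in source_rows}
    tgt = {row_key(r): row_hash(r) for r in target_rows}

    return {
        "source_count": len(src),
        "target_count": len(tgt),
        "missing_in_target": sorted(set(src) - set(tgt)),
        "extra_in_target": sorted(set(tgt) - set(src)),
        "content_mismatches": sorted(
            k for k in set(src) & set(tgt) if src[k] != tgt[k]
        ),
    }
```

At TB/PB scale this exact-row comparison is replaced by partition-level aggregates (counts, sums, hash totals), but the reporting shape stays the same.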
Metadata & Data Lifecycle Management
• Manage Iceberg metadata to ensure:
  • Efficient scaling and performance
  • Consistent table state across engines
• Execute lifecycle operations:
  • Data retention and archival policies
  • Snapshot lifecycle management and cleanup
  • Time-travel optimization and maintenance
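The snapshot and time-travel operations above correspond to standard Iceberg features in Spark SQL (time-travel read syntax requires Spark 3.3+). A sketch, again assuming a hypothetical `demo` catalog and `db.events` table; the snapshot ID is a placeholder:

```sql
-- Time travel: read the table as of a snapshot ID or a timestamp.
SELECT * FROM demo.db.events VERSION AS OF 123456789;
SELECT * FROM demo.db.events TIMESTAMP AS OF '2026-04-01 00:00:00';

-- Inspect snapshot history via the built-in metadata table,
-- e.g. to decide how far back retention can safely reach.
SELECT snapshot_id, committed_at, operation
FROM demo.db.events.snapshots
ORDER BY committed_at DESC;

-- Roll the table back to a known-good snapshot after a bad write.
CALL demo.system.rollback_to_snapshot('db.events', 123456789);
```

Retention policy here is a balance: expiring snapshots aggressively shrinks metadata and storage, but shortens the time-travel and rollback window the queries above depend on.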
Production Support, Incident Resolution & On-Call
• Provide L2/L3 support for data-related production issues across Iceberg-based Lakehouse workloads
• Participate in on-call rotation to support critical data platforms and ensure timely response to incidents
• Respond to and resolve P1/P2 production incidents within defined SLAs, minimizing impact to downstream applications and reporting
• Troubleshoot:
  • Data inconsistencies and reporting discrepancies
  • Query failures and performance degradation
• Perform root cause analysis (RCA) and implement preventive measures to avoid recurring issues
• Collaborate with platform and application teams during incident triage and resolution
Security & Data Governance Support
• Support fine-grained access control using:
  • Ranger policies and RBAC
• Own and ensure data validation, reconciliation, and accuracy between source and Iceberg datasets
• Ensure secure and compliant access to data across applications






