

NLB Services
Senior Iceberg DBA
Featured Role | Apply direct with Data Freelance Hub
This role is for a Senior Iceberg DBA with 10+ years in Big Data/Data Engineering, 2+ years with Apache Iceberg, and 6+ years in the Cloudera ecosystem. Contract length, pay rate, and work location are unknown. Key skills include Iceberg table optimization, multi-engine performance tuning, and data migration.
Country
United States
Currency
$ USD
-
Day rate
Unknown
-
Date
April 10, 2026
Duration
Unknown
-
Location
Unknown
-
Contract
Unknown
-
Security
Unknown
-
Location detailed
United States
-
Skills detailed
#Data Accuracy #Cloud #Azure #Trino #Data Ingestion #Spark (Apache Spark) #Data Integrity #Metadata #Data Lake #Scripting #Data Migration #Data Engineering #Data Modeling #Apache Iceberg #Security #Automation #DBA (Database Administrator) #Data Lifecycle #NiFi (Apache NiFi) #Data Governance #Data Management #Storage #Big Data #Spark SQL #Migration #Data Architecture #Normalization #Data Access #Teradata #Impala #Cloudera #Datasets #Clustering #SQL (Structured Query Language) #AWS (Amazon Web Services) #Python
Role description
• 10+ years of experience in Big Data / Data Engineering / DBA / Data Operations roles
• Minimum 2+ years of hands-on experience with Apache Iceberg in production environments
• 6+ years of experience working with the Cloudera ecosystem (CDP)
• Strong expertise in:
  • Iceberg table optimization (compaction, metadata management, partition evolution)
  • Multi-engine performance tuning (Spark, Hive, Impala)
  • Troubleshooting complex data and query performance issues
• Proven experience handling:
  • P1/P2 production incidents
  • Large-scale environments (TB/PB scale)
  • Data migration initiatives (Hive/Teradata → Iceberg)
• Lead enforcement of data modeling and Lakehouse standards across applications
• Guide teams on:
  • Medallion architecture implementation
  • Balancing normalization vs. performance
• Review and resolve complex data modeling and performance trade-offs
• Ensure consistency of data structures across domains and workloads
• Mentor and guide L2 resources in operational best practices and troubleshooting
Required Skills
• Strong hands-on experience with Apache Iceberg and/or Hive-based data lakes
• Understanding of data modeling concepts (normal forms) and modern Lakehouse patterns (Medallion architecture)
• Expertise in:
  • Table-level optimization and performance tuning
  • Large-scale data management (TB/PB scale)
• Experience with:
  • Spark SQL, Hive, Impala, NiFi, Trino
• Strong understanding of:
  • Partitioning strategies
  • File formats (Parquet/ORC)
  • Distributed query processing
Preferred Skills
• Experience with:
  • Hive-to-Iceberg or Teradata-to-Iceberg migration
  • Cloudera CDP (CDE/CDW)
• Familiarity with:
  • Cloud platforms (AWS, Azure)
  • Scripting/automation (Python, Shell)
What You'll Work On
• Enterprise-scale Iceberg Lakehouse platform supporting multiple applications
• Large-scale data modernization initiatives
• Performance optimization and stability of mission-critical analytical workloads
Why This Role Matters
• Ensures data correctness and performance for downstream analytics and business-critical reporting
• Enables successful modernization from legacy platforms to Iceberg
• Maintains high availability and reliability of the enterprise data layer
Job Summary
We are seeking a highly skilled Iceberg DBA / Lakehouse Operations Engineer to own the reliability, performance, and operational integrity of the Iceberg data layer powering enterprise analytics and business-critical applications.
This role operates in a large-scale, multi-engine Lakehouse environment, supporting workloads across Spark, Hive, and Impala, and plays a key role in enterprise data modernization initiatives (Hive and Teradata → Iceberg).
The ideal candidate brings deep expertise in Iceberg table operations, metadata management, and query performance optimization, ensuring consistent, high-performance data access across platforms in a cloud-based environment.
This role is critical to ensuring data accuracy and performance: any degradation directly impacts downstream reporting, analytics, and business-critical decision-making.
Key Responsibilities:
Iceberg Data Layer Ownership & Operations
• Own day-to-day operations of Apache Iceberg tables supporting multiple enterprise applications
• Ensure data reliability, consistency, and availability across all Lakehouse workloads
• Maintain operational integrity for datasets at multi-terabyte to petabyte scale
Advanced Table Management & Optimization
• Execute advanced Iceberg table maintenance and optimization strategies:
  • Compaction (minor/major) and small-file mitigation
  • Snapshot expiration and metadata compaction to control metadata growth
  • Orphan file cleanup (vacuum) to maintain storage efficiency
• Optimize data layout and performance through:
  • File size tuning and distribution strategies
  • Partition evolution and pruning optimization
  • Clustering and ordering techniques (e.g., Z-ordering or similar patterns)
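The maintenance duties above map directly onto Iceberg's built-in Spark procedures. A minimal sketch, assuming a Spark 3.x session with an Iceberg catalog registered as `demo` and a hypothetical table `db.events` (catalog, table, and column names are illustrative, not from the posting):

```sql
-- Compact small files (bin-pack is the default rewrite strategy).
CALL demo.system.rewrite_data_files(table => 'db.events');

-- Z-order rewrite to co-locate frequently filtered columns.
CALL demo.system.rewrite_data_files(
  table      => 'db.events',
  strategy   => 'sort',
  sort_order => 'zorder(customer_id, event_ts)'
);

-- Expire old snapshots to control metadata growth.
CALL demo.system.expire_snapshots(
  table       => 'db.events',
  older_than  => TIMESTAMP '2026-01-01 00:00:00',
  retain_last => 10
);

-- Compact manifest files when metadata itself gets fragmented.
CALL demo.system.rewrite_manifests('db.events');

-- Remove files no longer referenced by any snapshot.
CALL demo.system.remove_orphan_files(table => 'db.events');

-- Partition evolution: change the spec without rewriting existing data.
ALTER TABLE demo.db.events ADD PARTITION FIELD days(event_ts);
```

In practice these calls are scheduled per table based on write patterns; the parameters shown (retention timestamp, sort columns) are placeholders to be tuned per workload.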
Data Modeling Standards & Lakehouse Design Alignment
• Support and enforce data modeling best practices aligned with:
  • Normalized data structures (3NF) for source-aligned datasets
  • Medallion architecture (Bronze / Silver / Gold layers) for curated data flows
• Ensure Iceberg table design aligns with:
  • Data ingestion patterns (raw vs. curated layers)
  • Downstream consumption and performance requirements
• Assist in structuring datasets to balance:
  • Data integrity and normalization
  • Query performance and analytical efficiency
• Work with data engineering teams to ensure consistent implementation of layered data architecture across multiple applications
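The layering described above can be sketched as three Iceberg tables, one per Medallion tier. This is an illustrative pattern only; database names (`bronze`, `silver`, `gold`), the `demo` catalog, and all columns are hypothetical:

```sql
-- Bronze: raw, source-aligned ingest (append-only, minimal transformation).
CREATE TABLE demo.bronze.orders_raw (
  order_id    BIGINT,
  customer_id BIGINT,
  amount      STRING,        -- kept as-landed; typed later
  ingested_at TIMESTAMP,
  source_file STRING
) USING iceberg
PARTITIONED BY (days(ingested_at));

-- Silver: cleaned, deduplicated, conformed types (closer to 3NF).
CREATE TABLE demo.silver.orders USING iceberg AS
SELECT DISTINCT
  order_id,
  customer_id,
  CAST(amount AS DECIMAL(12, 2)) AS amount
FROM demo.bronze.orders_raw
WHERE order_id IS NOT NULL;

-- Gold: denormalized, consumption-optimized aggregate for reporting.
CREATE TABLE demo.gold.customer_revenue USING iceberg AS
SELECT customer_id, SUM(amount) AS total_revenue
FROM demo.silver.orders
GROUP BY customer_id;
```

The normalization-vs.-performance trade-off the posting mentions lives in the Silver-to-Gold step: Silver stays close to source integrity, Gold accepts redundancy for query speed.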
Multi-Engine Query Performance & Consistency
• Ensure consistent and performant query behavior across:
  • Spark (CDE)
  • Hive / Impala (CDW)
• Troubleshoot and resolve:
  • Query performance bottlenecks
  • Metadata inconsistencies across engines
  • Inefficient execution plans and scan patterns
Hive & Teradata Modernization Support
• Play a key role in enterprise data platform modernization (Hive and Teradata → Iceberg)
• Support:
  • Schema alignment and data type mapping
  • Data validation and reconciliation
• Troubleshoot migration-related issues and ensure post-migration stability and performance
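The validation-and-reconciliation duty above usually boils down to comparing row counts and row contents between the legacy source and the migrated Iceberg table. A minimal, engine-agnostic sketch in Python (one of the posting's listed scripting skills); in a real migration the counts and hashes would be computed inside the engines rather than over locally fetched rows, and the function name `reconcile` is my own, not from the posting:

```python
import hashlib


def reconcile(source_rows, target_rows, key_columns):
    """Compare two result sets (lists of dicts) after a migration.

    Returns row-count totals plus keys that are missing, extra, or
    changed in the target, keyed by `key_columns`.
    """
    def row_key(row):
        return tuple(row[c] for c in key_columns)

    def row_hash(row):
        # Stable digest over sorted column=value pairs.
        payload = "|".join(f"{k}={row[k]}" for k in sorted(row))
        return hashlib.sha256(payload.encode()).hexdigest()

    src = {row_key(r): row_hash(r) for r in source_rows}
    tgt = {row_key(r): row_hash(r) for r in target_rows}

    return {
        "source_count": len(src),
        "target_count": len(tgt),
        "missing_in_target": sorted(set(src) - set(tgt)),
        "extra_in_target": sorted(set(tgt) - set(src)),
        "content_mismatches": sorted(
            k for k in set(src) & set(tgt) if src[k] != tgt[k]
        ),
    }
```

At TB/PB scale this exact-row comparison is replaced by partition-level aggregates (counts, sums, hash totals), but the reporting shape stays the same.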
Metadata & Data Lifecycle Management
• Manage Iceberg metadata to ensure:
  • Efficient scaling and performance
  • Consistent table state across engines
• Execute lifecycle operations:
  • Data retention and archival policies
  • Snapshot lifecycle management and cleanup
  • Time-travel optimization and maintenance
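The snapshot and time-travel operations above correspond to standard Iceberg features in Spark SQL (time-travel read syntax requires Spark 3.3+). A sketch, again assuming a hypothetical `demo` catalog and `db.events` table; the snapshot ID is a placeholder:

```sql
-- Time travel: read the table as of a snapshot ID or a timestamp.
SELECT * FROM demo.db.events VERSION AS OF 123456789;
SELECT * FROM demo.db.events TIMESTAMP AS OF '2026-04-01 00:00:00';

-- Inspect snapshot history via the built-in metadata table,
-- e.g. to decide how far back retention can safely reach.
SELECT snapshot_id, committed_at, operation
FROM demo.db.events.snapshots
ORDER BY committed_at DESC;

-- Roll the table back to a known-good snapshot after a bad write.
CALL demo.system.rollback_to_snapshot('db.events', 123456789);
```

Retention policy here is a balance: expiring snapshots aggressively shrinks metadata and storage, but shortens the time-travel and rollback window the queries above depend on.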
Production Support, Incident Resolution & On-Call
• Provide L2/L3 support for data-related production issues across Iceberg-based Lakehouse workloads
• Participate in on-call rotation to support critical data platforms and ensure timely response to incidents
• Respond to and resolve P1/P2 production incidents within defined SLAs, minimizing impact to downstream applications and reporting
• Troubleshoot:
  • Data inconsistencies and reporting discrepancies
  • Query failures and performance degradation
• Perform root cause analysis (RCA) and implement preventive measures to avoid recurring issues
• Collaborate with platform and application teams during incident triage and resolution
Security & Data Governance Support
• Support fine-grained access control using:
  • Ranger policies and RBAC
• Own and ensure data validation, reconciliation, and accuracy between source and Iceberg datasets
• Ensure secure and compliant access to data across applications






