

Jansoft Global
Cloud Data Engineer (Apache Iceberg)
Featured Role | Apply direct with Data Freelance Hub
This role is for a Cloud Data Engineer (Apache Iceberg) in Westlake, TX, offering a 12+ month contract at $60-70/hr. Key skills include Apache Iceberg, the AWS data stack, Python, Spark, and Kafka. 4-10+ years of data engineering experience is required.
Country: United States
Currency: $ USD
Day rate: 560
Date: February 25, 2026
Duration: More than 6 months
Location: On-site
Contract: W2 Contractor
Security: Unknown
Location detailed: Westlake, TX
Skills detailed: #Datasets #Metadata #Data Governance #Data Processing #Kafka (Apache Kafka) #Anomaly Detection #Batch #S3 (Amazon Simple Storage Service) #SQL (Structured Query Language) #AWS (Amazon Web Services) #Data Quality #Cloud #Data Lake #Data Pipeline #Security #Scala #Observability #Lambda (AWS Lambda) #PySpark #Data Modeling #Data Encryption #ETL (Extract, Transform, Load) #Automation #Monitoring #ACID (Atomicity, Consistency, Isolation, Durability) #Spark (Apache Spark) #Python #Data Engineering #IAM (Identity and Access Management) #GitHub #Apache Iceberg
Role description
Location: Westlake, TX (Local only)
Duration: 12+ Months Contract
Rate: $60-70/hr on W2
Job Description
Apache Iceberg is the key skill here. Must-haves: strong hands-on experience with Apache Iceberg (table design, evolution, metadata, partitioning); deep experience with the AWS data stack (S3, EMR, Lambda, Glue, IAM, Step Functions, CloudWatch); fluency in Python for data pipelines, automation, and APIs; experience with distributed engines such as Spark, Flink, or PySpark; expertise in scalable ETL/ELT pipelines and real-time streaming architectures; and strong SQL and data modeling expertise. Kafka is a nice-to-have skill.
Description:
Role Summary
We are seeking a highly skilled Data Engineer to design, build, and optimize our modern data platform leveraging Apache Iceberg on AWS, with strong expertise in Spark, Kafka, and Python. The ideal candidate has deep experience building scalable, high-quality data pipelines, distributed data processing systems, and table-format-based lakehouse architectures.
This role is ideal for engineers who love building robust data foundations, enabling fast and reliable analytics, and working with cutting-edge open data lake technologies.
Key Responsibilities
1. Lakehouse Architecture (Apache Iceberg)
• Design and build Iceberg-based data lakes with ACID-compliant, versioned datasets.
• Implement Iceberg table evolution (schema evolution, partition specs, snapshot management).
• Develop best practices for Iceberg governance, metadata compaction, and performance tuning (a sketch follows this list).
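A minimal sketch of the Iceberg work described above, in PySpark. The catalog name (`demo`), warehouse path, and table schema are illustrative assumptions, not details of this role:

```python
# Iceberg table design, schema evolution, and snapshot inspection with
# PySpark. Catalog name, warehouse path, and schema are assumptions.
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .appName("iceberg-sketch")
    # Register an Iceberg catalog named "demo" backed by S3 storage.
    .config("spark.sql.catalog.demo", "org.apache.iceberg.spark.SparkCatalog")
    .config("spark.sql.catalog.demo.type", "hadoop")
    .config("spark.sql.catalog.demo.warehouse", "s3a://my-bucket/warehouse")
    .getOrCreate()
)

# ACID-compliant, versioned table with a hidden partition on event date.
spark.sql("""
    CREATE TABLE IF NOT EXISTS demo.db.events (
        event_id BIGINT,
        user_id  BIGINT,
        ts       TIMESTAMP
    ) USING iceberg
    PARTITIONED BY (days(ts))
""")

# Schema evolution: add a column without rewriting existing data files.
spark.sql("ALTER TABLE demo.db.events ADD COLUMN country STRING")

# Snapshot management: list snapshots for time travel or rollback.
spark.sql("SELECT snapshot_id, committed_at FROM demo.db.events.snapshots").show()
```

`days(ts)` is a hidden-partition transform: readers filter on `ts` directly and Iceberg prunes partitions without a separate partition column.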
2. Data Pipelines & Distributed Processing
• Build scalable batch and streaming pipelines using AWS services (S3, EMR, Glue, Lambda, Step Functions).
• Develop ingestion and transformation workflows using Python, Spark, or Flink (a batch sketch follows this list).
• Implement CDC pipelines using Kafka Connect or equivalent tooling.
• Ensure robust CI/CD integration with GitHub Actions or similar.
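A minimal batch-pipeline sketch reusing the `spark` session from the previous sketch; the S3 path, field names, and target table are hypothetical:

```python
# Batch ETL: read raw JSON from S3, normalize it, and write an Iceberg
# table. All paths and names below are illustrative assumptions.
from pyspark.sql import functions as F

raw = spark.read.json("s3a://my-bucket/raw/events/")

cleaned = (
    raw
    .withColumn("ts", F.to_timestamp("event_time"))  # normalize timestamps
    .filter(F.col("event_id").isNotNull())           # drop malformed rows
    .select("event_id", "user_id", "ts")
)

# DataFrameWriterV2 creates (or replaces) the table and commits the whole
# write as a single Iceberg snapshot.
cleaned.writeTo("demo.db.events_clean").createOrReplace()
```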
3. Streaming Data Engineering (Kafka)
• Design and operate Kafka-based streaming pipelines (Kafka/MSK).
• Build producers and consumers using Python or JVM languages (see the sketch after this list).
• Implement patterns such as topic partitioning, compaction, schema registry, and event versioning.
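A minimal producer/consumer sketch with the confluent-kafka Python client; the broker address, topic name, and group id are placeholders:

```python
# Keyed produce plus a consumer-group read with confluent-kafka.
import json
from confluent_kafka import Producer, Consumer

conf = {"bootstrap.servers": "localhost:9092"}  # placeholder broker

# Producer: keying by user_id keeps each user's events on one partition,
# preserving per-user ordering.
producer = Producer(conf)
event = {"event_id": 1, "user_id": 42, "ts": "2026-02-25T00:00:00Z"}
producer.produce("events", key=str(event["user_id"]), value=json.dumps(event))
producer.flush()  # block until the broker acknowledges delivery

# Consumer: join a consumer group and read from the earliest offset.
consumer = Consumer({**conf, "group.id": "events-etl",
                     "auto.offset.reset": "earliest"})
consumer.subscribe(["events"])
msg = consumer.poll(5.0)
if msg is not None and msg.error() is None:
    print(json.loads(msg.value()))
consumer.close()
```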
4. Data Modeling, Quality, and Observability
• Design data models for analytical and operational use cases using Iceberg tables.
• Implement automated data quality checks, validation rules, and anomaly detection (a sketch follows this list).
• Build lineage, monitoring, alerting, and pipeline observability.
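A minimal data-quality sketch over the hypothetical `demo.db.events` table from the earlier sketches; the rules and thresholds are illustrative, not requirements of this role:

```python
# Null-key validation and a crude volume anomaly check in PySpark.
from pyspark.sql import functions as F

df = spark.table("demo.db.events")

# Validation rule: key columns must never be null.
null_keys = df.filter(F.col("event_id").isNull() | F.col("user_id").isNull()).count()
assert null_keys == 0, f"{null_keys} rows with null keys"

# Anomaly detection: flag the latest day if its volume falls far below
# the trailing daily average (the 50% threshold is an assumption).
daily = df.groupBy(F.to_date("ts").alias("day")).count()
avg_rows = daily.agg(F.avg("count").alias("avg_rows")).first()["avg_rows"]
latest = daily.orderBy(F.col("day").desc()).first()
if latest and latest["count"] < 0.5 * avg_rows:
    raise ValueError("Row volume dropped more than 50% below the daily average")
```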
5. AWS Architecture & Operations
• Apply best practices for AWS security, cost optimization, and data governance.
• Manage IAM, KMS, S3 object lifecycle management, networking, and data encryption (see the sketch after this list).
• Operationalize EMR/Glue jobs, containerized workloads, or serverless workloads.
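A minimal operations sketch with boto3, applying an S3 lifecycle rule so aged raw data tiers down to cheaper storage and eventually expires; the bucket name and day counts are assumptions:

```python
# Apply an S3 lifecycle policy: transition aged objects to cheaper
# storage classes, then expire them. Bucket and timings are assumptions.
import boto3

s3 = boto3.client("s3")
s3.put_bucket_lifecycle_configuration(
    Bucket="my-bucket",
    LifecycleConfiguration={
        "Rules": [
            {
                "ID": "tier-then-expire-raw",
                "Filter": {"Prefix": "raw/"},
                "Status": "Enabled",
                # Infrequent Access after 30 days, Glacier after 90 days.
                "Transitions": [
                    {"Days": 30, "StorageClass": "STANDARD_IA"},
                    {"Days": 90, "StorageClass": "GLACIER"},
                ],
                "Expiration": {"Days": 365},  # delete after one year
            }
        ]
    },
)
```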
6. Cross-Functional Collaboration
• Partner with analytics, platform, and product teams to deliver high-quality data products.
• Participate in design reviews, architecture discussions, and roadmap planning.
• Mentor junior engineers and contribute to engineering standards.
Required Qualifications
• 4–10+ years of experience in Data Engineering or similar roles.
• Strong hands-on experience with Apache Iceberg (table design, evolution, metadata, partitioning).
• Deep experience with the AWS data stack: S3, EMR, Lambda, Glue, IAM, Step Functions, CloudWatch.
• Strong proficiency in Kafka (producers/consumers, schema registry, partitioning strategies).
• Fluency in Python for data pipelines, automation, and APIs.
• Experience with distributed engines such as Spark, Flink, or PySpark.
• Expertise in scalable ETL/ELT pipelines and real-time streaming architectures.
• Strong SQL and data modeling expertise.