

Pashtek • Salesforce and SAP Partner
Data Engineer
⭐ - Featured Role | Apply directly with Data Freelance Hub
This role is for a Data Engineer on a 6-month remote contract in the U.S. at a pay rate of "X". It requires 5+ years of data engineering experience and proficiency in AWS services, Apache Spark, and data governance.
🌎 - Country
United States
💱 - Currency
$ USD
💰 - Day rate
Unknown
🗓️ - Date
October 11, 2025
🕒 - Duration
Unknown
🏝️ - Location
Remote
📄 - Contract
W2 Contractor
🔒 - Security
Unknown
📍 - Location detailed
United States
🧠 - Skills detailed
#Informatica #Data Modeling #Snowflake #Snowpark #BI (Business Intelligence) #Observability #Oracle #Tableau #Terraform #AWS (Amazon Web Services) #Azure DevOps #PCI (Payment Card Industry) #Apache Spark #Cloud #DMS (Data Migration Service) #Metadata #S3 (Amazon Simple Storage Service) #IAM (Identity and Access Management) #Data Engineering #SAP BW #DevOps #Fivetran #GitHub #Azure #Athena #Redshift #ACID (Atomicity, Consistency, Isolation, Durability) #Kubernetes #Dremio #Teradata #SAP #Infrastructure as Code (IaC) #GitLab #Security #Databricks #Delta Lake #Vault #Collibra #Clustering #Spark (Apache Spark) #EDW (Enterprise Data Warehouse) #Compliance #Microsoft Power BI #Batch #Classification #GDPR (General Data Protection Regulation) #Data Pipeline #Hadoop #Migration #Kafka (Apache Kafka) #Replication #Apache Iceberg #dbt (data build tool) #SQL (Structured Query Language) #SSIS (SQL Server Integration Services) #AWS Glue #ETL (Extract, Transform, Load) #REST (Representational State Transfer) #Data Catalog #Data Vault #Looker #Trino #Airflow #SQL Server
Role description
Location: Remote (United States)
Employment Type: Contract
About the Role
As a Data Engineer, you’ll build and operate the pipelines, tables, and services that power our hybrid data platform across on-premises and AWS. You’ll implement lakehouse patterns, productionize batch and streaming workloads, and partner with security, platform, analytics, and application teams to deliver governed, high-performance data products at scale.
What You’ll Do
• Build data pipelines: Develop reliable ELT/ETL jobs in Spark/SQL/dbt/Airflow/Glue to ingest from on-prem and cloud sources into S3-backed lakes and warehouses.
• Implement lakehouse tables: Create and maintain Iceberg (or Delta/Hudi) tables using the appropriate catalog (AWS Glue, Hive Metastore, Polaris/REST) with ACID transactions, time travel, and schema evolution (see the ingest sketch after this list).
• Operate compute engines: Run and tune Spark (Databricks/EMR), Trino/Starburst, Dremio, and Snowflake workloads; leverage pushdown and query acceleration where applicable.
• Model data for analytics: Deliver dimensional models, semantic layers, and domain-oriented data products; document contracts and SLAs with consumers.
• Governance & security: Apply data cataloging, lineage, PII classification, Lake Formation permissions, IAM roles, and row/column-level security; contribute to audit readiness.
• Performance & cost tuning: Optimize partitioning, clustering/Z-order, predicate pushdown, file sizing/compaction, caching, and workload isolation; monitor and right-size clusters.
• Migrations: Execute migration workstreams from legacy/on-prem EDW (Informatica/SSIS/SAP BW, SQL Server/Oracle, Hadoop) to lakehouse patterns (Spark/dbt/ELT), including dual-run cutovers and reconciliation.
• Streaming & CDC: Build real-time and near-real-time pipelines using Kafka/MSK/Kinesis and Spark Structured Streaming; implement CDC with Debezium, DMS, or Fivetran.
• Quality & observability: Add unit/integration tests, expectations/rules, data contracts, lineage, alerting, and SLO dashboards; participate in on-call rotations.
• Platform guardrails: Contribute to standards for naming, zones, schemas, S3 layout, encryption, backup/DR, and multi-region replication; write clear runbooks and docs.
• DevOps for data: Use Terraform/CloudFormation and CI/CD (GitHub Actions/GitLab/Azure DevOps) to version, test, and deploy data assets.
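To make the first two bullets concrete, here is a minimal PySpark sketch of a batch ELT job that lands a raw S3 extract in an Iceberg table registered in the AWS Glue catalog. The catalog name ("glue"), bucket paths, database, table, and column names are hypothetical placeholders for illustration, not details from this posting.
```python
# Hypothetical batch ELT sketch: raw S3 extract -> Iceberg table in the Glue catalog.
# All names (catalog "glue", bucket, database, table, columns) are placeholders.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = (
    SparkSession.builder.appName("orders_daily_load")
    # Register an Iceberg catalog backed by AWS Glue (assumes the Iceberg
    # Spark runtime and AWS bundle jars are on the classpath).
    .config("spark.sql.catalog.glue", "org.apache.iceberg.spark.SparkCatalog")
    .config("spark.sql.catalog.glue.catalog-impl", "org.apache.iceberg.aws.glue.GlueCatalog")
    .config("spark.sql.catalog.glue.warehouse", "s3://example-lake/warehouse/")
    .config("spark.sql.extensions", "org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions")
    .getOrCreate()
)

# Extract: read one day's raw drop from the landing zone (path is made up).
raw = spark.read.parquet("s3://example-lake/landing/orders/ingest_date=2025-10-11/")

# Transform: dedupe, fix types, and project to the target schema.
orders = (
    raw.dropDuplicates(["order_id"])
       .withColumn("order_ts", F.to_timestamp("order_ts"))
       .withColumn("amount", F.col("amount").cast("decimal(18,2)"))
       .withColumn("ingest_date", F.to_date(F.lit("2025-10-11")))
       .select("order_id", "customer_id", "order_ts", "amount", "ingest_date")
)

# Load: append into a partitioned Iceberg table. Iceberg provides the ACID
# commits, snapshots (time travel), and schema evolution mentioned above.
spark.sql("""
    CREATE TABLE IF NOT EXISTS glue.analytics.orders (
        order_id    STRING,
        customer_id STRING,
        order_ts    TIMESTAMP,
        amount      DECIMAL(18, 2),
        ingest_date DATE
    )
    USING iceberg
    PARTITIONED BY (days(order_ts))
""")
orders.writeTo("glue.analytics.orders").append()
```
In practice a job like this would be parameterized and scheduled from Airflow or Glue, with the run date passed in rather than hard-coded.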
Required Experience
• 5+ years in data engineering (or equivalent), delivering production pipelines and tables on at least two large-scale platforms.
• Hands-on with AWS data services: S3, Glue/EMR, Lake Formation, IAM, and at least one warehouse (Snowflake or Redshift).
• Deep experience with Apache Spark and at least one of: Databricks, EMR Spark, Snowflake Snowpark, Dremio, or Starburst/Trino.
• Production experience with open table formats: Apache Iceberg (preferred), Delta Lake, or Apache Hudi; strong grasp of metadata/manifests, compaction, and schema evolution (see the maintenance sketch after this list).
• Comfortable with on-prem stacks: Hadoop/Hive, Spark on Kubernetes, SQL Server/SSIS and/or Oracle/Exadata; Netezza/Teradata a plus.
• Proven data modeling (3NF, dimensional, Data Vault), ELT/ETL design, and SQL performance tuning.
• Security/governance: RBAC/ABAC, row/column-level security, masking/tokenization, KMS/key management.
• IaC & CI/CD for data workloads.
• Excellent communicator who can collaborate with platform engineers, analysts, and stakeholders to meet SLAs and roadmap goals.
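As a companion to the ingest sketch above, here is a minimal example of the table maintenance the open-table-format bullet alludes to: an additive schema change and a small-file compaction pass, expressed as Spark SQL against the same hypothetical Iceberg table. It assumes the Iceberg Spark SQL extensions are enabled on the session; the names and target sizes are illustrative.
```python
# Hypothetical Iceberg table maintenance: schema evolution + compaction.
# Reuses the placeholder "glue" catalog and analytics.orders table from above.
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder.appName("orders_table_maintenance")
    .config("spark.sql.catalog.glue", "org.apache.iceberg.spark.SparkCatalog")
    .config("spark.sql.catalog.glue.catalog-impl", "org.apache.iceberg.aws.glue.GlueCatalog")
    .config("spark.sql.catalog.glue.warehouse", "s3://example-lake/warehouse/")
    .config("spark.sql.extensions", "org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions")
    .getOrCreate()
)

# Schema evolution: adding a nullable column is a metadata-only change;
# existing snapshots and readers are unaffected.
spark.sql("ALTER TABLE glue.analytics.orders ADD COLUMNS (sales_channel STRING)")

# Compaction: rewrite small files toward ~128 MB targets so manifests and
# scan planning stay healthy after many incremental appends.
spark.sql("""
    CALL glue.system.rewrite_data_files(
        table => 'analytics.orders',
        options => map('target-file-size-bytes', '134217728')
    )
""")

# Inspect snapshots to confirm the rewrite committed; any snapshot_id here
# is a valid time-travel target.
spark.sql(
    "SELECT committed_at, snapshot_id, operation FROM glue.analytics.orders.snapshots"
).show(truncate=False)
```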
Nice to Have
• dbt for ELT, Airflow orchestration, Great Expectations/Deequ for quality, OpenLineage/Marquez for lineage.
• Streaming experience with Kafka/MSK, Kinesis, or Flink (see the streaming sketch after this list).
• Catalogs/semantic: AWS Glue Data Catalog, Unity Catalog, Amundsen/DataHub, Atlan/Collibra.
• BI/serving: DuckDB, Athena, QuickSight, Tableau/Power BI/Looker.
• Compliance: SOC 2, HIPAA/PHI, GDPR, PCI; SSO/OIDC with Okta.
• Multi-tenant platforms or federated governance exposure.
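For the streaming item above (and the Streaming & CDC responsibility earlier), here is a minimal Spark Structured Streaming sketch that reads a Kafka topic and appends micro-batches to the same hypothetical Iceberg table. Broker addresses, topic name, checkpoint path, and the event schema are assumptions made for the example; a Debezium/DMS CDC feed would additionally need a MERGE-style upsert rather than a plain append.
```python
# Hypothetical near-real-time ingest: Kafka -> Spark Structured Streaming -> Iceberg.
# Requires the spark-sql-kafka and Iceberg runtime packages; all names are placeholders.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F
from pyspark.sql.types import StructType, StructField, StringType, TimestampType, DecimalType

spark = (
    SparkSession.builder.appName("orders_stream")
    .config("spark.sql.catalog.glue", "org.apache.iceberg.spark.SparkCatalog")
    .config("spark.sql.catalog.glue.catalog-impl", "org.apache.iceberg.aws.glue.GlueCatalog")
    .config("spark.sql.catalog.glue.warehouse", "s3://example-lake/warehouse/")
    .config("spark.sql.extensions", "org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions")
    .getOrCreate()
)

# Assumed shape of the JSON events on the topic.
event_schema = StructType([
    StructField("order_id", StringType()),
    StructField("customer_id", StringType()),
    StructField("order_ts", TimestampType()),
    StructField("amount", DecimalType(18, 2)),
])

# Source: subscribe to the Kafka topic and parse the JSON payloads.
events = (
    spark.readStream.format("kafka")
    .option("kafka.bootstrap.servers", "broker1:9092,broker2:9092")
    .option("subscribe", "orders")
    .option("startingOffsets", "latest")
    .load()
    .select(F.from_json(F.col("value").cast("string"), event_schema).alias("e"))
    .select("e.*")
    .withColumn("ingest_date", F.current_date())
)

# Sink: append each micro-batch to the Iceberg table; the checkpoint lets
# restarts resume without re-committing already-written batches.
query = (
    events.writeStream.format("iceberg")
    .outputMode("append")
    .option("checkpointLocation", "s3://example-lake/checkpoints/orders_stream/")
    .trigger(processingTime="1 minute")
    .toTable("glue.analytics.orders")
)
query.awaitTermination()
```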
Location & Work Style
Remote, with core hours in PST/CST.