Data Engineer

⭐ - Featured Role | Apply direct with Data Freelance Hub
This role is for a "Data Engineer" focused on Databricks and AWS, offering $43-$53 per hour for a 3+ month contract in Columbus, OH. Key skills include Databricks, PySpark, AWS, and experience with data governance and CI/CD practices.
🌎 - Country
United States
💱 - Currency
$ USD
-
💰 - Day rate
424
-
🗓️ - Date discovered
September 30, 2025
🕒 - Project duration
More than 6 months
-
🏝️ - Location type
On-site
-
📄 - Contract type
W2 Contractor
-
🔒 - Security clearance
Unknown
-
📍 - Location detailed
Columbus, Ohio Metropolitan Area
-
🧠 - Skills detailed
#Datasets #Security #DevOps #Infrastructure as Code (IaC) #Terraform #Monitoring #Batch #Data Quality #Jenkins #Data Processing #AWS (Amazon Web Services) #Collibra #IAM (Identity and Access Management) #Data Architecture #Data Pipeline #Data Lineage #S3 (Amazon Simple Storage Service) #GitHub #Datadog #Azure DevOps #Observability #Pytest #Alation #Lambda (AWS Lambda) #Scala #Databricks #Data Engineering #Kafka (Apache Kafka) #PySpark #ETL (Extract, Transform, Load) #Logging #Spark (Apache Spark) #Documentation #Scrum #Azure #Data Catalog #AI (Artificial Intelligence) #ML (Machine Learning) #AWS Glue #Compliance #Delta Lake #Agile #GIT #Airflow #Cloud
Role description
Job Title: Databricks and AWS Focused Data Engineer (Contract)
Company: Big Four Client
Location (onsite): Columbus, OH
Work Authorization: U.S. Citizen or Green Card Holder
Pay Rate: $43-$53 per hour (W2)
Duration: 3+ Months
Overview: We are seeking an experienced data engineer to deliver high-quality, scalable data solutions on Databricks and AWS for one of our Big Four clients. You will build and optimize pipelines, implement medallion architecture, integrate streaming and batch sources, and enforce strong governance and access controls to support analytics and ML use cases.
Key Responsibilities:
• Build and Maintain Data Pipelines: Develop scalable data pipelines using PySpark and Spark within the Databricks environment.
• Implement Medallion Architecture: Design workflows using raw, trusted, and refined layers to drive reliable data processing (see the sketch after this list).
• Integrate Diverse Data Sources: Connect data from Kafka streams, extract channels, and APIs.
• Data Cataloging and Governance: Model and register datasets in enterprise data catalogs, ensuring robust governance and accessibility.
• Access Control: Manage secure, role-based access patterns to support analytics, AI, and ML needs.
• Team Collaboration: Work closely with peers to achieve required code coverage and deliver high-quality, well-tested solutions.
• Optimize and Operationalize: Tune Spark jobs (partitioning, caching, broadcast joins, AQE), manage Delta Lake performance (Z-Ordering, OPTIMIZE, VACUUM), and implement cost and reliability best practices on AWS.
• Data Quality and Testing: Implement data quality checks and validations (e.g., Great Expectations, custom PySpark checks), unit/integration tests, and CI/CD for Databricks Jobs/Workflows.
• Infrastructure as Code: Provision and manage Databricks and AWS resources using Terraform (workspaces, clusters, jobs, secret scopes, Unity Catalog objects, S3, IAM).
• Monitoring and Observability: Set up logging, metrics, and alerts (CloudWatch, Datadog, Databricks audit logs) for pipelines and jobs.
• Documentation: Produce clear technical documentation, runbooks, and data lineage for governed datasets.
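As a rough illustration of the raw-to-trusted medallion work described above, here is a minimal PySpark sketch. It is not the client's actual pipeline; the catalog, table, and column names (main.raw.customer_events, event_id, event_ts) are hypothetical placeholders.

```python
# Minimal sketch of a raw -> trusted medallion step on Databricks.
# All catalog, table, and column names below are hypothetical placeholders.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.getOrCreate()  # pre-created on Databricks clusters

# Read the raw-layer Delta table.
raw = spark.read.table("main.raw.customer_events")

# Basic cleansing and conformance for the trusted layer.
trusted = (
    raw.dropDuplicates(["event_id"])
       .filter(F.col("event_ts").isNotNull())
       .withColumn("event_date", F.to_date("event_ts"))
)

# Persist as a managed Delta table, then compact files and co-locate a common filter column.
trusted.write.format("delta").mode("overwrite").saveAsTable("main.trusted.customer_events")
spark.sql("OPTIMIZE main.trusted.customer_events ZORDER BY (event_date)")
```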
Required Skills & Qualifications:
• Databricks: 6-9 years of experience with expert-level proficiency
• PySpark/Spark: 6-9 years of advanced hands-on experience
• AWS: 6-9 years of experience with strong competency, including S3 and Terraform for infrastructure-as-code
• Data Architecture: Solid knowledge of the medallion pattern and data warehousing best practices
• Data Pipelines: Proven ability to build, optimize, and govern enterprise data pipelines
• Delta Lake and Unity Catalog: Expertise in Delta Lake internals, time travel, schema evolution/enforcement, and Unity Catalog RBAC/ABAC
• Streaming: Hands-on experience with Spark Structured Streaming, Kafka, checkpointing, exactly-once semantics, and late-arriving data handling
• CI/CD: Experience with Git-based workflows and CI/CD for Databricks (e.g., Databricks Repos, dbx, GitHub Actions, Azure DevOps, or Jenkins)
• Security and Compliance: Experience with IAM, KMS, encryption, secrets management, token/credential rotation, and PII governance
• Performance and Cost: Demonstrated ability to tune Spark jobs and optimize Databricks cluster configurations and AWS usage for cost and throughput
• Collaboration: Experience working in Agile/Scrum teams, peer reviews, and achieving code coverage targets
Preferred Skills & Qualifications:
• Certifications: Databricks Data Engineer Professional, AWS Solutions Architect/Developer, HashiCorp Terraform Associate
• Data Catalogs: Experience with enterprise catalogs such as Collibra or Alation, and lineage tooling such as OpenLineage
• Orchestration: Databricks Workflows and/or Airflow
• Additional AWS: Glue, Lambda, Step Functions, CloudWatch, Secrets Manager
• Testing: pytest, chispa, Great Expectations, dbx test (see the unit-test sketch after this list)
• Domain Experience: Analytics and ML feature pipelines, MLOps integrations
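As a rough illustration of the unit-testing and code-coverage expectations above, here is a minimal pytest sketch against a local Spark session. The transformation under test (add_event_date) is a hypothetical example, not part of the role description.

```python
# Minimal pytest sketch for PySpark logic; add_event_date is a hypothetical transform.
import pytest
from pyspark.sql import SparkSession, functions as F


def add_event_date(df):
    """Hypothetical transform: derive an event_date column from an event_ts string."""
    return df.withColumn("event_date", F.to_date("event_ts"))


@pytest.fixture(scope="session")
def spark():
    # Local Spark session so tests can run in CI without a Databricks cluster.
    return SparkSession.builder.master("local[2]").appName("unit-tests").getOrCreate()


def test_add_event_date(spark):
    df = spark.createDataFrame([("e1", "2025-09-30 12:00:00")], ["event_id", "event_ts"])
    result = add_event_date(df).select("event_date").first()[0]
    assert str(result) == "2025-09-30"
```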