

Data Engineer
Featured Role | Apply directly with Data Freelance Hub
This role is for a Data Engineer with 9-12 years of experience, focusing on Databricks, Python, PySpark, and Airflow. The contract lasts 5 months, pays on a W2 basis, and is hybrid in San Jose, CA.
Country
United States
Currency
$ USD
Day rate
$792
Date discovered
August 22, 2025
Project duration
3 to 6 months
Location type
Hybrid
Contract type
W2 Contractor
Security clearance
Unknown
Location detailed
San Jose, CA
Skills detailed
#Spark (Apache Spark) #Data Science #AWS (Amazon Web Services) #Data Pipeline #S3 (Amazon Simple Storage Service) #SQL (Structured Query Language) #Security #Data Engineering #Apache Spark #Scala #ETL (Extract, Transform, Load) #GIT #Data Processing #Data Quality #Version Control #Data Modeling #SQS (Simple Queue Service) #Automation #PySpark #Apache Airflow #Lambda (AWS Lambda) #Compliance #Scripting #Databricks #Airflow #Cloud #Python #Data Manipulation
Role description
Primary Skills: Databricks, Python, PySpark, Airflow, Apache Spark
Location: San Jose, CA (hybrid role; 3 days a week in the San Jose office)
Duration: 5 months
Contract Type: W2 only
Responsibilities
β’ Design, develop, and maintain scalable and reliable data pipelines to support large-scale data processing.
• Build and optimize data workflows using orchestration tools like Apache Airflow and Spark to support scheduled and event-driven ETL/ELT processes (see the DAG sketch after this list).
β’ Implement complex parsing, cleansing, and transformation logic to normalize data from a variety of structured and unstructured sources.
β’ Collaborate with data scientists, analysts, and application teams to integrate, test, and validate data products and pipelines.
β’ Operate and maintain pipelines running on cloud platforms (AWS) and distributed compute environments (e.g., Databricks).
β’ Monitor pipeline performance, perform root cause analysis, and troubleshoot failures to ensure high data quality and uptime.
β’ Ensure proper security, compliance, and governance of data across systems and environments.
β’ Contribute to the automation and standardization of data engineering processes to improve development velocity and operational efficiency.
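For illustration only, and not part of the client's requirements: a minimal sketch of the kind of scheduled ETL workflow the Airflow bullet above describes, assuming Airflow 2.4+; the DAG id, schedule, and record fields are hypothetical.

# A minimal sketch only, assuming Airflow 2.4+; the DAG id, schedule, and
# record fields are hypothetical and not taken from the job description.
from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator


def extract(**context):
    # Placeholder extract step: pull raw records from a hypothetical source.
    return [{"id": "1", "amount": "42.50"}]


def transform(ti, **context):
    # Cleansing/normalization step: cast string fields to proper types.
    rows = ti.xcom_pull(task_ids="extract")
    return [{"id": int(r["id"]), "amount": float(r["amount"])} for r in rows]


def load(ti, **context):
    # Placeholder load step: write transformed rows to the target store.
    rows = ti.xcom_pull(task_ids="transform")
    print(f"loaded {len(rows)} rows")


with DAG(
    dag_id="example_daily_etl",
    start_date=datetime(2025, 1, 1),
    schedule="@daily",  # scheduled trigger; an event-driven run would use a sensor or dataset instead
    catchup=False,
) as dag:
    t_extract = PythonOperator(task_id="extract", python_callable=extract)
    t_transform = PythonOperator(task_id="transform", python_callable=transform)
    t_load = PythonOperator(task_id="load", python_callable=load)

    t_extract >> t_transform >> t_load

The three-task extract/transform/load chain is only a stand-in for the real pipeline shape; the point is that each step is a separately retryable, monitorable task, which is what makes root cause analysis and uptime tracking practical.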
Required Skills
• 9-12 years of experience
• Proficient in Python and PySpark for data processing and scripting (a short cleansing sketch follows this list).
β’ Strong experience with SQL for data manipulation and performance tuning.
β’ Deep understanding of distributed data processing with Apache Spark.
β’ Hands-on experience with Airflow or similar orchestration tools.
β’ Experience with cloud services and data tools in AWS (e.g., S3, Lambda, SQS, Gateway, Networking).
β’ Expertise with Databricks for collaborative data engineering and analytics.
β’ Solid understanding of data modeling, data warehousing, and best practices in data pipeline architecture.
β’ Strong problem-solving skills with the ability to work independently on complex tasks.
β’ Familiarity with CI/CD practices and version control (e.g., Git) in data engineering workflows.
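As a rough illustration of the PySpark proficiency bullet above: a minimal, self-contained cleansing sketch in which the column names and sample rows are hypothetical, and a local SparkSession stands in for a Databricks cluster.

# A minimal, self-contained sketch; the column names and sample rows are
# hypothetical, and a local SparkSession stands in for a Databricks cluster.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("cleanse_sketch").getOrCreate()

# Sample "raw" records with messy types, standing in for real source data.
raw = spark.createDataFrame(
    [(" 1 ", "2025-08-22", "42.50"), ("2", None, "not-a-number")],
    ["id", "event_date", "amount"],
)

cleaned = (
    raw
    .withColumn("id", F.trim("id").cast("int"))             # strip whitespace, cast to int
    .withColumn("event_date", F.to_date("event_date"))      # parse ISO date strings
    .withColumn("amount", F.col("amount").cast("double"))   # invalid numbers become null
    .dropna(subset=["id"])                                   # basic data-quality filter
)

cleaned.show()
spark.stop()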