Data Engineer

⭐ - Featured Role | Apply directly with Data Freelance Hub
This role is for a Data Engineer with 9-12 years of experience, focusing on Databricks, Python, PySpark, and Airflow. The contract lasts 5 months, pays on a W2 basis, and is hybrid in San Jose, CA.
🌎 - Country
United States
πŸ’± - Currency
$ USD
-
πŸ’° - Day rate
792
-
πŸ—“οΈ - Date discovered
August 22, 2025
πŸ•’ - Project duration
3 to 6 months
-
🏝️ - Location type
Hybrid
-
πŸ“„ - Contract type
W2 Contractor
-
πŸ”’ - Security clearance
Unknown
-
πŸ“ - Location detailed
San Jose, CA
-
🧠 - Skills detailed
#Spark (Apache Spark) #Data Science #AWS (Amazon Web Services) #Data Pipeline #S3 (Amazon Simple Storage Service) #SQL (Structured Query Language) #Security #Data Engineering #Apache Spark #Scala #ETL (Extract, Transform, Load) #GIT #Data Processing #Data Quality #Version Control #Data Modeling #SQS (Simple Queue Service) #Automation #PySpark #Apache Airflow #Lambda (AWS Lambda) #Compliance #Scripting #Databricks #Airflow #Cloud #Python #Data Manipulation
Role description
Primary Skills: Databricks, Python, PySpark, Airflow, Apache Spark
Location: San Jose, CA (hybrid role; 3 days a week in the San Jose office)
Duration: 5 months
Contract Type: W2 only

Responsibilities
• Design, develop, and maintain scalable and reliable data pipelines to support large-scale data processing.
• Build and optimize data workflows using orchestration tools like Apache Airflow and Spark to support scheduled and event-driven ETL/ELT processes (a minimal sketch of this pattern appears after the Required Skills list).
• Implement complex parsing, cleansing, and transformation logic to normalize data from a variety of structured and unstructured sources.
• Collaborate with data scientists, analysts, and application teams to integrate, test, and validate data products and pipelines.
• Operate and maintain pipelines running on cloud platforms (AWS) and distributed compute environments (e.g., Databricks).
• Monitor pipeline performance, perform root cause analysis, and troubleshoot failures to ensure high data quality and uptime.
• Ensure proper security, compliance, and governance of data across systems and environments.
• Contribute to the automation and standardization of data engineering processes to improve development velocity and operational efficiency.

Required Skills
• 9-12 years of experience.
• Proficient in Python and PySpark for data processing and scripting.
• Strong experience with SQL for data manipulation and performance tuning.
• Deep understanding of distributed data processing with Apache Spark.
• Hands-on experience with Airflow or similar orchestration tools.
• Experience with cloud services and data tools in AWS (e.g., S3, Lambda, SQS, Gateway, Networking).
• Expertise with Databricks for collaborative data engineering and analytics.
• Solid understanding of data modeling, data warehousing, and best practices in data pipeline architecture.
• Strong problem-solving skills with the ability to work independently on complex tasks.
• Familiarity with CI/CD practices and version control (e.g., Git) in data engineering workflows.
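For candidates gauging the stack, the role's core pattern (an Airflow DAG scheduling a PySpark cleansing job that reads from and writes back to S3) looks roughly like the sketch below. This is a minimal illustration only, not part of the posting: the DAG id, bucket paths, and column names are hypothetical, and it assumes Airflow 2.4+ and a Spark runtime (e.g., Databricks) that can read s3:// URIs directly.

```python
"""Hypothetical sketch: a daily Airflow DAG that runs a PySpark cleansing job.
All identifiers (DAG id, buckets, columns) are illustrative placeholders."""
from datetime import datetime, timedelta

from airflow import DAG
from airflow.operators.python import PythonOperator


def cleanse_events(ds: str, **_) -> None:
    """Read raw events for the run date, normalize them, and write Parquet."""
    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F

    spark = SparkSession.builder.appName("cleanse_events").getOrCreate()

    # s3:// URIs assume a Databricks/EMR-style runtime; plain OSS Spark would need s3a://.
    raw = spark.read.json(f"s3://example-raw-bucket/events/dt={ds}/")

    cleaned = (
        raw
        .withColumn("event_ts", F.to_timestamp("event_ts"))   # normalize timestamps
        .withColumn("user_id", F.trim(F.col("user_id")))       # strip stray whitespace
        .dropDuplicates(["event_id"])                          # de-duplicate on the event key
        .filter(F.col("event_id").isNotNull())                 # basic data-quality gate
    )

    cleaned.write.mode("overwrite").parquet(
        f"s3://example-curated-bucket/events/dt={ds}/"
    )
    spark.stop()


with DAG(
    dag_id="daily_event_cleansing",
    start_date=datetime(2025, 1, 1),
    schedule="@daily",                                         # Airflow 2.4+ style argument
    catchup=False,
    default_args={"retries": 2, "retry_delay": timedelta(minutes=10)},
) as dag:
    PythonOperator(
        task_id="cleanse_events",
        python_callable=cleanse_events,
    )
```

In practice the Spark logic would usually live in its own job submitted to Databricks rather than inside the DAG file; the single-file version above is only meant to show how the listed tools fit together.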