Data Pipeline Engineer

⭐ - Featured Role | Apply directly with Data Freelance Hub
This role is for a Data Pipeline Engineer in San Jose, CA; the contract length and pay rate are not specified. It requires 5+ years of experience and expertise in Airflow, Kafka, Python (PySpark), and cloud platforms.
🌎 - Country
United States
💱 - Currency
$ USD
💰 - Day rate
Unknown
🗓️ - Date discovered
September 3, 2025
🕒 - Project duration
Unknown
🏝️ - Location type
On-site
📄 - Contract type
Unknown
🔒 - Security clearance
Unknown
📍 - Location detailed
San Jose, CA
🧠 - Skills detailed
#GCP (Google Cloud Platform) #Airflow #Kafka (Apache Kafka) #API (Application Programming Interface) #Apache Spark #AWS (Amazon Web Services) #Spark (Apache Spark) #Data Processing #Observability #Apache Airflow #PySpark #Scala #Data Orchestration #Azure #S3 (Amazon Simple Storage Service) #Data Ingestion #Logging #Datasets #Batch #Data Engineering #Data Pipeline #Security #Storage #Data Warehouse #Cloud #Python
Role description
Job Description

We are seeking a skilled Data Pipeline Engineer with deep expertise in building, orchestrating, and optimizing large-scale data ingestion pipelines. This role is ideal for someone who thrives on working with high-volume telemetry sources, refining complex data workflows, and solving challenges like schema drift in a distributed-systems environment.

Location: San Jose, CA (on-site 2 days per week). Final-round interviews will be conducted in person.

Key Skills: Proven experience designing and building multiple data pipelines, with deep expertise in Airflow, Kafka, Python (PySpark), and cloud platforms. Must have hands-on experience with large-scale data warehouses (managing multiple TBs).

Key Responsibilities
• Design, build, and manage scalable batch and real-time streaming pipelines for ingesting telemetry, log, and event data.
• Develop, implement, and maintain robust data orchestration workflows using tools like Apache Airflow or similar platforms.
• Onboard new data sources by building efficient connectors (API, Kafka, file-based) and normalizing diverse, security-related datasets.
• Proactively monitor and manage schema evolution and drift across source systems and data formats.
• Implement comprehensive pipeline observability, including logging, performance metrics, and alerting.
• Continuously optimize data ingestion for performance, reliability, and cost-effectiveness.
• Collaborate with cross-functional teams, including detection, threat intelligence, and platform engineering, to align data ingestion with security objectives.

Required Qualifications
• 5+ years of professional experience in data engineering or infrastructure roles with a focus on pipeline development.
• Strong proficiency in Python and extensive experience with distributed data processing frameworks such as Apache Spark/PySpark.
• Hands-on experience with orchestration and workflow management tools such as Apache Airflow, Dagster, or Prefect.
• Deep understanding of data ingestion patterns, schema management, and strategies for handling schema drift.
• Practical experience with messaging/streaming platforms (e.g., Kafka) and cloud-native storage services (e.g., S3).
• Proven experience developing solutions in a major cloud environment (AWS preferred; Azure or GCP also acceptable).

Skills: data, pipelines, cloud, airflow
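
To make the orchestration and alerting responsibilities above concrete, here is a minimal sketch of a daily batch ingestion DAG, assuming Apache Airflow 2.4+. The DAG name, schedule, helper functions, and failure callback are hypothetical illustrations, not details from this posting.

```python
from datetime import datetime, timedelta

from airflow import DAG
from airflow.operators.python import PythonOperator


def alert_on_failure(context):
    # Hypothetical alert hook: in practice this might notify Slack or a pager.
    ti = context["task_instance"]
    print(f"ALERT: task {ti.task_id} in DAG {ti.dag_id} failed for {context['ds']}")


def extract_telemetry(**context):
    # Hypothetical connector: pull one day of telemetry from an upstream
    # source and stage it for loading.
    print(f"extracting telemetry for {context['ds']}")


def load_to_warehouse(**context):
    # Hypothetical loader: copy the staged partition into the warehouse.
    print(f"loading partition {context['ds']}")


with DAG(
    dag_id="telemetry_batch_ingest",  # hypothetical DAG name
    start_date=datetime(2025, 1, 1),
    schedule="@daily",
    catchup=False,
    default_args={
        "retries": 2,
        "retry_delay": timedelta(minutes=5),
        "on_failure_callback": alert_on_failure,  # basic pipeline observability
    },
) as dag:
    extract = PythonOperator(task_id="extract", python_callable=extract_telemetry)
    load = PythonOperator(task_id="load", python_callable=load_to_warehouse)

    extract >> load
```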
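
For the streaming side, a rough sketch of Kafka-to-object-storage ingestion with PySpark Structured Streaming, assuming Spark 3.x with the spark-sql-kafka connector on the classpath; the broker address, topic, schema, and S3 paths are hypothetical. Keeping the raw JSON alongside the parsed columns is one way to avoid silently dropping fields when upstream schemas drift.

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F
from pyspark.sql.types import StringType, StructField, StructType, TimestampType

spark = SparkSession.builder.appName("telemetry-stream-ingest").getOrCreate()

# Explicit "known" schema; anything outside it remains in the raw payload so a
# downstream job can detect drift instead of losing data.
known_schema = StructType([
    StructField("event_id", StringType()),
    StructField("host", StringType()),
    StructField("event_time", TimestampType()),
])

raw = (
    spark.readStream
    .format("kafka")
    .option("kafka.bootstrap.servers", "broker:9092")  # hypothetical brokers
    .option("subscribe", "telemetry-events")           # hypothetical topic
    .load()
)

parsed = (
    raw.select(F.col("value").cast("string").alias("raw_json"))
    .withColumn("parsed", F.from_json("raw_json", known_schema))
    .select("raw_json", "parsed.*")
)

query = (
    parsed.writeStream
    .format("parquet")
    .option("path", "s3a://bucket/telemetry/")           # hypothetical sink
    .option("checkpointLocation", "s3a://bucket/_chk/")  # hypothetical checkpoint
    .trigger(processingTime="1 minute")
    .start()
)

query.awaitTermination()
```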
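
Finally, a small sketch of one simple way to surface schema drift on a batch source before it reaches the warehouse, again in PySpark; the input path and the expected column set are hypothetical.

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("schema-drift-check").getOrCreate()

# Columns the downstream warehouse model currently expects (hypothetical).
expected_columns = {"event_id", "host", "event_time"}

# Infer the schema of the latest drop and compare it with expectations.
batch = spark.read.json("s3a://bucket/telemetry/raw/2025-09-03/")  # hypothetical path
observed_columns = set(batch.columns)

added = observed_columns - expected_columns
removed = expected_columns - observed_columns

if added or removed:
    # In a real pipeline this would raise an alert or quarantine the batch
    # rather than just printing.
    print(f"schema drift detected: added={sorted(added)}, removed={sorted(removed)}")
else:
    print("schema matches expectations")
```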