

Data Pipeline Engineer
Featured Role | Apply directly with Data Freelance Hub
This role is for a Data Pipeline Engineer in San Jose, CA, with the contract length and pay rate listed as unknown. It requires 5+ years of experience and expertise in Airflow, Kafka, Python (PySpark), and cloud platforms.
Country: United States
Currency: $ USD
Day rate: -
Date discovered: September 3, 2025
Project duration: Unknown
Location type: On-site
Contract type: Unknown
Security clearance: Unknown
Location detailed: San Jose, CA
Skills detailed: #GCP (Google Cloud Platform) #Airflow #Kafka (Apache Kafka) #API (Application Programming Interface) #Apache Spark #AWS (Amazon Web Services) #Spark (Apache Spark) #Data Processing #Observability #Apache Airflow #PySpark #Scala #Data Orchestration #Azure #S3 (Amazon Simple Storage Service) #Data Ingestion #Logging #Datasets #Batch #Data Engineering #Data Pipeline #Security #Storage #Data Warehouse #Cloud #Python
Role description
Job Description
We are seeking a skilled Data Pipeline Engineer with deep expertise in building, orchestrating, and optimizing large-scale data ingestion pipelines. This role is perfect for someone who thrives on working with high-volume telemetry sources, refining complex data workflows, and solving challenges like schema drift in a distributed systems environment.
Location: San Jose, CA (Onsite 2 days per week). Final-round interviews will be conducted in person.
Key Skills: Proven experience designing and building multiple data pipelines, with deep expertise in Airflow, Kafka, Python (PySpark), and cloud platforms. Must have hands-on experience with large-scale data warehouses (managing multiple TBs).
Key Responsibilities
• Design, build, and manage scalable batch and real-time streaming pipelines for ingesting telemetry, log, and event data.
• Develop, implement, and maintain robust data orchestration workflows using tools like Apache Airflow or similar platforms (a minimal DAG sketch follows this list).
• Onboard new data sources by building efficient connectors (API, Kafka, file-based) and normalizing diverse, security-related datasets.
• Proactively monitor and manage schema evolution and drift across various source systems and data formats.
• Implement comprehensive pipeline observability, including logging, performance metrics, and alerting systems.
• Continuously optimize data ingestion for enhanced performance, reliability, and cost-effectiveness.
• Collaborate with cross-functional teams, including detection, threat intelligence, and platform engineering, to align data ingestion with security objectives.
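As a rough illustration of the orchestration work described above, here is a minimal sketch of an Airflow DAG that stages a telemetry batch and lands it in cloud storage. This is not the employer's actual pipeline; the schedule, file path, bucket, and task names are hypothetical placeholders.

from datetime import datetime, timedelta

from airflow.decorators import dag, task


@dag(
    schedule="@hourly",
    start_date=datetime(2025, 1, 1),
    catchup=False,
    default_args={"retries": 2, "retry_delay": timedelta(minutes=5)},
    tags=["telemetry", "ingestion"],
)
def telemetry_ingestion():
    @task
    def extract_batch() -> str:
        # Placeholder: consume a bounded window of events from a Kafka topic
        # (e.g., with a Kafka client library) and stage it as a local file.
        return "/tmp/telemetry_batch.jsonl"

    @task
    def load_to_storage(path: str) -> None:
        # Placeholder: upload the staged batch to object storage such as S3,
        # e.g., via boto3 or an Airflow provider hook.
        print(f"uploading {path} to s3://example-bucket/telemetry/")

    load_to_storage(extract_batch())


telemetry_ingestion()

In practice, per-source connectors, alerting, and backfill handling would be layered on top of a skeleton like this; the sketch only shows the shape of an orchestrated ingestion task.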
Required Qualifications
• 5+ years of professional experience in data engineering or infrastructure roles with a focus on pipeline development.
• Strong proficiency in Python and extensive experience with distributed data processing frameworks like Apache Spark/PySpark.
• Hands-on experience with orchestration and workflow management tools such as Apache Airflow, Dagster, or Prefect.
• Deep understanding of data ingestion patterns, schema management, and strategies for handling schema drift (illustrated in the sketch after this list).
• Practical experience with messaging/streaming platforms (e.g., Kafka) and cloud-native storage services (e.g., S3).
• Proven experience developing solutions in a major cloud environment (AWS preferred, Azure, or GCP).
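To make the schema-drift requirement above concrete, here is a minimal PySpark sketch of one common tolerance strategy: merging schemas on read and defaulting columns that older partitions lack. The path and column names are illustrative assumptions, not taken from this posting.

from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("schema-drift-demo").getOrCreate()

# mergeSchema unions columns across Parquet files, so a field added by an
# upstream producer does not break the read.
events = (
    spark.read.option("mergeSchema", "true")
    .parquet("s3a://example-bucket/telemetry/")  # hypothetical location
)

# Rows from older partitions surface the newly added column as null; coalesce
# it to a default instead of letting downstream logic fail.
normalized = events.withColumn(
    "source_region", F.coalesce(F.col("source_region"), F.lit("unknown"))
)
normalized.printSchema()

Real pipelines typically pair this with schema-registry checks or contract tests so that drift is detected and reviewed rather than silently absorbed.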
Skills: data, pipelines, cloud, airflow