Data Pipeline Engineer

⭐ - Featured Role | Apply directly with Data Freelance Hub
This role is for a Data Pipeline Engineer in San Jose, CA; the contract length and pay rate are not specified. It requires 5+ years of experience and expertise in Airflow, Kafka, Python (PySpark), and cloud platforms.
🌎 - Country
United States
💱 - Currency
$ USD
💰 - Day rate
Unknown
🗓️ - Date discovered
September 3, 2025
🕒 - Project duration
Unknown
🏝️ - Location type
On-site
📄 - Contract type
Unknown
🔒 - Security clearance
Unknown
📍 - Location detailed
San Jose, CA
🧠 - Skills detailed
#GCP (Google Cloud Platform) #Airflow #Kafka (Apache Kafka) #API (Application Programming Interface) #Apache Spark #AWS (Amazon Web Services) #Spark (Apache Spark) #Data Processing #Observability #Apache Airflow #PySpark #Scala #Data Orchestration #Azure #S3 (Amazon Simple Storage Service) #Data Ingestion #Logging #Datasets #Batch #Data Engineering #Data Pipeline #Security #Storage #Data Warehouse #Cloud #Python
Role description
Job Description

We are seeking a skilled Data Pipeline Engineer with deep expertise in building, orchestrating, and optimizing large-scale data ingestion pipelines. This role is ideal for someone who thrives on working with high-volume telemetry sources, refining complex data workflows, and solving challenges like schema drift in a distributed-systems environment.

Location: San Jose, CA (on-site 2 days per week). Final-round interviews will be conducted in person.

Key Skills: Proven experience designing and building multiple data pipelines, with deep expertise in Airflow, Kafka, Python (PySpark), and cloud platforms. Must have hands-on experience with large-scale data warehouses (managing multiple TBs).

Key Responsibilities
• Design, build, and manage scalable batch and real-time streaming pipelines for ingesting telemetry, log, and event data.
• Develop, implement, and maintain robust data orchestration workflows using tools like Apache Airflow or similar platforms.
• Onboard new data sources by building efficient connectors (API, Kafka, file-based) and normalizing diverse, security-related datasets.
• Proactively monitor and manage schema evolution and drift across source systems and data formats.
• Implement comprehensive pipeline observability, including logging, performance metrics, and alerting.
• Continuously optimize data ingestion for performance, reliability, and cost-effectiveness.
• Collaborate with cross-functional teams, including detection, threat intelligence, and platform engineering, to align data ingestion with security objectives.

Required Qualifications
• 5+ years of professional experience in data engineering or infrastructure roles with a focus on pipeline development.
• Strong proficiency in Python and extensive experience with distributed data processing frameworks such as Apache Spark/PySpark.
• Hands-on experience with orchestration and workflow management tools such as Apache Airflow, Dagster, or Prefect.
• Deep understanding of data ingestion patterns, schema management, and strategies for handling schema drift.
• Practical experience with messaging/streaming platforms (e.g., Kafka) and cloud-native storage services (e.g., S3).
• Proven experience developing solutions in a major cloud environment (AWS preferred; Azure or GCP also acceptable).

Skills: data, pipelines, cloud, airflow
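
To make the orchestration and alerting responsibilities above concrete, here is a minimal sketch of a daily batch ingestion DAG, assuming Apache Airflow 2.4+. The DAG name, schedule, helper functions, and failure callback are hypothetical illustrations, not details from this posting.

```python
from datetime import datetime, timedelta

from airflow import DAG
from airflow.operators.python import PythonOperator


def alert_on_failure(context):
    # Hypothetical alert hook: in practice this might notify Slack or a pager.
    ti = context["task_instance"]
    print(f"ALERT: task {ti.task_id} in DAG {ti.dag_id} failed for {context['ds']}")


def extract_telemetry(**context):
    # Hypothetical connector: pull one day of telemetry from an upstream
    # source and stage it for loading.
    print(f"extracting telemetry for {context['ds']}")


def load_to_warehouse(**context):
    # Hypothetical loader: copy the staged partition into the warehouse.
    print(f"loading partition {context['ds']}")


with DAG(
    dag_id="telemetry_batch_ingest",  # hypothetical DAG name
    start_date=datetime(2025, 1, 1),
    schedule="@daily",
    catchup=False,
    default_args={
        "retries": 2,
        "retry_delay": timedelta(minutes=5),
        "on_failure_callback": alert_on_failure,  # basic pipeline observability
    },
) as dag:
    extract = PythonOperator(task_id="extract", python_callable=extract_telemetry)
    load = PythonOperator(task_id="load", python_callable=load_to_warehouse)

    extract >> load
```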
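
For the streaming side, a rough sketch of Kafka-to-object-storage ingestion with PySpark Structured Streaming, assuming Spark 3.x with the spark-sql-kafka connector on the classpath; the broker address, topic, schema, and S3 paths are hypothetical. Keeping the raw JSON alongside the parsed columns is one way to avoid silently dropping fields when upstream schemas drift.

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F
from pyspark.sql.types import StringType, StructField, StructType, TimestampType

spark = SparkSession.builder.appName("telemetry-stream-ingest").getOrCreate()

# Explicit "known" schema; anything outside it remains in the raw payload so a
# downstream job can detect drift instead of losing data.
known_schema = StructType([
    StructField("event_id", StringType()),
    StructField("host", StringType()),
    StructField("event_time", TimestampType()),
])

raw = (
    spark.readStream
    .format("kafka")
    .option("kafka.bootstrap.servers", "broker:9092")  # hypothetical brokers
    .option("subscribe", "telemetry-events")           # hypothetical topic
    .load()
)

parsed = (
    raw.select(F.col("value").cast("string").alias("raw_json"))
    .withColumn("parsed", F.from_json("raw_json", known_schema))
    .select("raw_json", "parsed.*")
)

query = (
    parsed.writeStream
    .format("parquet")
    .option("path", "s3a://bucket/telemetry/")           # hypothetical sink
    .option("checkpointLocation", "s3a://bucket/_chk/")  # hypothetical checkpoint
    .trigger(processingTime="1 minute")
    .start()
)

query.awaitTermination()
```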
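
Finally, a small sketch of one simple way to surface schema drift on a batch source before it reaches the warehouse, again in PySpark; the input path and the expected column set are hypothetical.

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("schema-drift-check").getOrCreate()

# Columns the downstream warehouse model currently expects (hypothetical).
expected_columns = {"event_id", "host", "event_time"}

# Infer the schema of the latest drop and compare it with expectations.
batch = spark.read.json("s3a://bucket/telemetry/raw/2025-09-03/")  # hypothetical path
observed_columns = set(batch.columns)

added = observed_columns - expected_columns
removed = expected_columns - observed_columns

if added or removed:
    # In a real pipeline this would raise an alert or quarantine the batch
    # rather than just printing.
    print(f"schema drift detected: added={sorted(added)}, removed={sorted(removed)}")
else:
    print("schema matches expectations")
```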