

Data Pipeline Engineer (Local Only)
⭐ - Featured Role | Apply direct with Data Freelance Hub
This role is for a Data Pipeline Engineer in San Jose, CA, on a 12+ month contract. It requires 5+ years in data engineering; expertise in Python, Airflow, Kafka, and large-scale data warehouses; and cloud experience. An in-person interview is required.
🌎 - Country
United States
💱 - Currency
$ USD
💰 - Day rate
-
🗓️ - Date discovered
August 27, 2025
🕒 - Project duration
More than 6 months
🏝️ - Location type
On-site
📄 - Contract type
Unknown
🔒 - Security clearance
Unknown
📍 - Location detailed
San Jose, CA
🧠 - Skills detailed
#Cloud #Logging #Data Normalization #Datasets #Spark (Apache Spark) #Security #Databricks #Normalization #Batch #ML (Machine Learning) #Storage #Airflow #Compliance #Kafka (Apache Kafka) #AWS (Amazon Web Services) #S3 (Amazon Simple Storage Service) #Data Engineering #Apache Airflow #PySpark #Azure #Python #Scala #ETL (Extract, Transform, Load) #Argo #Data Processing #Data Pipeline #Observability #Data Warehouse #Data Ingestion #GCP (Google Cloud Platform) #API (Application Programming Interface) #Apache Spark
Role description
Role: Data Pipeline Engineer (Local Only)
Location: San Jose, CA (onsite 2x per week); in-person interview required
Duration: 12+ month contract
Key Skills: experience building multiple data pipelines; Airflow; Kafka; Python (PySpark); cloud experience. Must have experience working with large-scale data warehouses (multiple TBs).
Here are some screening questions the manager provided to help us find the best candidates:
• Do you have hands-on experience working with Databricks in an AWS environment?
• Have you created data pipelines from multiple data sources?
• What tools are you using for orchestration workflows (e.g., Apache Airflow)? Have you built or maintained DAGs in production (a minimal DAG sketch follows these questions)? Have you worked with any alternatives to Airflow, like Argo or Prefect?
• Have you built data connectors? If so, are you using APIs or Kafka?
• How many years of experience do you have working with Spark or PySpark?
• Can you describe your experience with data ingestion processes? Do you have experience with data normalization or transformation? Have you worked with semi-structured or unstructured data?
• Can you describe a time you had to troubleshoot or optimize a large-scale data pipeline in a production environment?
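For context on the orchestration question above, here is a minimal sketch of a scheduled Airflow DAG. It is hypothetical and not taken from this posting: the DAG id, schedule, and task bodies are placeholder assumptions, and it assumes Apache Airflow 2.4+ with the TaskFlow API.

    # Hypothetical Airflow 2.4+ DAG sketch; names and schedule are illustrative only.
    from datetime import datetime, timedelta

    from airflow.decorators import dag, task

    @dag(
        dag_id="example_telemetry_ingest",  # placeholder DAG id
        schedule="@hourly",
        start_date=datetime(2025, 1, 1),
        catchup=False,
        default_args={"retries": 2, "retry_delay": timedelta(minutes=5)},
    )
    def telemetry_ingest():
        @task
        def extract() -> list:
            # Placeholder: pull a batch from an API, a Kafka offset range, or an S3 prefix.
            return [{"event": "login", "ts": "2025-01-01T00:00:00Z"}]

        @task
        def load(records: list) -> int:
            # Placeholder: write normalized records to a warehouse table.
            return len(records)

        load(extract())

    telemetry_ingest()

Production DAGs of this kind typically layer on retries, SLAs, and failure alerting, which lines up with the observability responsibility later in the posting.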
Description:
We’re looking for a Data Pipeline Engineer with deep experience building and orchestrating large-scale ingestion pipelines. This role is ideal for someone who enjoys working across high-volume telemetry sources, optimizing data workflows, and solving schema drift challenges in real-world distributed environments.
You’ll be part of the Security Data Platform and ML Engineering team, helping to onboard and normalize security data that powers analytics, detection, and ML workflows across the BU.
Key Responsibilities:
• Design and build scalable batch and streaming data pipelines for ingesting telemetry, log, and event data
• Develop and maintain orchestration workflows using tools like Apache Airflow or similar schedulers
• Onboard new data sources, build connectors (API/Kafka/file-based), and normalize security-related datasets
• Monitor and manage schema drift across changing source systems and formats (see the sketch after this list)
• Implement observability into pipelines — logging, metrics, and alerts for health and performance
• Optimize ingestion for performance, resilience, and cost-efficiency
• Collaborate across detection, threat intel, and platform teams to align ingestion with security use cases
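As a sketch of the schema-drift and normalization responsibilities above, the following PySpark snippet reads semi-structured JSON events, flags drift against an expected column set, and logs simple counts. The input path and expected columns are assumptions for illustration, not details from this role.

    # Hypothetical PySpark sketch: flag schema drift in JSON telemetry and log basic metrics.
    import logging

    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F

    logging.basicConfig(level=logging.INFO)
    log = logging.getLogger("ingest")

    spark = SparkSession.builder.appName("telemetry-ingest-sketch").getOrCreate()

    # Placeholder input location and expected columns (not taken from the posting).
    INPUT_PATH = "s3://example-bucket/telemetry/2025/08/27/"
    EXPECTED_COLUMNS = {"event_type", "source_ip", "timestamp"}

    df = spark.read.json(INPUT_PATH)  # schema inferred from the data

    observed = set(df.columns)
    missing = EXPECTED_COLUMNS - observed
    unexpected = observed - EXPECTED_COLUMNS
    if missing or unexpected:
        # Schema drift: the source added or dropped fields relative to expectations.
        log.warning("schema drift detected: missing=%s unexpected=%s", missing, unexpected)

    # Light normalization: keep expected columns (null-filling absent ones) and parse the timestamp.
    normalized = df.select(
        *[F.col(c) if c in observed else F.lit(None).alias(c) for c in sorted(EXPECTED_COLUMNS)]
    ).withColumn("timestamp", F.to_timestamp("timestamp"))

    log.info("ingested %d rows from %s", normalized.count(), INPUT_PATH)

In a streaming setup the same checks would typically run over Kafka topics with Spark Structured Streaming, but the drift-detection idea is the same.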
Required Qualifications:
• 5+ years of experience in data engineering or infrastructure roles focused on pipeline development
• Strong experience with Python and distributed data processing tools like Apache Spark or PySpark
• Hands-on experience with orchestration frameworks like Apache Airflow, Dagster, or similar
• Deep understanding of ingestion best practices, schema evolution, and drift handling
• Experience working with Kafka, S3, or cloud-native storage and messaging systems (a small consumer sketch follows this list)
• Experience in cloud environments (AWS, Azure, or GCP)
• Bonus: Familiarity with security tools (e.g., CrowdStrike, Wiz), OCSF, or compliance-related data
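To illustrate the Kafka-facing side of the connector and messaging requirements above, here is a small consumer sketch using the kafka-python client. The topic name, broker address, and group id are placeholder assumptions; a real connector would also handle retries, batching, and delivery to storage.

    # Hypothetical Kafka consumer sketch (kafka-python); topic and brokers are placeholders.
    import json

    from kafka import KafkaConsumer

    consumer = KafkaConsumer(
        "security-telemetry",                  # assumed topic name
        bootstrap_servers=["localhost:9092"],  # assumed broker address
        group_id="ingest-sketch",
        auto_offset_reset="earliest",
        value_deserializer=lambda raw: json.loads(raw.decode("utf-8")),
    )

    for message in consumer:
        event = message.value
        # Placeholder: normalize the event and forward it to object storage or the warehouse.
        print(message.topic, message.partition, message.offset, event.get("event_type"))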