

O2 Technologies, Inc
Perception Data Pipeline Engineer
This role is for a Perception Data Pipeline Engineer. Contract length and pay rate are not specified. It requires onsite work in Foster City, CA, at least 3 days a week. Key skills include Python, C++, and PySpark, along with experience in autonomous vehicles. A Bachelor's degree in a related field is required.
🌎 - Country
United States
💱 - Currency
$ USD
💰 - Day rate
Unknown
🗓️ - Date
May 28, 2026
🕒 - Duration
Unknown
🏝️ - Location
On-site
📄 - Contract
Unknown
🔒 - Security
Unknown
📍 - Location detailed
Redwood City, CA
🧠 - Skills detailed
#Lambda (AWS Lambda) #ML Ops (Machine Learning Operations) #Data Engineering #Data Pipeline #PySpark #Python #Monitoring #Computer Science #REST (Representational State Transfer) #AWS (Amazon Web Services) #Observability #S3 (Amazon Simple Storage Service) #Documentation #API (Application Programming Interface) #Docker #REST API #Databricks #C++ #Spark (Apache Spark) #Classification #AI (Artificial Intelligence) #ML (Machine Learning) #AWS S3 (Amazon Simple Storage Service)
Role description
About The Role
Software Engineer, Perception Attributes Autolabeling Pipeline
Onsite in Foster City, CA | at least 3 days in office
The Perception Attribute Flywheel team is looking for a Software Engineer to build and operate the
autolabeling pipeline that accelerates human annotation throughput on vehicle attribute classification
tasks.
Zoox is building a future for Riders, not drivers. The accuracy of our perception attribute models —
recognizing emergency vehicles, school buses, brake lights, hazard signals, and more — depends on a
steady flow of high-quality labeled examples drawn from our fleet's drive data. Today, every label is
produced by a human annotator from scratch. We are building a pipeline that uses off-the-shelf foundation
models (Gemini, SigLIP, CLIP) to pre-label tasks, so human reviewers verify and correct rather than
labeling from scratch.
This role owns the pipeline engineering for that system: ingesting queued tasks from our annotator
service, calling foundation-model APIs at fleet scale, writing structured predictions back into the labeling
workflow, and operating the whole thing reliably. The team lead and supporting ML engineers own model
selection, prompt design, and evaluation methodology; this role partners closely with them but is not
expected to own those decisions.
If you take pride in building reliable, observable, well-tested data pipelines and want to ship a system that
visibly accelerates an autonomous vehicle program, you will excel in this role.
Responsibilities
• Build the autolabeling pipeline: ingest queued tasks from the annotator service, dispatch them to
foundation-model APIs (Gemini and others), parse structured outputs, and write pre-labels back to the
labeling workflow
• Build the observability layer: per-task latency, per-model cost, per-attribute coverage, error-mode
dashboards
• Run experiments designed by the team lead — set up the inputs, execute, collect outputs in formats the
ML engineers can analyze
• Integrate the pipeline cleanly with existing Zoox systems, partnering with the data infrastructure team
• Document the system, write runbooks, and ensure a clean handoff at end of engagement
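
As a rough illustration of the first two responsibilities above, the ingest, dispatch, parse, and write-back flow might be sketched as follows. This is a minimal sketch, not the team's actual design: every name in it (Task, PreLabel, run_pipeline, fake_model, the "label" JSON field, the example URIs) is hypothetical, and the model call is a local stand-in rather than a real foundation-model API.

```python
import json
import time
from dataclasses import dataclass
from typing import Callable, Iterable, List, Optional


@dataclass
class Task:
    """A queued annotation task (hypothetical shape)."""
    task_id: str
    attribute: str  # e.g. "brake_lights"
    image_uri: str


@dataclass
class PreLabel:
    """A pre-label to hand back to the labeling workflow (hypothetical shape)."""
    task_id: str
    attribute: str
    label: Optional[str]  # None when the model output could not be parsed
    latency_s: float      # per-task latency, for the observability layer
    error: Optional[str] = None


def parse_output(raw: str) -> str:
    """Parse the model's JSON output and return its "label" field."""
    label = json.loads(raw)["label"]
    if not isinstance(label, str):
        raise ValueError("label must be a string")
    return label


def run_pipeline(tasks: Iterable[Task],
                 call_model: Callable[[Task], str]) -> List[PreLabel]:
    """Dispatch each task to the model, parse the reply, and time the round trip."""
    results = []
    for task in tasks:
        start = time.perf_counter()
        try:
            label, error = parse_output(call_model(task)), None
        except (ValueError, KeyError, TypeError) as exc:
            # Keep the failure mode so it can feed an error-mode dashboard.
            label, error = None, type(exc).__name__
        latency = time.perf_counter() - start
        results.append(PreLabel(task.task_id, task.attribute, label, latency, error))
    return results


def fake_model(task: Task) -> str:
    """Stand-in for a foundation-model API call (assumption, no network access)."""
    if task.task_id == "t2":
        return "not json"  # simulate a malformed response
    return json.dumps({"label": "on"})


if __name__ == "__main__":
    queue = [
        Task("t1", "brake_lights", "s3://example-bucket/1.jpg"),
        Task("t2", "brake_lights", "s3://example-bucket/2.jpg"),
    ]
    for pre_label in run_pipeline(queue, fake_model):
        print(pre_label)
```

Recording latency and the error type on every result, including failures, is one way to make the per-task latency and error-mode views fall out of the same records the pipeline already writes.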
Qualifications
• 3+ years of backend / data pipeline engineering experience
• Strong Python; comfort with C++
• Large-dataset experience with PySpark or equivalent
• ML fundamentals — understanding of model inference, embeddings, structured output, and common
eval metrics (precision, recall, calibration); able to reason about ML data shapes and integration
patterns
• Experience integrating foundation models (Gemini, OpenAI, Anthropic) at production scale
• Excellent written communication for design docs and runbooks
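
For readers less familiar with the eval metrics named in the ML fundamentals item above, here is a small sketch of how precision, recall, and a simple calibration measure could be computed when pre-labels are compared against human-verified labels. The function names, the binning scheme, and the sample data are illustrative assumptions, not the team's evaluation methodology.

```python
from typing import List, Tuple


def precision_recall(predicted: List[bool], actual: List[bool]) -> Tuple[float, float]:
    """Precision and recall for one binary attribute (e.g. "is emergency vehicle")."""
    pairs = list(zip(predicted, actual))
    tp = sum(1 for p, a in pairs if p and a)
    fp = sum(1 for p, a in pairs if p and not a)
    fn = sum(1 for p, a in pairs if a and not p)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


def expected_calibration_error(confidences: List[float],
                               correct: List[bool],
                               n_bins: int = 5) -> float:
    """Weighted gap between mean confidence and accuracy over equal-width bins."""
    total = len(confidences)
    if total == 0:
        return 0.0
    ece = 0.0
    for b in range(n_bins):
        lo, hi = b / n_bins, (b + 1) / n_bins
        # Bins are (lo, hi]; a confidence of exactly 0.0 goes in the first bin.
        idx = [i for i, c in enumerate(confidences)
               if lo < c <= hi or (b == 0 and c == 0.0)]
        if not idx:
            continue
        avg_conf = sum(confidences[i] for i in idx) / len(idx)
        accuracy = sum(1 for i in idx if correct[i]) / len(idx)
        ece += len(idx) / total * abs(avg_conf - accuracy)
    return ece


if __name__ == "__main__":
    predicted = [True, True, False, True]
    actual = [True, False, True, True]
    print(precision_recall(predicted, actual))
    print(expected_calibration_error([0.9, 0.9, 0.9, 0.9], [True, True, True, False]))
```

In the sample data, two of three positive predictions are right and two of three true positives are found, so precision and recall are both 2/3. The model claims 90% confidence but is right 75% of the time, giving a calibration gap of about 0.15.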
Bonus Qualities — Experience With Any Of The Following
• Databricks
• End-to-end ML pipeline stewardship — owned an ML system in production from data ingest through
inference through monitoring
• Annotation tooling or human-in-the-loop ML workflows
• Autonomous-systems data pipelines
• AWS, especially S3, ECS/EKS, Lambda
• Working in a codebase shared with ML engineers (proto schemas, joint deploys)
Key Responsibilities & Skills
• Autolabeling Pipeline Development
• Vehicle Attribute Classification Data Flow
• Human-in-the-Loop Annotation Acceleration
• ML Model Inference Integration
• Observability & Monitoring of Data Pipelines
• Experimentation Support for ML Teams
• Documentation & Runbook Creation
• Cross-Team Integration with Data Infrastructure
Technical Skills
• Python
• C++
• PySpark / Spark
• AWS (S3 / ECS / EKS / Lambda)
• Databricks
• Foundation Model APIs (Gemini / OpenAI / Anthropic)
• REST API Integration
• Docker / Containerization
• Observability Dashboards
Education
Bachelor's Degree in Computer Science, Software Engineering, Electrical Engineering, or Computer Engineering. Preferred: Master's in Computer Science, Artificial Intelligence, or Machine Learning, or a PhD in Computer Science.
Industry Experience
• Autonomous Vehicles
• Autonomous Driving
• Automotive
• Computer Vision
• Machine Learning Operations (MLOps)
• Data Engineering for AV