O2 Technologies, Inc.

Perception Data Pipeline Engineer

⭐ - Featured Role | Apply directly with Data Freelance Hub
This role is for a Perception Data Pipeline Engineer; the contract length and pay rate are not specified. It requires on-site work in Foster City, CA, at least 3 days a week. Key skills include Python, C++, PySpark, and experience with autonomous vehicles. A Bachelor's degree in a related field is required.
🌎 - Country
United States
💱 - Currency
$ USD
💰 - Day rate
Unknown
🗓️ - Date
May 28, 2026
🕒 - Duration
Unknown
🏝️ - Location
On-site
📄 - Contract
Unknown
🔒 - Security
Unknown
📍 - Location detailed
Redwood City, CA
🧠 - Skills detailed
#Lambda (AWS Lambda) #ML Ops (Machine Learning Operations) #Data Engineering #Data Pipeline #PySpark #Python #Monitoring #Computer Science #REST (Representational State Transfer) #AWS (Amazon Web Services) #Observability #S3 (Amazon Simple Storage Service) #Documentation #API (Application Programming Interface) #Docker #REST API #Databricks #C++ #Spark (Apache Spark) #Classification #AI (Artificial Intelligence) #ML (Machine Learning) #AWS S3 (Amazon Simple Storage Service)
Role description
About the Role

Software Engineer, Perception Attributes Autolabeling Pipeline
On-site in Foster City, CA | at least 3 days in office

The Perception Attribute Flywheel team is looking for a Software Engineer to build and operate the autolabeling pipeline that accelerates human annotation throughput on vehicle attribute classification tasks.

Zoox is building a future for Riders, not drivers. The accuracy of our perception attribute models (recognizing emergency vehicles, school buses, brake lights, hazard signals, and more) depends on a steady flow of high-quality labeled examples drawn from our fleet's drive data. Today, every label is produced by a human annotator from scratch. We are building a pipeline that uses off-the-shelf foundation models (Gemini, SigLIP, CLIP) to pre-label tasks, so human reviewers verify and correct labels rather than create them from scratch.

This role owns the pipeline engineering for that system: ingesting queued tasks from our annotator service, calling foundation-model APIs at fleet scale, writing structured predictions back into the labeling workflow, and operating the whole system reliably. The team lead and supporting ML engineers own model selection, prompt design, and evaluation methodology; this role partners closely with them but is not expected to own those decisions.

If you take pride in building reliable, observable, well-tested data pipelines and want to ship a system that visibly accelerates an autonomous vehicle program, you will excel in this role.
Responsibilities
• Build the autolabeling pipeline: ingest queued tasks from the annotator service, dispatch them to foundation-model APIs (Gemini and others), parse structured outputs, and write pre-labels back to the labeling workflow
• Build the observability layer: dashboards for per-task latency, per-model cost, per-attribute coverage, and error modes
• Run experiments designed by the team lead: set up the inputs, execute the runs, and collect outputs in formats the ML engineers can analyze
• Integrate the pipeline cleanly with existing Zoox systems, partnering with the data infrastructure team
• Document the system, write runbooks, and ensure a clean handoff at the end of the engagement

Qualifications
• 3+ years of backend / data pipeline engineering experience
• Strong Python; comfort with C++
• Large-dataset experience with PySpark or equivalent
• ML fundamentals: understanding of model inference, embeddings, structured output, and common evaluation metrics (precision, recall, calibration); able to reason about ML data shapes and integration patterns
• Experience integrating foundation models (Gemini, OpenAI, Anthropic) at production scale
• Excellent written communication for design docs and runbooks

Bonus Qualities: Experience with Any of the Following
• Databricks
• End-to-end ML pipeline stewardship: having owned an ML system in production from data ingest through inference to monitoring
• Annotation tooling or human-in-the-loop ML workflows
• Autonomous-systems data pipelines
• AWS, especially S3, ECS/EKS, and Lambda
• Working in a codebase shared with ML engineers (proto schemas, joint deploys)

Key Responsibilities & Skills
• Autolabeling Pipeline Development
• Vehicle Attribute Classification Data Flow
• Human-in-the-Loop Annotation Acceleration
• ML Model Inference Integration
• Observability & Monitoring of Data Pipelines
• Experimentation Support for ML Teams
• Documentation & Runbook Creation
• Cross-Team Integration with Data Infrastructure

Technical Skills
• Python
• C++
• PySpark / Spark
• AWS (S3 / ECS / EKS / Lambda)
• Databricks
• Foundation Model APIs (Gemini / OpenAI / Anthropic)
• REST API Integration
• Docker / Containerization
• Observability Dashboards

Education
Bachelor's degree in Computer Science, Software Engineering, Electrical Engineering, or Computer Engineering. Preferred: Master's in Computer Science, Master's in Artificial Intelligence, Master's in Machine Learning, or PhD in Computer Science.

Industry Experience
• Autonomous Vehicles
• Autonomous Driving
• Automotive
• Computer Vision
• Machine Learning Operations (MLOps)
• Data Engineering for AV

#CareerOpportunities #JobVacancy #WorkWithUs