

OSI Engineering
Senior Data Engineer
⭐ - Featured Role | Apply directly with Data Freelance Hub
This role is for a Senior Data Engineer on a 12-month contract based in Sunnyvale or San Diego, CA, with a pay rate of $75.00 - $90.00 per hour. It requires 8+ years delivering production ETL platforms, plus expertise in Apache Spark, Airflow, and data lakehouse technologies.
🌎 - Country
United States
💱 - Currency
$ USD
-
💰 - Day rate
720
-
🗓️ - Date
November 13, 2025
🕒 - Duration
More than 6 months
-
🏝️ - Location
On-site
-
📄 - Contract
W2 Contractor
-
🔒 - Security
Unknown
-
📍 - Location detailed
Sunnyvale, CA
-
🧠 - Skills detailed
#Anomaly Detection #Automated Testing #Deployment #Cloud #Apache Spark #Libraries #Apache Iceberg #Metadata #Compliance #ML (Machine Learning) #BigQuery #Dataflow #GIT #Data Engineering #Computer Science #Data Lineage #Scala #Delta Lake #Data Layers #Kubernetes #Observability #Storage #Apache Airflow #Documentation #Java #Data Lake #Data Lakehouse #Airflow #Data Management #GCP (Google Cloud Platform) #Prometheus #AWS (Amazon Web Services) #Azure #Batch #ETL (Extract, Transform, Load) #Python #Datasets #Security #Spark (Apache Spark) #Monitoring #Data Catalog #Grafana #Kafka (Apache Kafka) #Databricks
Role description
A leading global consumer device company based in Cupertino, CA is looking for a Senior Data Engineer to join its team and help build the next generation of cellular analytics. You will work on production-grade ETL platforms that ingest, transform, and curate massive wireless telemetry datasets for near-real-time and batch use cases.
Role and Responsibilities:
• Design, implement, and operate resilient batch and streaming ETL jobs in Spark that process terabytes of cellular network data daily, with clear KPIs for latency and availability (a minimal Spark sketch follows this list)
• Build Airflow DAGs with strong observability, retries, SLAs, and automated remediation to keep production data flowing
• Develop reusable libraries, testing harnesses, and CI/CD workflows that enable rapid, safe deployments and empower partner teams to self-serve
• Partner with ML engineers to publish feature-ready datasets and model monitoring telemetry that align with medallion best practices
• Implement automated validation, anomaly detection, and reconciliation frameworks that ensure trustworthy data at scale
• Instrument data lineage, metadata cataloging, and documentation workflows to support discovery and compliance requirements
• Collaborate with platform and product teams, system engineers, researchers, and security teams
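For context on the streaming responsibility above, the following is a minimal, hypothetical PySpark Structured Streaming sketch of this kind of job: read raw telemetry from Kafka, parse and de-duplicate it, and append curated records to a Delta table. The broker address, topic name (cellular_telemetry), schema fields, and storage paths are illustrative assumptions, not details from this posting.

# Hypothetical sketch of a streaming telemetry ETL job; names and paths are placeholders.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F
from pyspark.sql.types import StructType, StructField, StringType, DoubleType, TimestampType

spark = SparkSession.builder.appName("cellular-telemetry-etl").getOrCreate()

# Assumed payload schema for the raw telemetry messages.
telemetry_schema = StructType([
    StructField("cell_id", StringType()),
    StructField("rsrp_dbm", DoubleType()),
    StructField("event_time", TimestampType()),
])

raw = (
    spark.readStream
    .format("kafka")
    .option("kafka.bootstrap.servers", "broker:9092")   # placeholder broker
    .option("subscribe", "cellular_telemetry")          # placeholder topic
    .load()
)

# Parse the Kafka value, drop malformed rows, and de-duplicate within a watermark
# so late or replayed events do not inflate downstream metrics.
curated = (
    raw.select(F.from_json(F.col("value").cast("string"), telemetry_schema).alias("t"))
    .select("t.*")
    .where(F.col("cell_id").isNotNull())
    .withWatermark("event_time", "10 minutes")
    .dropDuplicates(["cell_id", "event_time"])
)

# Append to a Delta "silver" table; checkpointing gives idempotent restarts.
query = (
    curated.writeStream
    .format("delta")
    .option("checkpointLocation", "/checkpoints/cellular_telemetry_silver")
    .outputMode("append")
    .trigger(processingTime="1 minute")   # cadence tied to the latency KPI
    .start("/lake/silver/cellular_telemetry")
)
query.awaitTermination()

Checkpointed, append-only writes to the Delta sink make restarts idempotent, which is what keeps latency and availability KPIs measurable and enforceable in production.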
Required Skills and Experience:
• 8+ years delivering production ETL platforms and lakehouse datasets for large-scale systems, including ownership of business-critical workloads
• Proven experience architecting, operating, and continuously scaling petabyte-class ETL/ELT platforms that power mission-critical analytics and ML workloads across bronze/silver/gold data layers
• Ability to craft multi-year data platform roadmaps, drive architectural decisions, and align stakeholders around standards for quality, performance, and cost efficiency
• Deep hands-on proficiency with Apache Spark (batch and structured streaming) on-prem or cloud stacks, including performance tuning, job observability, and production incident response
• Production experience orchestrating complex pipelines with Apache Airflow (or equivalent), including DAG design, robust dependency modeling, SLA management, and operational excellence (see the DAG sketch after this list)
• Expertise with data lakehouse technologies (Apache Iceberg, Delta Lake, Hudi) and columnar storage formats (Parquet, ORC) for scalable, reliable data management
• Practical knowledge of event streaming patterns and tooling such as Kafka, Kinesis, or Pulsar for ingesting high-volume network telemetry
• Strong foundation in Python, Scala, or Java; disciplined CI/CD, automated testing, infrastructure-as-code, and Git-based workflows
• Ability to design pragmatic schemas and semantic layers that serve ETL throughput, downstream analytics, and ML feature engineering
• Experience delivering pipelines on AWS, GCP, or Azure using services like EMR, Databricks, Glue, Dataflow, BigQuery, or equivalent
• Familiarity with Kubernetes, containerized deployments, and observability stacks (Prometheus, Grafana, ELK, OpenTelemetry) for proactive monitoring, rapid recovery, and continuous improvement
• Experience working with large-scale telemetry data is a plus
• Bachelor's degree or higher in Computer Science, Data Engineering, Electrical Engineering, or related technical field (or equivalent practical experience)
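To make the orchestration expectations above concrete, here is a small, hypothetical Airflow DAG sketch showing task retries, per-task SLAs, and an SLA-miss callback, assuming Airflow 2.x where these features are available. The DAG id, schedule, task bodies, and callback behavior are illustrative assumptions only.

# Hypothetical Airflow 2.x DAG illustrating retries, SLAs, and SLA-miss alerting.
from datetime import datetime, timedelta
from airflow import DAG
from airflow.operators.python import PythonOperator


def notify_on_sla_miss(dag, task_list, blocking_task_list, slas, blocking_tis):
    # Placeholder hook: alert the on-call channel when a task SLA is missed.
    print(f"SLA missed for tasks: {task_list}")


def extract_telemetry(**context):
    print("pull raw telemetry partition for", context["ds"])


def validate_partition(**context):
    print("run row-count and anomaly checks for", context["ds"])


default_args = {
    "owner": "data-eng",
    "retries": 3,                           # automatic remediation for transient failures
    "retry_delay": timedelta(minutes=5),
    "sla": timedelta(hours=2),              # per-task freshness target
}

with DAG(
    dag_id="cellular_telemetry_daily",
    start_date=datetime(2025, 1, 1),
    schedule="@daily",
    catchup=False,
    default_args=default_args,
    sla_miss_callback=notify_on_sla_miss,
) as dag:
    extract = PythonOperator(task_id="extract_telemetry", python_callable=extract_telemetry)
    validate = PythonOperator(task_id="validate_partition", python_callable=validate_partition)
    extract >> validate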
Type: Contract
Duration: 12 months with extension
Work Location: Sunnyvale, CA or San Diego, CA (100% on site)
Pay rate: $75.00 - $90.00 per hour (DOE)
No third-party agencies or C2C