

OSI Engineering
On-Device AI Runtime Engineer
⭐ - Featured Role | Apply direct with Data Freelance Hub
This role is for an On-Device AI Runtime Engineer, a 3-month contract (possible extension) in San Diego or Sunnyvale, CA, with a pay rate of $78.00 - $93.00 per hour. Key skills include Swift/Objective-C, Metal Performance Shaders, and ML model optimization experience.
🌎 - Country
United States
💱 - Currency
$ USD
💰 - Day rate
744
🗓️ - Date
October 11, 2025
🕒 - Duration
3 to 6 months
🏝️ - Location
On-site
📄 - Contract
Unknown
🔒 - Security
Unknown
📍 - Location detailed
Cupertino, CA
🧠 - Skills detailed
#Consulting #TensorFlow #PyTorch #AI (Artificial Intelligence) #Model Optimization #Data Science #Data Analysis #ML (Machine Learning)
Role description
A globally leading technology company is looking for an On-Device AI Runtime Engineer to join its cutting-edge AI team. In this position, you will be a key contributor in building high-performance machine learning inference systems, developing optimized runtime drivers for AI inference on edge devices, and creating scalable model lifecycle management solutions across a wide range of hardware platforms.
This role will focus on optimizing and deploying machine learning models on edge and mobile devices. Please note that this is not a Data Science or Data Analyst role.
Job Responsibilities:
• Design and implement robust Core ML model optimization pipelines for deploying large-scale ML models on resource-constrained devices.
• Support product engineering teams by consulting on AI model performance, iterating on inference solutions to solve real-world mobile/edge AI problems, and developing/delivering custom on-device AI frameworks.
• Interface with hardware and platform teams to ensure optimal utilization of neural processing units (NPUs), GPUs, and specialized AI accelerators across the device ecosystem.
Minimum Qualifications:
• Strong proficiency in Swift/Objective-C and Metal Performance Shaders.
• Familiarity with various ML model formats such as Core ML, ONNX, TensorFlow Lite, and PyTorch Mobile.
• Strong critical thinking, performance optimization, and low-level system design skills.
• Experience with model quantization, pruning, and hardware-aware neural architecture optimization.
• Experience with real-time inference pipelines and latency-critical AI applications.
• Understanding of mobile device thermal management, power consumption patterns, and compute resource allocation for AI workloads.
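As context for the quantization qualification above, the sketch below shows the core idea behind post-training weight quantization: mapping float weights to int8 plus a scale factor so models fit in the memory and compute budgets of resource-constrained devices. This is an illustrative, framework-free example, not the API of Core ML, TensorFlow Lite, or any specific toolchain; the function names are hypothetical.

```python
# Illustrative sketch of symmetric per-tensor int8 quantization, the kind
# of transform an on-device model optimization pipeline applies to weights.
# Function names are hypothetical; no specific framework's API is implied.

def quantize_int8(weights):
    """Map float weights to int8 values plus one per-tensor scale factor."""
    max_abs = max(abs(w) for w in weights) or 1.0
    scale = max_abs / 127.0  # symmetric int8 range is [-127, 127]
    q = [max(-127, min(127, round(w / scale))) for w in weights]
    return q, scale

def dequantize(q, scale):
    """Recover approximate float weights for inference."""
    return [x * scale for x in q]

w = [0.31, -1.27, 0.02, 0.88]
q, s = quantize_int8(w)
w_hat = dequantize(q, s)
# Per-weight reconstruction error is bounded by scale / 2.
```

Real pipelines extend this idea with per-channel scales, calibration data for activation ranges, and hardware-aware choices (e.g., which layers the NPU can run quantized), but the storage win is the same: one byte per weight instead of four.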
Type: Contract
Duration: 3 months (with a possibility for extension)
Work Location: San Diego or Sunnyvale, CA (On-site)
Pay rate: $78.00 - $93.00 (DOE)