

Crossing Hurdles
Machine Learning Engineer (CUDA) | $250/hr Remote | Mercor
⭐ - Featured Role | Apply direct with Data Freelance Hub
This role is for a Machine Learning Engineer (CUDA) on an hourly contract with a commitment of 10-40 hours per week, paying $120–$250/hour. Key skills include deep expertise in CUDA, GPU architecture, and performance optimization; familiarity with PyTorch is preferred.
🌎 - Country
United States
💱 - Currency
$ USD
-
💰 - Hourly rate
120–250
-
🗓️ - Date
December 5, 2025
🕒 - Duration
Unknown
-
🏝️ - Location
Remote
-
📄 - Contract
Unknown
-
🔒 - Security
Unknown
-
📍 - Location detailed
United States
-
🧠 - Skills detailed
#Documentation #AI (Artificial Intelligence) #PyTorch #ML (Machine Learning) #TensorFlow
Role description
At Crossing Hurdles, we work as a referral partner. We refer candidates to Mercor, which collaborates with the world’s leading AI research labs to build and train cutting-edge AI models.
Organization: Mercor
Position: CUDA Kernel Optimizer – ML Engineer
Type: Hourly Contract
Compensation: $120–$250/hour
Location: Remote
Commitment: 10-40 hours/week, flexible and asynchronous
Role Responsibilities: (Training support will be provided)
• Develop, optimize, and benchmark CUDA kernels for tensor and operator workloads
• Tune for occupancy, memory coalescing, instruction-level parallelism, and optimal warp scheduling
• Profile and diagnose performance bottlenecks with tools such as Nsight Systems and Nsight Compute
• Report performance results, analyze speedups, and propose architectural improvements
• Integrate kernels with PyTorch and collaborate asynchronously with operator specialists
• Produce reproducible benchmarks and write comprehensive performance documentation
Required Qualifications:
• Deep expertise in CUDA, GPU architecture, and memory optimization
• Proven record of quantifiable performance improvements across hardware generations
• Proficiency with mixed precision, Tensor Core usage, and low-level numerical stability
• Familiarity with PyTorch, TensorFlow, or Triton (preferred but not required)
• Strong communication and independent problem-solving skills
• Demonstrated contributions in open-source, research, or performance benchmarking
Application process: (Takes 20 min)
• Upload resume
• AI interview based on your resume (15 min)
• Submit form






