

Crossing Hurdles
Machine Learning Engineer (CUDA) | $120–$250/hr Remote
⭐ - Featured Role | Apply direct with Data Freelance Hub
This is a remote contract role for a Machine Learning Engineer (CUDA), with a commitment of 10–40 hours/week at $120–$250/hour. Key skills include CUDA expertise, GPU architecture knowledge, and performance optimization. Familiarity with PyTorch or TensorFlow is preferred.
🌎 - Country
United States
💱 - Currency
$ USD
-
💰 - Day rate
250
-
🗓️ - Date
November 12, 2025
🕒 - Duration
Unknown
-
🏝️ - Location
Remote
-
📄 - Contract
Unknown
-
🔒 - Security
Unknown
-
📍 - Location detailed
United States
-
🧠 - Skills detailed
#TensorFlow #PyTorch #AI (Artificial Intelligence) #ML (Machine Learning) #Documentation
Role description
At Crossing Hurdles, we work as a referral partner. We refer candidates to Mercor, which collaborates with the world’s leading AI research labs to build and train cutting-edge AI models.
Organization: Mercor
Position: CUDA Kernel Optimizer – ML Engineer
Type: Hourly Contract
Compensation: $120–$250/hour
Location: Remote
Commitment: 10–40 hours/week, flexible hours
Role Responsibilities (Training support will be provided)
• Develop, optimize, and benchmark CUDA kernels for tensor and operator workloads
• Tune for occupancy, memory coalescing, instruction-level parallelism, and optimal warp scheduling
• Profile and diagnose performance bottlenecks with tools such as Nsight Systems and Nsight Compute
• Report performance results, analyze speedups, and propose architectural improvements
• Integrate kernels with PyTorch and collaborate asynchronously with operator specialists
• Produce reproducible benchmarks and write comprehensive performance documentation
Required Qualifications
• Deep expertise in CUDA, GPU architecture, and memory optimization
• Proven record of quantifiable performance improvements across hardware generations
• Proficiency with mixed precision, Tensor Core usage, and low-level numerical stability
• Familiarity with PyTorch, TensorFlow, or Triton (preferred but not required)
• Strong communication and independent problem-solving skills
• Demonstrated contributions to open-source projects, research, or performance benchmarking
Application Process:
1. Upload resume
2. AI interview based on your resume (15 min)
3. Submit form






