Remote CUDA Kernel Optimizer - ML Engineer - AI Trainer ($120-$250 per hour)
### 1) Role Overview

Mercor is engaging advanced CUDA experts who specialize in GPU kernel optimization, performance profiling, and numerical efficiency. These professionals possess a deep mental model of how modern GPU architectures execute deep learning workloads. They are comfortable translating algorithmic concepts into finely tuned kernels that maximize throughput while maintaining correctness and reproducibility.

### 2) Key Responsibilities

- Develop, tune, and benchmark CUDA kernels for tensor and operator workloads.
- Optimize for occupancy, memory coalescing, instruction-level parallelism, and warp scheduling.
- Profile and diagnose performance bottlenecks using Nsight Systems, Nsight Compute, and comparable tools.
- Report performance metrics, analyze speedups, and propose architectural improvements.
- Collaborate asynchronously with PyTorch Operator Specialists to integrate kernels into production frameworks.
- Produce well-documented, reproducible benchmarks and performance write-ups.

### 3) Ideal Qualifications

- Deep expertise in CUDA programming, GPU architecture, and memory optimization.
- Proven ability to achieve quantifiable performance improvements across hardware generations.
- Proficiency with mixed precision, Tensor Core usage, and low-level numerical stability considerations.
- Familiarity with frameworks like PyTorch, TensorFlow, or Triton (not required but beneficial).
- Strong communication skills and independent problem-solving ability.
- Demonstrated open-source, research, or performance benchmarking contributions.

### 4) More About the Opportunity

- Ideal for independent contractors who thrive in performance-critical, systems-level work.
- Engagements focus on measurable, high-impact kernel optimizations and scalability studies.
- Work is fully remote and asynchronous; deliverables are outcome-driven.
- Access to shared benchmarking infrastructure and reproducibility tooling via Mercor support resources.

### 5) Compensation & Contract Terms

- Typical range: $120–$250 / hour, depending on scope, specialization, and results achieved. Payment is based on accepted task output rather than a flat hourly rate.
- Structured as a contract-based engagement, not an employment relationship.
- Compensation tied to measurable deliverables or agreed milestones.
- Confidentiality, IP, and NDA terms as defined per engagement.

### 6) Application Process

- Submit a brief overview of prior CUDA optimization experience, profiling results, or performance reports.
- Include links to relevant GitHub repos, papers, or benchmarks if available.
- Indicate your hourly rate, time availability, and preferred engagement length.
- Selected experts may complete a small, paid pilot kernel optimization project.

### 7) About Mercor

- Mercor connects domain experts with top AI research and technology organizations through project-based contracts.
- Contractors operate independently, with full flexibility over methods, timelines, and tools.
- Our mission is to help top engineers and researchers access frontier technical work without rigid employment structures.