Senior Deep Learning Engineer – Autonomous Vehicles

Remote Full-time
Job Description:
• Crafting, scaling, and hardening deep learning infrastructure libraries and frameworks for training on multi-thousand GPU clusters.
• Improving efficiency throughout the training stack: data loaders, distributed training, scheduling, and performance monitoring.
• Building robust training pipelines and libraries to handle massive video datasets and enable rapid experimentation.
• Collaborating with researchers, model engineers, and internal platform teams to enhance efficiency, minimize stalls, and improve training availability.
• Owning core infrastructure components such as orchestration libraries, distributed training frameworks, and fault-resilient training systems.
• Partnering with leadership to ensure infrastructure scales with growing GPU capacity and dataset size while maintaining developer efficiency and stability.

Requirements:
• BS, MS, or PhD in Computer Science, Electrical/Computer Engineering, or a related field, or equivalent experience.
• 12+ years of professional experience building and scaling high-performance distributed systems, ideally in ML, HPC, or large-scale data infrastructure.
• Extensive knowledge of deep learning frameworks (PyTorch preferred), large-scale training (DDP/FSDP, NCCL, tensor/pipeline parallelism), and performance profiling.
• Strong systems background: datacenter networking (RoCE, IB), parallel filesystems (Lustre), storage systems, and schedulers (Slurm, Kubernetes, etc.).
• Proficiency in Python and C++, with experience writing production-grade libraries, orchestration layers, and automation tools.
• Ability to work closely with cross-functional teams (ML researchers, infra engineers, product leads) and translate requirements into robust systems.

Benefits:
• Equity
• Benefits
