Member of Engineering – Pre-training, Data Engineering

Remote Full-time
Job Description:
• Build and maintain high-performance pipelines for trillions of tokens.
• Deliver diverse, high-quality datasets for pre-training foundation models.
• Work closely with other teams, such as Pretraining, Posttraining, Evals, and Product, to ensure alignment on the quality of the models delivered.

Requirements:
• Strong background in building production-grade, distributed data systems for machine learning, with experience in:
  • Orchestration: Slurm, Airflow, or Dagster
  • Observability & Reliability: CI/CD, Grafana, Prometheus, etc.
  • Infra: Git, Docker, k8s, cloud managed services
  • Batched inference (e.g., vLLM)
• Performance obsession, especially with large-scale GPU clusters and distributed pipelines
• Expert-level Python knowledge and the ability to write clean, maintainable code
• Strong algorithmic foundations
• Proficiency with libraries like Polars, Dask, or PySpark
• Nice to have:
  • Experience building trillion-scale SOTA pretraining datasets
  • Experience translating research to production at scale
  • Experience with OCR, web crawling, or evals
  • Prior experience pre-training LLMs

Benefits:
• Fully remote work & flexible hours
• 37 days/year of vacation & holidays
• Health insurance allowance for you and dependents
• Company-provided equipment
• Wellbeing, always-be-learning, and home office allowances
• Frequent team get-togethers
• Great, diverse & inclusive people-first culture
