TorchedUp

Distributed Training & Memory Math

Before you can train anything large, you have to know whether it fits. This track combines napkin math (memory budgets, throughput estimates) with the actual algorithms (DDP, ZeRO, FSDP) that make training scale beyond a single GPU. The interview question "can we train a 70B model on 8 H100s?" stops being intimidating once you've worked through these.
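To see why the "70B on 8 H100s" question has a quick napkin answer, here is a minimal sketch of the memory math, assuming mixed-precision Adam (2 bytes fp16 weights + 2 bytes fp16 grads + 12 bytes fp32 optimizer state per parameter) and 80 GB per H100. The byte counts are standard rules of thumb, not exact figures for any particular stack:

```python
# Napkin math: can a 70B-parameter model train on 8x H100 (80 GB each)?
# Assumed breakdown per parameter (mixed-precision Adam):
#   2 B fp16 weights + 2 B fp16 grads
#   + 4 B fp32 master weights + 4 B momentum + 4 B variance = 16 B/param
params = 70e9
bytes_per_param = 2 + 2 + 4 + 4 + 4
needed_gb = params * bytes_per_param / 1e9   # training states only, no activations
cluster_gb = 8 * 80
print(f"need ~{needed_gb:.0f} GB of state, have {cluster_gb} GB")

# Training states alone exceed the cluster, so plain DDP (which replicates
# all 16 B/param on every rank) cannot work. ZeRO-3 shards those bytes
# across ranks instead:
per_gpu_zero3 = needed_gb / 8
print(f"ZeRO-3: ~{per_gpu_zero3:.0f} GB/GPU")  # tight but plausible on 80 GB
```

The same arithmetic, with different bytes-per-parameter, answers the inference versions of the question (problems #110 and #112 above).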

8 problems · suggested order

  1. #109 Transformer Parameter Count (easy)
  2. #110 Weight Memory by dtype (easy)
  3. #112 Total Inference Memory (medium)
  4. #115 Activation Memory (Transformer) (medium)
  5. #24 Data Parallelism: Gradient Averaging (easy)
  6. #78 ZeRO Stage 1: Optimizer State Sharding (medium)
  7. #98 ZeRO Stage 3: Parameter Sharding (hard)
  8. #108 PyTorch: Simulated Data Parallel Gradient Averaging (hard)
Tracks are curated by hand. The order above is the suggested learning progression — feel free to skip around if you already know a topic.
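The core move in data parallelism (problems #24 and #108 above) is that averaging per-rank gradients over equal-sized shards reproduces the full-batch gradient. A minimal pure-Python sketch, using a hypothetical scalar model `y = w * x` with squared loss rather than the real `torch.distributed` API:

```python
# Simulated data-parallel gradient averaging: each "rank" computes the
# gradient of mean squared error on its own data shard; the mean of the
# per-rank gradients (the all-reduce step in real DDP) equals the
# gradient a single worker would compute on the full batch.

def grad(w, xs, ys):
    # d/dw of mean((w*x - y)^2) over one shard
    return sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / len(xs)

world_size = 2
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.0, 6.0, 8.0]
w = 0.0

# Round-robin sharding of the global batch across ranks.
shards = [(xs[r::world_size], ys[r::world_size]) for r in range(world_size)]
per_rank = [grad(w, sx, sy) for sx, sy in shards]
avg = sum(per_rank) / world_size   # simulated all-reduce (mean)
full = grad(w, xs, ys)             # single-worker reference
print(avg, full)  # -30.0 -30.0 — identical because shards are equal-sized
```

With unequal shard sizes the plain mean of per-rank gradients is no longer exactly the full-batch gradient, which is why real loaders pad or drop the last partial batch.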

© 2026 TorchedUp. All rights reserved.
