Before you can train anything large, you have to know whether it fits. This track combines napkin math (memory budgets, throughput estimates) with the actual algorithms (DDP, ZeRO, FSDP) that make training scale beyond a single GPU. The interview question "can we train a 70B model on 8 H100s?" stops being intimidating once you've worked through these problems.
8 problems · suggested order
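As a taste of the napkin math involved, here is a minimal sketch of the 70B-on-8-H100s question, assuming the standard mixed-precision Adam accounting of 16 bytes per parameter (2 for fp16 weights, 2 for fp16 grads, 12 for fp32 master weights, momentum, and variance) and ignoring activations entirely:

```python
# Napkin math: do the model states of a 70B model fit on 8 x 80 GB H100s?
# Assumption: mixed-precision Adam, 16 bytes/param; activations not counted.

PARAMS = 70e9
BYTES_PER_PARAM = 2 + 2 + 4 + 4 + 4  # fp16 weights + fp16 grads + fp32 master/m/v = 16

total_gb = PARAMS * BYTES_PER_PARAM / 1e9   # 1120 GB of model states
cluster_gb = 8 * 80                          # 640 GB of total HBM
per_gpu_gb = total_gb / 8                    # 140 GB/GPU even if ZeRO-3 shards everything

print(f"model states: {total_gb:.0f} GB, cluster: {cluster_gb} GB, "
      f"per-GPU fully sharded: {per_gpu_gb:.0f} GB")
```

Even with ZeRO-3 sharding every state across all 8 GPUs, 140 GB per GPU exceeds the 80 GB of HBM before a single activation is stored, which is exactly the kind of conclusion these problems train you to reach in under a minute.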