The transformer is the architecture behind most of modern AI. This track walks you through every component of the standard architecture, then covers the production tricks (KV cache, FlashAttention) that make it fast enough to serve. Implementing each piece from scratch is the fastest way to internalize what the papers actually mean.
10 problems · suggested order