LLM Inference & Serving

Once you can implement attention, the next problem is serving it. This track covers the production-side concerns: How do you sample efficiently? How do you reuse prefix computation across requests? What does PagedAttention actually do? These are the algorithms behind most modern inference engines.

9 problems · suggested order

Tracks are curated by hand. The order above is the suggested learning progression — feel free to skip around if you already know a topic.
