
LLM Inference & Serving

Once you can implement attention, the next problem is serving it. This track covers the production-side concerns: how do you sample efficiently? How do you reuse prefix computation across requests? What does PagedAttention actually do? These are the algorithms behind every modern inference engine.
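
To make the first question concrete, here is a minimal top-k sampling sketch in plain PyTorch. The function name, signature, and defaults are illustrative only, not the track's reference solution:

```python
import torch

def sample_top_k(logits: torch.Tensor, k: int = 50, temperature: float = 1.0) -> torch.Tensor:
    """Sample one token id per row from the top-k logits.

    logits: (batch, vocab_size) raw next-token scores.
    Returns: (batch,) sampled token ids.
    """
    # Keep only the k largest logits; everything else gets probability 0.
    topk_vals, topk_idx = torch.topk(logits / temperature, k, dim=-1)
    probs = torch.softmax(topk_vals, dim=-1)
    # Sample within the reduced distribution, then map back to vocab ids.
    choice = torch.multinomial(probs, num_samples=1)   # (batch, 1)
    return topk_idx.gather(-1, choice).squeeze(-1)      # (batch,)
```

Top-p (nucleus) sampling follows the same pattern, except the cutoff is chosen by cumulative probability mass rather than a fixed k.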

9 problems · suggested order

  1. #29 KV Cache · medium
  2. #213 KV Cache: Pre-allocated Buffer · medium
  3. #32 Top-k Sampling · easy
  4. #33 Top-p (Nucleus) Sampling · easy
  5. #72 Speculative Decoding · hard
  6. #197 Prefix Caching (Prompt KV Reuse) · hard
  7. #66 Paged Attention (vLLM) · hard
  8. #34 Continuous Batching Scheduler · hard
  9. #27 Flash Attention (Tiled) · hard
Tracks are curated by hand. The order above is the suggested learning progression — feel free to skip around if you already know a topic.
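
The first two problems above center on the KV cache. A rough sketch of the pre-allocated-buffer version, assuming a single layer and ignoring paging (the class and attribute names here are hypothetical):

```python
import torch

class KVCache:
    """Minimal pre-allocated per-layer KV cache (illustrative, single layer).

    Buffers are allocated once at max_seq_len, so decoding appends in place
    instead of concatenating new tensors every step.
    """
    def __init__(self, batch: int, num_heads: int, max_seq_len: int, head_dim: int,
                 dtype=torch.float16, device="cpu"):
        shape = (batch, num_heads, max_seq_len, head_dim)
        self.k = torch.zeros(shape, dtype=dtype, device=device)
        self.v = torch.zeros(shape, dtype=dtype, device=device)
        self.len = 0  # number of positions already cached

    def append(self, k_new: torch.Tensor, v_new: torch.Tensor):
        """k_new, v_new: (batch, num_heads, t_new, head_dim) for the new token(s)."""
        t_new = k_new.shape[2]
        assert self.len + t_new <= self.k.shape[2], "cache is full"
        self.k[:, :, self.len:self.len + t_new] = k_new
        self.v[:, :, self.len:self.len + t_new] = v_new
        self.len += t_new
        # Return views over the valid prefix to attend over this step.
        return self.k[:, :, :self.len], self.v[:, :, :self.len]
```

Pre-allocating avoids reallocating and copying the cache on every decode step; PagedAttention takes the idea further by carving the buffer into fixed-size blocks that can be shared and reused across requests.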
