TorchedUp

Transformer Internals

The transformer architecture underpins nearly all modern AI systems. This track walks you through every component of the standard architecture, then the production techniques (KV cache, FlashAttention) that make it fast enough to serve. Implementing each piece from scratch is the fastest way to internalize what the papers actually mean.
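To ground the starting point of the track, here is a minimal NumPy sketch of single-head scaled dot-product attention with an optional boolean mask. The function name and the large negative masking constant are illustrative assumptions, not the site's reference solution.

```python
import numpy as np

def scaled_dot_product_attention(q, k, v, mask=None):
    """q, k, v: (seq_len, d_k) arrays. mask: (seq_len, seq_len) bool,
    True where attention is allowed. Returns (seq_len, d_k)."""
    d_k = q.shape[-1]
    # Similarity scores, scaled by sqrt(d_k) to keep softmax well-conditioned
    scores = q @ k.T / np.sqrt(d_k)
    if mask is not None:
        # Disallowed positions get a large negative score -> ~0 weight
        scores = np.where(mask, scores, -1e9)
    # Numerically stable softmax over the key axis
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights = weights / weights.sum(axis=-1, keepdims=True)
    return weights @ v
```

With a lower-triangular mask this becomes the causal attention of problem 3; the multi-head variant just runs this in parallel over split projections.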

10 problems · suggested order

  1. Scaled Dot-Product Attention (#8) · medium
  2. Multi-Head Attention (#20) · hard
  3. Masked Attention (Causal + Padding) (#215) · medium
  4. Sinusoidal Positional Encoding (#14) · medium
  5. Rotary Position Embedding (RoPE) (#21) · medium
  6. Grouped Query Attention (GQA) (#28) · medium
  7. KV Cache (#29) · medium
  8. Flash Attention (Tiled) (#27) · hard
  9. Attention Sinks (StreamingLLM) (#87) · medium
  10. LayerNorm with Pre-allocated Output Buffer (#216) · medium
Tracks are curated by hand. The order above is the suggested learning progression — feel free to skip around if you already know a topic.
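Of the production techniques in the track, the KV cache (problem 7) is the simplest to sketch: during autoregressive decoding, each token's key and value are computed once and appended, so a decode step is O(1) in projection work rather than recomputing the whole prefix. The class below is a hedged single-layer, single-head, unbatched NumPy sketch; the class and method names are assumptions for illustration, not the problem's required interface.

```python
import numpy as np

class KVCache:
    """Append-only store of past keys/values for autoregressive decoding."""

    def __init__(self, max_len, d_k):
        # Pre-allocate so decoding never reallocates mid-generation
        self.k = np.zeros((max_len, d_k))
        self.v = np.zeros((max_len, d_k))
        self.len = 0

    def append(self, k_new, v_new):
        """Store one new token's key/value; return views of all cached so far."""
        self.k[self.len] = k_new
        self.v[self.len] = v_new
        self.len += 1
        return self.k[: self.len], self.v[: self.len]
```

Each decode step appends the new token's key/value and attends the new query against the returned prefix, which is exactly where the attention-sinks trick (problem 9) later intervenes by keeping only the first few and most recent cache entries.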

© 2026 TorchedUp. All rights reserved.
