183 coding challenges across NumPy, PyTorch, transformers, and distributed systems.
| ✓ | # | Title | Difficulty | Tags |
|---|---|---|---|---|
| · | 1 | Numerically Stable Softmax | Easy | numpy, activation, numerical-stability |
| · | 2 | Sigmoid | Easy | numpy, activation |
| · | 3 | ReLU & Variants | Easy | numpy, activation |
| · | 4 | Cross-Entropy Loss | Easy | numpy, loss |
| · | 5 | MSE Loss | Easy | numpy, loss |
| · | 6 | Batch Normalization | Medium | numpy, normalization |
| · | 7 | Layer Normalization | Medium | numpy, normalization, transformer |
| · | 8 | Scaled Dot-Product Attention | Medium | numpy, transformer, attention |
| · | 9 | Adam Optimizer Step | Medium | numpy, optimizer |
| · | 10 | SGD with Momentum | Medium | numpy, optimizer |
| · | 11 | Dropout Forward | Medium | numpy, regularization |
| · | 12 | Backprop: Single Linear Layer | Medium | numpy, backpropagation |
| · | 13 | Backprop: 2-Layer MLP | Hard | numpy, backpropagation, mlp |
| · | 14 | Sinusoidal Positional Encoding | Medium | numpy, transformer, positional-encoding |
| · | 15 | Cosine Annealing LR | Easy | numpy, learning-rate, scheduler |
| · | 16 | KL Divergence | Easy | numpy, loss, information-theory |
| · | 17 | He Weight Initialization | Easy | numpy, initialization |
| · | 18 | L2 Regularization | Easy | numpy, regularization |
| · | 19 | Gradient Clipping | Easy | numpy, optimization, gradients |
| · | 20 | Multi-Head Attention | Hard | numpy, transformer, attention |
| · | 21 | Rotary Position Embedding (RoPE) 🔒 | Medium | rope, positional-encoding, transformers, attention |
| · | 22 | Ring All-Reduce 🔒 | Medium | all-reduce, distributed, collective, ring, data-parallelism |
| · | 23 | All-Gather 🔒 | Easy | all-gather, distributed, collective, fsdp, tensor-parallelism |
| · | 24 | Data Parallelism: Gradient Averaging 🔒 | Easy | data-parallelism, ddp, gradient, distributed, averaging |
| · | 25 | Tensor Parallelism (Megatron-LM) 🔒 | Hard | tensor-parallelism, megatron, column-parallel, row-parallel, distributed |
| · | 26 | Full Transformer (Encoder-Decoder) 🔒 | Hard | transformer, encoder-decoder, seq2seq, attention, cross-attention |
| · | 27 | Flash Attention (Tiled) 🔒 | Hard | flash-attention, attention, memory-efficient, transformers, tiling |
| · | 28 | Grouped Query Attention (GQA) 🔒 | Medium | gqa, attention, llama, mistral, kv-cache, efficiency |
| · | 29 | KV Cache 🔒 | Medium | kv-cache, inference, attention, llm-serving |
| · | 30 | Byte-Pair Encoding (BPE) 🔒 | Medium | tokenizer, bpe, nlp, vocabulary, gpt, llm |
Showing problems 1–30 of 183.
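To give a flavor of what these challenges ask for, here is a minimal sketch of a solution to problem 1 (Numerically Stable Softmax). The max-subtraction trick is the standard stabilization technique; the function name, signature, and `axis` parameter here are illustrative assumptions, not the challenge's required interface:

```python
import numpy as np

def softmax(x: np.ndarray, axis: int = -1) -> np.ndarray:
    # Subtracting the max along the reduction axis leaves the result
    # unchanged (softmax is shift-invariant) but keeps every argument
    # to exp() <= 0, so large logits cannot overflow to inf.
    shifted = x - np.max(x, axis=axis, keepdims=True)
    exp = np.exp(shifted)
    return exp / np.sum(exp, axis=axis, keepdims=True)

# A naive exp(x) / exp(x).sum() would overflow for logits like these;
# the stabilized version stays finite and sums to 1.
print(softmax(np.array([1000.0, 0.0, -1000.0])))
```

Note the `keepdims=True`: it preserves the reduced axis so the subtraction and division broadcast correctly over batched inputs.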