TorchedUp
Problems · Premium

Problems

183 coding challenges across NumPy, PyTorch, transformers, and distributed systems.

| # | Title | Difficulty | Acceptance | Tags |
|---|-------|------------|------------|------|
| 1 | Numerically Stable Softmax | Easy | — | numpy, activation, numerical-stability |
| 2 | Sigmoid | Easy | — | numpy, activation |
| 3 | ReLU & Variants | Easy | — | numpy, activation |
| 4 | Cross-Entropy Loss | Easy | — | numpy, loss |
| 5 | MSE Loss | Easy | — | numpy, loss |
| 6 | Batch Normalization | Medium | — | numpy, normalization |
| 7 | Layer Normalization | Medium | — | numpy, normalization, transformer |
| 8 | Scaled Dot-Product Attention | Medium | — | numpy, transformer, attention |
| 9 | Adam Optimizer Step | Medium | — | numpy, optimizer |
| 10 | SGD with Momentum | Medium | — | numpy, optimizer |
| 11 | Dropout Forward | Medium | — | numpy, regularization |
| 12 | Backprop: Single Linear Layer | Medium | — | numpy, backpropagation |
| 13 | Backprop: 2-Layer MLP | Hard | — | numpy, backpropagation, mlp |
| 14 | Sinusoidal Positional Encoding | Medium | — | numpy, transformer, positional-encoding |
| 15 | Cosine Annealing LR | Easy | — | numpy, learning-rate, scheduler |
| 16 | KL Divergence | Easy | — | numpy, loss, information-theory |
| 17 | He Weight Initialization | Easy | — | numpy, initialization |
| 18 | L2 Regularization | Easy | — | numpy, regularization |
| 19 | Gradient Clipping | Easy | — | numpy, optimization, gradients |
| 20 | Multi-Head Attention | Hard | — | numpy, transformer, attention |
| 21 | Rotary Position Embedding (RoPE) 🔒 | Medium | — | rope, positional-encoding, transformers, attention |
| 22 | Ring All-Reduce 🔒 | Medium | — | all-reduce, distributed, collective, ring, data-parallelism |
| 23 | All-Gather 🔒 | Easy | — | all-gather, distributed, collective, fsdp, tensor-parallelism |
| 24 | Data Parallelism: Gradient Averaging 🔒 | Easy | — | data-parallelism, ddp, gradient, distributed, averaging |
| 25 | Tensor Parallelism (Megatron-LM) 🔒 | Hard | — | tensor-parallelism, megatron, column-parallel, row-parallel, distributed |
| 26 | Full Transformer (Encoder-Decoder) 🔒 | Hard | — | transformer, encoder-decoder, seq2seq, attention, cross-attention |
| 27 | Flash Attention (Tiled) 🔒 | Hard | — | flash-attention, attention, memory-efficient, transformers, tiling |
| 28 | Grouped Query Attention (GQA) 🔒 | Medium | — | gqa, attention, llama, mistral, kv-cache, efficiency |
| 29 | KV Cache 🔒 | Medium | — | kv-cache, inference, attention, llm-serving |
| 30 | Byte-Pair Encoding (BPE) 🔒 | Medium | — | tokenizer, bpe, nlp, vocabulary, gpt, llm |
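To give a sense of what these problems ask for, here is a minimal sketch of problem 1, Numerically Stable Softmax. The function name and signature are illustrative, not the site's reference solution:

```python
import numpy as np

def softmax(x, axis=-1):
    # Subtracting the row-wise max before exponentiating prevents
    # overflow in np.exp; softmax is shift-invariant, so the result
    # is mathematically unchanged.
    shifted = x - x.max(axis=axis, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=axis, keepdims=True)

# Large logits that would overflow a naive exp-then-normalize version:
logits = np.array([[1000.0, 1000.0], [3.0, 1.0]])
probs = softmax(logits)
```

A naive `np.exp(x) / np.exp(x).sum()` on the first row produces `inf / inf = nan`; the shifted version returns `[0.5, 0.5]`.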

1–30 of 183
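The distributed-systems problems follow the same pattern. As one hedged illustration of problem 22's idea, here is a single-process numpy simulation of a ring all-reduce (sum): a reduce-scatter phase followed by an all-gather phase, each taking `world - 1` steps. The function name and message schedule are my own sketch, not the site's reference solution:

```python
import numpy as np

def ring_all_reduce(tensors):
    # Simulate `world` ranks, each holding one tensor split into `world` chunks.
    world = len(tensors)
    chunks = [list(np.array_split(t.astype(float), world)) for t in tensors]

    # Phase 1, reduce-scatter: at each step, rank r sends one chunk to
    # rank (r + 1) % world, which accumulates it. Messages are snapshotted
    # first so all "sends" within a step use the pre-step state.
    for step in range(world - 1):
        sends = [(r, (r - step) % world, chunks[r][(r - step) % world].copy())
                 for r in range(world)]
        for r, idx, payload in sends:
            chunks[(r + 1) % world][idx] += payload
    # Now rank r holds the fully reduced chunk (r + 1) % world.

    # Phase 2, all-gather: circulate the reduced chunks so every rank
    # ends up with all of them.
    for step in range(world - 1):
        sends = [(r, (r + 1 - step) % world, chunks[r][(r + 1 - step) % world].copy())
                 for r in range(world)]
        for r, idx, payload in sends:
            chunks[(r + 1) % world][idx] = payload

    return [np.concatenate(c) for c in chunks]

# Three simulated ranks, each with its own gradient vector:
grads = [np.arange(6, dtype=float) + r for r in range(3)]
reduced = ring_all_reduce(grads)
```

Each step moves only one chunk per rank, which is why the ring algorithm's bandwidth cost stays near-constant as the number of ranks grows.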

© 2026 TorchedUp. All rights reserved.

Changelog · Contact Us · Terms of Service · Privacy Policy · Refund Policy