Top-k sampling restricts token selection to the k most probable tokens, then samples from their renormalized distribution. This prevents sampling from very unlikely tokens while maintaining diversity (unlike greedy decoding).
Steps:
1. Find the k largest logits.
2. Exclude all other tokens (equivalently, set their logits to -inf).
3. Apply softmax over the top-k logits to renormalize them into a probability distribution.
4. Return the full (vocab_size,) distribution, with zeros at non-top-k positions.
Signature: def top_k_sampling(logits, k)
Parameters:
- logits: (vocab_size,) — unnormalized scores
- k: int — number of top tokens to keep

Returns: (vocab_size,) — renormalized probability distribution (zeros for non-top-k)

Math:
p_i = exp(logit_i) / Σ_{j ∈ top-k} exp(logit_j) if i is in the top-k, else p_i = 0
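A minimal NumPy sketch matching the signature above (the numerical-stability shift and the use of `argpartition` are implementation choices, not requirements of the problem):

```python
import numpy as np

def top_k_sampling(logits, k):
    """Return a (vocab_size,) distribution renormalized over the top-k logits."""
    logits = np.asarray(logits, dtype=float)
    # Indices of the k largest logits (argpartition avoids a full sort).
    top_idx = np.argpartition(logits, -k)[-k:]
    # Numerically stable softmax over only the top-k logits.
    top_logits = logits[top_idx]
    exp = np.exp(top_logits - top_logits.max())
    # Zeros everywhere except the top-k positions.
    probs = np.zeros_like(logits)
    probs[top_idx] = exp / exp.sum()
    return probs
```

To actually draw a token, sample from the returned distribution, e.g. `np.random.default_rng().choice(len(probs), p=probs)`. For `logits = [1.0, 2.0, 3.0, 4.0]` and `k = 2`, only indices 2 and 3 get mass (≈ 0.269 and ≈ 0.731).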