Top-k Sampling — TorchedUp

Top-k sampling restricts token selection to the k most probable tokens, then samples from their renormalized distribution. This excludes very unlikely tokens while keeping some diversity, unlike greedy decoding, which always picks the single most probable token.

Steps:

1. Apply softmax to logits
2. Keep only top-k probabilities, zero out the rest
3. Renormalize to sum to 1
4. Return the masked probability distribution

Signature: def top_k_sampling(logits, k)

logits: (vocab_size,) — unnormalized scores
k: int — number of top tokens to keep
Returns: (vocab_size,) — renormalized probability distribution (zeros for non-top-k)
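
The steps and signature above can be sketched as follows. This is a minimal illustration rather than a reference solution: it assumes logits is a plain Python list of floats instead of a tensor, subtracts the maximum logit before exponentiating for numerical stability, and breaks ties in favor of the lower index. None of these choices are specified by the problem statement.

```python
import math


def top_k_sampling(logits, k):
    """Return the renormalized top-k distribution for a 1-D list of logits."""
    if not 1 <= k <= len(logits):
        raise ValueError("k must be between 1 and len(logits)")

    # Step 1: softmax, shifted by the max logit so exp() cannot overflow.
    z_max = max(logits)
    exps = [math.exp(z - z_max) for z in logits]
    total = sum(exps)
    probs = [e / total for e in exps]

    # Step 2: find the k largest probabilities. sorted() is stable, so
    # equal probabilities keep index order (an assumption for tie handling).
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    keep = set(order[:k])
    masked = [p if i in keep else 0.0 for i, p in enumerate(probs)]

    # Step 3: renormalize the surviving probabilities to sum to 1.
    kept_mass = sum(masked)

    # Step 4: return the masked distribution (zeros outside the top k).
    return [p / kept_mass for p in masked]


if __name__ == "__main__":
    print(top_k_sampling([2.0, 1.0, 0.0], k=2))
```

To draw an actual token from the returned distribution, something like random.choices(range(len(probs)), weights=probs) from the standard library would work, since zero-weight entries are never selected.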

Math

p_{i} = \frac{\exp(z_{i})}{\sum_{j} \exp(z_{j})}

\hat{p}_{i} = \begin{cases} \dfrac{p_{i}}{\sum_{j \in \text{top-}k} p_{j}} & \text{if } i \in \text{top-}k \\ 0 & \text{otherwise} \end{cases}
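
As a small worked example of the two formulas, take the illustrative logits z = (2, 1, 0) with k = 2. These values are chosen here for demonstration and are not part of the problem. Results are rounded to four decimal places.

p = \left( \frac{e^{2}}{e^{2} + e^{1} + e^{0}}, \frac{e^{1}}{e^{2} + e^{1} + e^{0}}, \frac{e^{0}}{e^{2} + e^{1} + e^{0}} \right) \approx (0.6652, 0.2447, 0.0900)

\sum_{j \in \text{top-}2} p_{j} \approx 0.6652 + 0.2447 = 0.9100

\hat{p} \approx \left( \frac{0.6652}{0.9100}, \frac{0.2447}{0.9100}, 0 \right) \approx (0.7311, 0.2689, 0)

The third token is dropped, and the remaining two probabilities are scaled up so that they again sum to 1.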



32. Top-k Sampling
