TorchedUp

Temperature Scaling + Repetition Penalty (Easy)

Two essential sampling controls for LLM text generation:

  • Temperature T: divide logits by T before softmax. T < 1 sharpens the distribution (more deterministic), T > 1 flattens it (more random), T = 1 leaves it unchanged.
  • Repetition penalty θ: for every token that already appeared in context, divide its logit by θ if the logit is positive, or multiply by θ if negative. This discourages repeating the same tokens.
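Restating the two rules above symbolically, with z_i the logit for token i (penalty is applied first, then temperature, then softmax):

```latex
z_i' =
\begin{cases}
  z_i / \theta      & \text{if } i \text{ appeared in context and } z_i > 0 \\
  z_i \cdot \theta  & \text{if } i \text{ appeared in context and } z_i \le 0 \\
  z_i               & \text{otherwise}
\end{cases}
\qquad
p_i = \frac{\exp(z_i' / T)}{\sum_j \exp(z_j' / T)}
```

Note that either rule leaves a zero logit unchanged, so the split between the positive and non-positive cases is unambiguous.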

Signature: def apply_temperature_and_penalty(logits, temperature, past_token_ids, repetition_penalty=1.0)

  • logits: (vocab_size,)
  • temperature: float > 0
  • past_token_ids: list of int — tokens already generated (may contain duplicates)
  • repetition_penalty: float ≥ 1.0 (1.0 = no penalty)
  • Returns: (vocab_size,) — probability distribution after penalty + temperature + softmax
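One reasonable implementation of the signature above, as a sketch (the order of operations — penalty, then temperature, then softmax — and the per-sign penalty rule follow the problem statement; the max-subtraction for numerical stability is an implementation choice, not required by the spec):

```python
import numpy as np

def apply_temperature_and_penalty(logits, temperature, past_token_ids,
                                  repetition_penalty=1.0):
    """Apply repetition penalty, then temperature scaling, then softmax."""
    z = np.asarray(logits, dtype=np.float64).copy()

    # Repetition penalty: deduplicate first so each previously seen token
    # is penalized exactly once, even if it occurred multiple times.
    for tok in set(past_token_ids):
        if z[tok] > 0:
            z[tok] /= repetition_penalty   # shrink positive logits
        else:
            z[tok] *= repetition_penalty   # push negative logits further down

    # Temperature scaling + numerically stable softmax.
    z /= temperature
    z -= z.max()            # subtracting the max avoids overflow in exp
    e = np.exp(z)
    return e / e.sum()
```

With `repetition_penalty=1.0` the penalty loop is a no-op (dividing or multiplying by 1), so the function reduces to plain temperature softmax, matching the "no penalty" tests.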

Category: Math
Language: Python (numpy)
Test Results

○ T=1, no penalty — plain softmax
○ T=0.5 — sharpens distribution toward argmax
○ T=1, penalty=1.5 on token 0 (positive logit penalized)
○ T=2.0, penalty=1.2 on tokens 1 (positive) and 3 (negative)
○ T=1, all tokens penalized (stress test) (Premium)
○ output sums to 1 (valid probability distribution)
○ output is non-negative
○ preserves argmax with no penalty (T > 0)