TorchedUp

Temperature Scaling + Repetition Penalty (Easy)

Two essential sampling controls for LLM text generation:

  • Temperature T: divide logits by T before softmax. T < 1 sharpens the distribution (more deterministic), T > 1 flattens it (more random), T = 1 leaves it unchanged.
  • Repetition penalty θ: for every token that already appeared in context, divide its logit by θ if the logit is positive, or multiply by θ if negative. This discourages repeating the same tokens.
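Restating the two rules above symbolically, with z_i the logit for token i (penalty is applied first, then temperature, then softmax):

```latex
z_i' =
\begin{cases}
  z_i / \theta      & \text{if } i \text{ appeared in context and } z_i > 0 \\
  z_i \cdot \theta  & \text{if } i \text{ appeared in context and } z_i \le 0 \\
  z_i               & \text{otherwise}
\end{cases}
\qquad
p_i = \frac{\exp(z_i' / T)}{\sum_j \exp(z_j' / T)}
```

Note that either rule leaves a zero logit unchanged, so the split between the positive and non-positive cases is unambiguous.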

Signature: def apply_temperature_and_penalty(logits, temperature, past_token_ids, repetition_penalty=1.0)

  • logits: (vocab_size,)
  • temperature: float > 0
  • past_token_ids: list of int — tokens already generated (may contain duplicates)
  • repetition_penalty: float ≥ 1.0 (1.0 = no penalty)
  • Returns: (vocab_size,) — probability distribution after penalty + temperature + softmax
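One reasonable implementation of the signature above, as a sketch (the order of operations — penalty, then temperature, then softmax — and the per-sign penalty rule follow the problem statement; the max-subtraction for numerical stability is an implementation choice, not required by the spec):

```python
import numpy as np

def apply_temperature_and_penalty(logits, temperature, past_token_ids,
                                  repetition_penalty=1.0):
    """Apply repetition penalty, then temperature scaling, then softmax."""
    z = np.asarray(logits, dtype=np.float64).copy()

    # Repetition penalty: deduplicate first so each previously seen token
    # is penalized exactly once, even if it occurred multiple times.
    for tok in set(past_token_ids):
        if z[tok] > 0:
            z[tok] /= repetition_penalty   # shrink positive logits
        else:
            z[tok] *= repetition_penalty   # push negative logits further down

    # Temperature scaling + numerically stable softmax.
    z /= temperature
    z -= z.max()            # subtracting the max avoids overflow in exp
    e = np.exp(z)
    return e / e.sum()
```

With `repetition_penalty=1.0` the penalty loop is a no-op (dividing or multiplying by 1), so the function reduces to plain temperature softmax, matching the "no penalty" tests.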

Category: Math
Language: Python (numpy)
Test Results

○ T=1, no penalty — plain softmax
○ T=0.5 — sharpens distribution toward argmax
○ T=1, penalty=1.5 on token 0 (positive logit penalized)
○ T=2.0, penalty=1.2 on tokens 1 (positive) and 3 (negative)
○ T=1, all tokens penalized (stress test) (Premium)
○ output sums to 1 (valid probability distribution)
○ output is non-negative
○ preserves argmax with no penalty (T > 0)