TorchedUp

82. Temperature Scaling + Repetition Penalty

Easy

Two essential sampling controls for LLM text generation:

  • Temperature T: divide logits by T before softmax. T < 1 sharpens the distribution (more deterministic), T > 1 flattens it (more random), T = 1 leaves it unchanged.
  • Repetition penalty θ: for every token that already appeared in context, divide its logit by θ if the logit is positive, or multiply by θ if negative. This discourages repeating the same tokens.
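To make the effect of T concrete, here is a minimal sketch that applies softmax at three temperatures to a toy logit vector. The helper name `softmax_with_temperature` and the example logits are illustrative assumptions, not part of the problem.

```python
import numpy as np

def softmax_with_temperature(logits, temperature):
    """Softmax of logits / temperature (hypothetical helper name)."""
    scaled = np.asarray(logits, dtype=np.float64) / temperature
    scaled -= scaled.max()  # subtract the max for numerical stability
    exp = np.exp(scaled)
    return exp / exp.sum()

logits = [2.0, 1.0, 0.0]  # toy values chosen for illustration
for t in (0.5, 1.0, 2.0):
    # Lower T concentrates mass on the largest logit; higher T spreads it out.
    print(t, softmax_with_temperature(logits, t).round(3))
```

The probability of the top token should shrink as T grows, which matches the sharpen/flatten description above.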

Signature: def apply_temperature_and_penalty(logits, temperature, past_token_ids, repetition_penalty=1.0)

  • logits: (vocab_size,)
  • temperature: float > 0
  • past_token_ids: list of int — tokens already generated (may contain duplicates)
  • repetition_penalty: float ≥ 1.0 (1.0 = no penalty)
  • Returns: (vocab_size,) — probability distribution after penalty + temperature + softmax
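Because past_token_ids may contain duplicates, one detail worth pinning down is how often a repeated token gets penalized. The sketch below assumes each distinct token is penalized once, however many times it appears. That reading is an assumption, as is the helper name `penalize_repeats`.

```python
import numpy as np

def penalize_repeats(logits, past_token_ids, repetition_penalty=1.0):
    """Return a penalized copy of logits (hypothetical helper name)."""
    out = np.array(logits, dtype=np.float64)  # copy so the caller's array is untouched
    # Assumption: deduplicate so each seen token is penalized exactly once.
    seen = np.unique(np.asarray(past_token_ids, dtype=np.int64))
    vals = out[seen]
    # Positive logits are divided by the penalty, non-positive ones multiplied.
    out[seen] = np.where(vals > 0, vals / repetition_penalty, vals * repetition_penalty)
    return out

# Token 0 appears twice but is penalized once; token 2 was never generated.
# The logits become 1.0, -2.0, 0.5.
print(penalize_repeats([2.0, -1.0, 0.5], [0, 0, 1], repetition_penalty=2.0))
```

With repetition_penalty=1.0 the logits come back unchanged, consistent with "1.0 = no penalty".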

Math

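$$\ell_i' = \begin{cases} \ell_i / \theta & \ell_i > 0 \\ \ell_i \cdot \theta & \ell_i \le 0 \end{cases}, \qquad p_i = \frac{e^{\ell_i'/T}}{\sum_j e^{\ell_j'/T}}$$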
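One possible way to put the pieces together is sketched below: penalty first, then temperature, then softmax, following the order given in the return description. It is not the site's reference solution. It assumes the penalty case applies only to ids in past_token_ids (other logits pass through unchanged), that each distinct id is penalized once, and it adds a max-shift and a temperature check that the problem does not require.

```python
import numpy as np

def apply_temperature_and_penalty(logits, temperature, past_token_ids, repetition_penalty=1.0):
    if temperature <= 0:
        raise ValueError("temperature must be > 0")  # extra guard, not required by the problem
    scores = np.array(logits, dtype=np.float64)  # work on a copy

    # Step 1: repetition penalty on tokens seen so far
    # (assumption: each distinct id is penalized once).
    seen = np.unique(np.asarray(past_token_ids, dtype=np.int64))
    vals = scores[seen]
    scores[seen] = np.where(vals > 0, vals / repetition_penalty, vals * repetition_penalty)

    # Step 2: temperature scaling.
    scores /= temperature

    # Step 3: softmax, shifted by the max so exp() cannot overflow.
    scores -= scores.max()
    probs = np.exp(scores)
    return probs / probs.sum()

# Tokens 0 and 2 were already generated; the penalized logits are [1.0, 1.0, -2.0],
# so tokens 0 and 1 end up equally likely.
probs = apply_temperature_and_penalty(np.array([2.0, 1.0, -1.0]), 1.0, [0, 2, 2], repetition_penalty=2.0)
print(probs)
```

Subtracting the maximum before exponentiating leaves the result mathematically unchanged, since the shift cancels in the ratio.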

NumPy

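import numpy as np

def apply_temperature_and_penalty(logits, temperature, past_token_ids, repetition_penalty=1.0):
    # TODO: apply the repetition penalty, then temperature, then softmax
    pass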
