TorchedUp
LearnBetaProblemsSystem DesignSoonPremium

74. Pre-LayerNorm Residual Block

Easy

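Modern transformers (GPT-2 onwards) typically use Pre-LN: normalize the input before the sublayer, then add the sublayer's output to the original, unnormalized input. This generally trains more stably than the original Post-LN arrangement, which normalizes after the residual addition.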

  • Pre-LN: output = x + sublayer(LayerNorm(x))
  • Post-LN: output = LayerNorm(x + sublayer(x))
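The difference between the two formulas is easiest to see on the skip path. The sketch below is illustrative only: `simple_norm`, `pre_ln`, `post_ln`, and `zero_sublayer` are hypothetical names, and the normalization omits the learned scale and shift for brevity.

```python
import numpy as np

def simple_norm(x, eps=1e-5):
    # Plain LayerNorm without learned gamma/beta (assumption, for brevity).
    return (x - x.mean()) / np.sqrt(x.var() + eps)

def pre_ln(x, sublayer, norm=simple_norm):
    # Normalize only the sublayer's input; the skip path carries x unchanged.
    return x + sublayer(norm(x))

def post_ln(x, sublayer, norm=simple_norm):
    # Add the residual first, then normalize the sum.
    return norm(x + sublayer(x))

x = np.array([1.0, 2.0, 4.0])
zero_sublayer = lambda h: np.zeros_like(h)  # hypothetical sublayer that outputs zeros

print(pre_ln(x, zero_sublayer))   # x passes through untouched
print(post_ln(x, zero_sublayer))  # x is normalized even though the sublayer did nothing
```

With a sublayer that contributes nothing, Pre-LN reduces to the identity, while Post-LN still rescales the input.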

Implement the Pre-LN wrapper where the sublayer is a linear projection, sublayer(h) = W @ h + b, standing in for an attention or FFN sublayer. The sublayer receives the normalized input h = LayerNorm(x), not x itself.

Signature: def pre_layernorm_block(x, W, b, gamma, beta)

  • x: (d,) — input
  • W: (d, d), b: (d,) — sublayer weights
  • gamma, beta: (d,) — LayerNorm parameters
  • Returns: (d,)

LayerNorm epsilon: 1e-5.
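The statement gives the epsilon value but not where it enters the computation. A minimal sketch, assuming the common convention (the one PyTorch's `LayerNorm` documents): population variance over the feature dimension, with epsilon added to the variance inside the square root. The helper name `layer_norm` is hypothetical.

```python
import numpy as np

def layer_norm(x, gamma, beta, eps=1e-5):
    # Statistics are taken over the feature dimension of the (d,) vector.
    mean = x.mean()
    var = x.var()  # population variance (divides by d); assumed convention
    # Assumption: eps is added to the variance inside the square root.
    x_hat = (x - mean) / np.sqrt(var + eps)
    return gamma * x_hat + beta

x = np.array([1.0, 2.0, 3.0, 4.0])
y = layer_norm(x, gamma=np.ones(4), beta=np.zeros(4))
print(y.mean(), y.var())  # mean near 0, variance just under 1 because of eps

# A constant input has zero variance; eps keeps the division finite.
flat = layer_norm(np.full(4, 3.0), gamma=np.ones(4), beta=np.zeros(4))
```

If the hidden tests use a different convention (for example, epsilon added outside the square root), results would differ slightly, so it is worth checking against the provided examples.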

Math

$$\text{output} = x + \text{sublayer}(\text{LayerNorm}(x))$$
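Expanded for the linear sublayer in this problem, the formula can be written step by step. This expansion assumes the usual LayerNorm definition (population variance, with $\epsilon = 10^{-5}$ inside the square root), which the statement does not spell out; $h$ is a name introduced here for the normalized input.

```latex
\mu = \frac{1}{d}\sum_{i=1}^{d} x_i,
\qquad
\sigma^2 = \frac{1}{d}\sum_{i=1}^{d} (x_i - \mu)^2

h = \gamma \odot \frac{x - \mu}{\sqrt{\sigma^2 + \epsilon}} + \beta,
\qquad
\text{output} = x + (W h + b)
```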

Related problems

  • Pre-LayerNorm Residual Block (PyTorch) (Easy, PyTorch)

NumPy

import numpy as np

def pre_layernorm_block(x, W, b, gamma, beta):
    # Return x + (W @ LayerNorm(x) + b), using eps = 1e-5 in LayerNorm.
    pass
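One way the stub could be filled in is sketched below. It is not the site's reference solution; it assumes population variance and epsilon inside the square root, and the module-level constant `EPS` and the example inputs are made up for illustration.

```python
import numpy as np

EPS = 1e-5  # LayerNorm epsilon given in the problem statement

def pre_layernorm_block(x, W, b, gamma, beta):
    x = np.asarray(x, dtype=float)
    # LayerNorm over the feature dimension
    # (assumed: population variance, eps inside the square root).
    mean = x.mean()
    var = x.var()
    h = gamma * (x - mean) / np.sqrt(var + EPS) + beta
    # Linear sublayer applied to the normalized input.
    sub = W @ h + b
    # Residual connection adds the original, unnormalized x.
    return x + sub

# Made-up inputs for illustration only.
d = 4
x = np.array([1.0, 2.0, 3.0, 4.0])
out = pre_layernorm_block(x, W=np.zeros((d, d)), b=np.zeros(d),
                          gamma=np.ones(d), beta=np.zeros(d))
print(out)  # with a zero sublayer the block reduces to the identity

# With W = I, gamma = 1, beta = 0, b = 0 the block adds the normalized x to x.
out_id = pre_layernorm_block(x, W=np.eye(d), b=np.zeros(d),
                             gamma=np.ones(d), beta=np.zeros(d))
```

The second call is a quick sanity check: the difference `out_id - x` should have mean close to zero, since it is just the normalized input.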
