TorchedUp

251. Pre-LayerNorm Residual Block (PyTorch)

Easy

Implement the Pre-LN residual block in PyTorch using primitive tensor ops only:

output = x + sublayer(LayerNorm(x))

The sublayer is a linear projection x_norm @ W.T + b (standing in for attention or FFN output).

Signature: def pre_layernorm_block(x, W, b, gamma, beta) -> torch.Tensor

  • x: (..., d)
  • W: (d, d), b: (d,)
  • gamma, beta: (d,)
  • LayerNorm eps = 1e-5

The rule: you may NOT call nn.LayerNorm or F.layer_norm. Build LN from .mean() / .var().
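As a minimal sketch of what "build LN from .mean() / .var()" can look like, the helper below normalizes over the last dimension and then applies the affine parameters. The name manual_layer_norm and the example shapes are illustrative assumptions, not part of the problem statement.

```python
import torch

def manual_layer_norm(x, gamma, beta, eps=1e-5):
    # Hypothetical helper name; the problem only asks for pre_layernorm_block.
    # Statistics are taken over the last (feature) dimension only.
    mean = x.mean(dim=-1, keepdim=True)
    # unbiased=False divides by d (population variance), the LayerNorm convention.
    var = x.var(dim=-1, keepdim=True, unbiased=False)
    x_hat = (x - mean) / torch.sqrt(var + eps)
    # Per-feature scale and shift; gamma and beta broadcast over leading dims.
    return x_hat * gamma + beta

torch.manual_seed(0)
x = torch.randn(4, 8)      # illustrative shape: 4 rows, d = 8
gamma = torch.ones(8)
beta = torch.zeros(8)
y = manual_layer_norm(x, gamma, beta)
```

With gamma set to ones and beta to zeros, each row of y should have a mean close to 0 and a population variance close to 1.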

PyTorch idioms vs NumPy:

  • x.var(dim=-1, keepdim=True, unbiased=False) — population variance to match LN convention. Default unbiased=True is wrong here.
  • x_norm @ W.T + b — matmul broadcasts naturally over leading dims, so this works for 1D, 2D, and 3D inputs without reshaping.
  • The residual x + sublayer(...) is not normalized again. That is the defining property of Pre-LN versus Post-LN: the skip path stays an identity, which tends to make deep GPT-style models train more stably and be less sensitive to learning-rate warmup.
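The idioms above can be put together in one possible sketch of the full block. It is one way to satisfy the stated signature, not the site's reference solution; the tensor sizes, the seed, and the variable names outside the function are assumptions chosen for illustration.

```python
import torch

def pre_layernorm_block(x, W, b, gamma, beta) -> torch.Tensor:
    eps = 1e-5  # value given in the problem statement
    # LayerNorm from primitives, over the last dimension.
    mean = x.mean(dim=-1, keepdim=True)
    var = x.var(dim=-1, keepdim=True, unbiased=False)
    x_norm = (x - mean) / torch.sqrt(var + eps) * gamma + beta
    # Linear sublayer; matmul broadcasts over any leading dimensions.
    sublayer_out = x_norm @ W.T + b
    # Residual add; the sum itself is not normalized again (Pre-LN).
    return x + sublayer_out

torch.manual_seed(0)
d = 6  # illustrative feature size
W, b = torch.randn(d, d), torch.randn(d)
gamma, beta = torch.ones(d), torch.zeros(d)

# The same function handles 1D, 2D, and 3D inputs without reshaping.
shapes = [(d,), (3, d), (2, 3, d)]
outputs = [pre_layernorm_block(torch.randn(*s), W, b, gamma, beta) for s in shapes]
out_shapes = [tuple(o.shape) for o in outputs]

# Why the default is wrong here: it divides by d - 1 instead of d.
x = torch.randn(3, d)
biased = x.var(dim=-1, unbiased=False)
unbiased = x.var(dim=-1)  # default unbiased=True
```

In this sketch out_shapes matches shapes, and unbiased equals biased scaled by d / (d - 1), which is the mismatch that the unbiased=False argument avoids.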

Math

output = x + sublayer(LayerNorm(x))
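Expanding the LayerNorm term makes the formula above concrete. The symbols \mu, \sigma^2, and the index i are notation introduced here for illustration; d, W, b, gamma (\gamma), beta (\beta), and eps (\epsilon = 10^{-5}) come from the problem statement, and all statistics are taken over the last dimension.

```latex
\mu = \frac{1}{d}\sum_{i=1}^{d} x_i,
\qquad
\sigma^2 = \frac{1}{d}\sum_{i=1}^{d} (x_i - \mu)^2

\mathrm{LayerNorm}(x) = \gamma \odot \frac{x - \mu}{\sqrt{\sigma^2 + \epsilon}} + \beta

\mathrm{output} = x + \mathrm{LayerNorm}(x)\, W^{\top} + b
```

The 1/d factor in \sigma^2 is the population variance, which is what unbiased=False computes.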

Related problems

  • Pre-LayerNorm Residual Block (Easy, NumPy)

import torch

def pre_layernorm_block(x, W, b, gamma, beta) -> torch.Tensor:
    pass
