Layer Normalization (PyTorch)

Implement the layer normalization forward pass in PyTorch using primitive tensor ops only.

Signature: def layer_norm(x: torch.Tensor, gamma: torch.Tensor, beta: torch.Tensor, eps: float = 1e-5) -> torch.Tensor

The rule: you may NOT call nn.LayerNorm, F.layer_norm, or any built-in normalization layer. Roll the math yourself with .mean(), .var(), and friends.

Normalize over the last (feature) axis for each sample independently to get x_hat, then apply the learnable affine transform gamma * x_hat + beta.

PyTorch idioms vs NumPy:

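keepdim=True (no 's') is the documented PyTorch spelling; NumPy uses keepdims=True. Leaving the flag off gives the default keepdim=False, which drops the reduced axis, so the per-sample statistics no longer broadcast against the input.
dim=-1 is the documented PyTorch name; NumPy uses axis=-1. Many PyTorch reductions also accept the NumPy spellings as compatibility aliases, but dim and keepdim are the names to rely on.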
x.var(dim=-1, keepdim=True, unbiased=False) matches NumPy's np.var (population variance, /N). Default unbiased=True divides by N-1 and gives slightly different numbers — match LayerNorm's standard convention with unbiased=False.
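
To make the idioms above concrete, here is a small sketch on a hand-picked 2x4 tensor. The tensor values and variable names are illustrative assumptions, not part of the problem statement. It shows how keepdim changes the result shape and how much unbiased=False and the default differ.

```python
import torch

# Illustrative input: 2 samples, 4 features each.
x = torch.tensor([[1.0, 2.0, 3.0, 4.0],
                  [2.0, 4.0, 6.0, 8.0]])

# keepdim=True keeps the reduced axis as size 1, so it broadcasts against x.
mean_keep = x.mean(dim=-1, keepdim=True)   # shape (2, 1)
# Default keepdim=False drops the axis; (2,) does not broadcast against (2, 4).
mean_drop = x.mean(dim=-1)                 # shape (2,)

# Population variance (divide by N), the convention layer norm uses.
var_pop = x.var(dim=-1, keepdim=True, unbiased=False)   # [[1.25], [5.0]]
# Default sample variance (divide by N-1) is larger by a factor N/(N-1).
var_sample = x.var(dim=-1, keepdim=True)

print(mean_keep.shape, mean_drop.shape)
print(var_pop.squeeze(-1), var_sample.squeeze(-1))
```

With only 4 features the two variances differ by a factor of 4/3; the gap shrinks as the feature dimension grows, which is why the mistake is easy to miss on wide inputs.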

Math

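\mathrm{LN}(x) = \frac{x - \mu}{\sqrt{\sigma^{2} + \epsilon}} \cdot \gamma + \beta

where \mu and \sigma^{2} are the mean and population variance of x over the last axis.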
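
One possible way to turn the math into code is sketched below. It assumes x has shape (..., features) and that gamma and beta have shape (features,); it divides by the standard deviation sqrt(var + eps), not by the variance itself. The sample shapes in the usage lines are arbitrary choices for illustration.

```python
import torch

def layer_norm(x: torch.Tensor, gamma: torch.Tensor, beta: torch.Tensor,
               eps: float = 1e-5) -> torch.Tensor:
    # Per-sample statistics over the last (feature) axis.
    mu = x.mean(dim=-1, keepdim=True)
    var = x.var(dim=-1, keepdim=True, unbiased=False)  # population variance
    # Normalize, with eps inside the square root for numerical stability.
    x_hat = (x - mu) / torch.sqrt(var + eps)
    # Learnable affine transform, broadcast over the leading axes.
    return gamma * x_hat + beta

# Illustrative usage: 4 samples, 8 features, identity affine parameters.
torch.manual_seed(0)
x = torch.randn(4, 8)
gamma = torch.ones(8)
beta = torch.zeros(8)
out = layer_norm(x, gamma, beta)
print(out.shape)          # same shape as x
print(out.mean(dim=-1))   # each row is approximately zero-mean
```

With gamma all ones and beta all zeros, every row of the output should have mean close to 0 and population variance close to 1, which is a quick sanity check before comparing against a reference implementation.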
