TorchedUp

216. LayerNorm with Pre-allocated Output Buffer

Medium

Implement LayerNorm where the output is written into a caller-supplied buffer out, instead of allocating a fresh tensor. This is how production fused kernels work: an inference engine pre-allocates activation buffers once, and every layer writes into pre-existing memory.

Signature: def layernorm_inplace(x, gamma, beta, out, eps=1e-5) -> out

  • x: input, shape (..., D) (normalize over last axis)
  • gamma: per-feature scale, shape (D,)
  • beta: per-feature shift, shape (D,)
  • out: pre-allocated output buffer, same shape as x — write your result here
  • eps: stability constant for the variance

The function must:

  1. Compute LayerNorm: (x - mean) / sqrt(var + eps) * gamma + beta, where mean and var are taken over the last axis.
  2. Write the result into out (e.g. out[...] = ...).
  3. Return out.

Constraints:

  • Do NOT allocate a new output array (np.empty_like(x), np.zeros_like(x), etc.). Just normalize and assign into the buffer the caller passed you.
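  • Intermediate per-row statistics (mean, var) are fine: they take O(B) memory, not O(B*D), where B is the number of rows (the product of the leading dimensions of x) and D is the size of the last axis.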
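
One possible way to meet the requirements and constraints above is to pass out= to NumPy ufuncs so that every step writes into the caller's buffer. This is a minimal sketch, not the site's reference solution. It assumes var means the population variance (ddof=0, NumPy's default) and that out has a floating-point dtype; NumPy may still use internal temporaries while computing the statistics.

```python
import numpy as np


def layernorm_inplace(x, gamma, beta, out, eps=1e-5):
    # Per-row statistics over the last axis; keepdims=True keeps a trailing
    # axis of length 1 so they broadcast against x.
    mean = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)  # assumption: population variance (ddof=0)

    # Each ufunc call below stores its result in the caller's buffer.
    np.subtract(x, mean, out=out)                # out = x - mean
    np.divide(out, np.sqrt(var + eps), out=out)  # out = out / sqrt(var + eps)
    np.multiply(out, gamma, out=out)             # out = out * gamma
    np.add(out, beta, out=out)                   # out = out + beta
    return out
```

Because mean and var are computed before the first write, this sketch should also behave correctly if the caller passes the same array as both x and out.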

The harness verifies the returned array equals the LayerNorm of x. One test passes in a pre-zeroed buffer to confirm you actually wrote into it (rather than returning a fresh allocation).
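
The actual harness is not shown, so the following is only a guess at how such a check could look. The helper names (layernorm_reference, check_writes_into_buffer, writes_in_place, returns_fresh) are hypothetical and exist purely for illustration.

```python
import numpy as np


def layernorm_reference(x, gamma, beta, eps=1e-5):
    # Allocating reference, used only to compute the expected values.
    mean = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mean) / np.sqrt(var + eps) * gamma + beta


def check_writes_into_buffer(fn, seed=0):
    # Hypothetical check: hand fn a pre-zeroed buffer and confirm that the
    # buffer itself ends up holding the LayerNorm result.
    rng = np.random.default_rng(seed)
    x = rng.standard_normal((2, 3, 8))
    gamma = rng.standard_normal(8)
    beta = rng.standard_normal(8)
    out = np.zeros_like(x)

    result = fn(x, gamma, beta, out)
    expected = layernorm_reference(x, gamma, beta)
    same_object = result is out
    buffer_filled = bool(np.allclose(out, expected))
    return same_object and buffer_filled


def writes_in_place(x, gamma, beta, out, eps=1e-5):
    # Assigns into the caller's buffer, as the problem requires.
    out[...] = layernorm_reference(x, gamma, beta, eps)
    return out


def returns_fresh(x, gamma, beta, out, eps=1e-5):
    # Ignores the buffer and returns a new array; the check should reject this.
    return layernorm_reference(x, gamma, beta, eps)
```

Under these assumptions, check_writes_into_buffer(writes_in_place) passes, while check_writes_into_buffer(returns_fresh) fails because the buffer is still all zeros and the returned array is a different object.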

Math

$$\mathrm{out}[i] = \frac{x_i - \mu_i}{\sqrt{\sigma_i^2 + \epsilon}} \, \gamma + \beta$$


NumPy

import numpy as np


def layernorm_inplace(x, gamma, beta, out, eps=1e-5):
    # Write the LayerNorm of x into out, then return out.
    pass
