Implement LayerNorm where the output is written into a caller-supplied buffer out, instead of allocating a fresh tensor. This is how production fused kernels work: an inference engine pre-allocates activation buffers once, and every layer writes into pre-existing memory.
Signature: def layernorm_inplace(x, gamma, beta, out, eps=1e-5) -> out
- x: input, shape (..., D) (normalize over the last axis)
- gamma: per-feature scale, shape (D,)
- beta: per-feature shift, shape (D,)
- out: pre-allocated output buffer, same shape as x; write your result here
- eps: stability constant added to the variance

The function must:
- Compute (x - mean) / sqrt(var + eps) * gamma + beta, where mean and var are taken over the last axis.
- Write the result into out (e.g. out[...] = ...).
- Return out.

Constraints:
- Do not allocate a new output array (np.empty_like(x), np.zeros_like(x), etc.). Just normalize and assign into the buffer the caller passed you.
- Extra memory beyond out should be O(B), not O(B*D).

The harness verifies that the returned array equals the LayerNorm of x. One test passes in a pre-zeroed buffer to confirm you actually wrote into it (rather than returning a fresh allocation).
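The requirements above can be sketched in NumPy as follows. This is one possible approach, not a reference solution. It assumes B denotes the number of rows being normalized (the product of the leading dimensions), that x and out are floating-point arrays of the same dtype, and that using np.einsum for the sum of squares is an acceptable way to avoid materializing a full squared copy. Whether NumPy buffers internally in that call is an implementation detail not verified here.

```python
import numpy as np

def layernorm_inplace(x, gamma, beta, out, eps=1e-5):
    d = x.shape[-1]
    # Per-row mean, shape (..., 1): an O(B) temporary.
    mean = x.mean(axis=-1, keepdims=True)
    # Center x directly into the caller's buffer (no new (..., D) array).
    np.subtract(x, mean, out=out)
    # Per-row sum of squares of the centered values, shape (...).
    # einsum reduces over the last axis without building out**2 explicitly.
    sq = np.einsum('...d,...d->...', out, out)
    var = np.expand_dims(sq, -1) / d          # population variance, (..., 1)
    inv_std = 1.0 / np.sqrt(var + eps)        # O(B) temporary
    # Scale and shift in place, reusing the same buffer at every step.
    np.multiply(out, inv_std, out=out)
    np.multiply(out, gamma, out=out)
    np.add(out, beta, out=out)
    return out

# Usage: the caller owns the buffer; the function only fills it.
rng = np.random.default_rng(0)
x = rng.standard_normal((4, 8))
gamma = rng.standard_normal(8)
beta = rng.standard_normal(8)
buf = np.zeros_like(x)                        # pre-zeroed, as in the harness test
result = layernorm_inplace(x, gamma, beta, buf)
assert result is buf                          # same object, not a fresh allocation
```

Passing out= to each ufunc keeps every full-size intermediate inside the caller's buffer, so the only new arrays are the per-row statistics.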
import numpy as np

def layernorm_inplace(x, gamma, beta, out, eps=1e-5):
    # TODO: write the normalized result into `out`, then return `out`.
    pass