TorchedUp

12. Backprop: Single Linear Layer

Medium

Implement the backward pass for a single linear layer y = xW + b using the chain rule.

Signature: def backprop_single_layer(x: np.ndarray, W: np.ndarray, delta: np.ndarray) -> tuple

Inputs:

  • x: input activations of shape (batch, in_features) (or any (..., in_features) for higher-rank inputs)
  • W: weight matrix of shape (in_features, out_features)
  • delta: upstream gradient dL/dy of shape matching the layer's output (..., out_features)

Returns the tuple (dW, db, dx) of gradients, with shapes:

  • dW: same shape as W — (in_features, out_features)
  • db: same shape as the bias — (out_features,)
  • dx: same shape as x

The bias b itself is not passed in (its gradient depends only on delta). Your implementation should handle both the standard 2D batch (B, in) and higher-rank inputs like (B, T, in) — the test suite includes a 3D case.
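One way to meet the shape requirements above is to collapse all leading axes into a single batch axis, so the 2D and 3D cases share one code path. The following is a minimal sketch, not a reference solution. It assumes gradients are summed (not averaged) over the batch-like axes, which the problem statement does not specify.

```python
import numpy as np

def backprop_single_layer(x: np.ndarray, W: np.ndarray, delta: np.ndarray) -> tuple:
    """Gradients of L w.r.t. W, b and x for y = xW + b, given delta = dL/dy."""
    in_features, out_features = W.shape
    # Flatten every leading axis into one, so (B, in) and (B, T, in) look the same.
    x2d = x.reshape(-1, in_features)        # (N, in)
    d2d = delta.reshape(-1, out_features)   # (N, out)
    dW = x2d.T @ d2d                        # (in, out), same shape as W
    db = d2d.sum(axis=0)                    # (out,), assumption: sum over all rows
    dx = (d2d @ W.T).reshape(x.shape)       # restored to the shape of x
    return dW, db, dx

# Small 1x2 -> 1x3 example with hand-checkable numbers.
x = np.array([[1.0, 2.0]])
W = np.array([[1.0, 0.0, -1.0],
              [2.0, 1.0, 0.0]])
delta = np.array([[1.0, 1.0, 1.0]])
dW, db, dx = backprop_single_layer(x, W, delta)
# dW -> [[1., 1., 1.], [2., 2., 2.]], db -> [1., 1., 1.], dx -> [[0., 3.]]
```

Reshaping to 2D keeps each gradient a single matrix product. An `np.einsum` or `np.tensordot` formulation would work as well.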

Math

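$$y = xW + b, \qquad \delta = \frac{\partial L}{\partial y}, \qquad \frac{\partial L}{\partial W},\ \frac{\partial L}{\partial b},\ \frac{\partial L}{\partial x} = \ ?$$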
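A sketch of how the chain rule resolves these three gradients, written per element. The index names n (row after flattening all leading axes), i (input feature) and j (output feature) are introduced here for illustration, and the sums over n assume gradients are accumulated rather than averaged.

```latex
\begin{aligned}
y_{nj} &= \sum_i x_{ni} W_{ij} + b_j, \qquad \delta_{nj} = \frac{\partial L}{\partial y_{nj}} \\
\frac{\partial L}{\partial W_{ij}} &= \sum_n \delta_{nj}\, \frac{\partial y_{nj}}{\partial W_{ij}} = \sum_n x_{ni}\, \delta_{nj}
  &&\Longrightarrow\quad \frac{\partial L}{\partial W} = x^{\top} \delta \\
\frac{\partial L}{\partial b_j} &= \sum_n \delta_{nj}\, \frac{\partial y_{nj}}{\partial b_j} = \sum_n \delta_{nj}
  &&\Longrightarrow\quad \frac{\partial L}{\partial b} = \sum_n \delta_{n,:} \\
\frac{\partial L}{\partial x_{ni}} &= \sum_j \delta_{nj}\, \frac{\partial y_{nj}}{\partial x_{ni}} = \sum_j \delta_{nj}\, W_{ij}
  &&\Longrightarrow\quad \frac{\partial L}{\partial x} = \delta\, W^{\top}
\end{aligned}
```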

Test Results

  • simple 1x2 -> 1x3
  • batch size 2
  • zero gradient
  • gradient matches central-difference numerical estimate
  • batched 3D input (B=2, T=3, in=2 -> out=3)
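The central-difference test can be reproduced locally along the following lines. This is a sketch under stated assumptions, not the site's actual test. The surrogate loss L = sum(y * delta) is chosen so that dL/dy equals delta, and the helper `numerical_grad`, the step `eps` and the tolerance are all hypothetical choices.

```python
import numpy as np

def numerical_grad(loss_fn, a, eps=1e-6):
    """Central-difference estimate of d loss_fn() / d a, perturbing a in place."""
    grad = np.zeros_like(a)
    for idx in np.ndindex(a.shape):
        old = a[idx]
        a[idx] = old + eps
        plus = loss_fn()
        a[idx] = old - eps
        minus = loss_fn()
        a[idx] = old                      # restore the original entry
        grad[idx] = (plus - minus) / (2 * eps)
    return grad

rng = np.random.default_rng(0)
x = rng.standard_normal((2, 3, 2))        # (B, T, in)
W = rng.standard_normal((2, 3))           # (in, out)
b = rng.standard_normal(3)                # (out,)
delta = rng.standard_normal((2, 3, 3))    # (B, T, out)

def loss():
    # Surrogate scalar loss whose gradient w.r.t. y is exactly delta.
    return float(np.sum((x @ W + b) * delta))

# Analytic gradients, summed over all leading axes.
dW = x.reshape(-1, 2).T @ delta.reshape(-1, 3)
db = delta.reshape(-1, 3).sum(axis=0)
dx = delta @ W.T

# Largest absolute gap between analytic and numerical gradients.
max_err = {
    "dW": float(np.max(np.abs(dW - numerical_grad(loss, W)))),
    "db": float(np.max(np.abs(db - numerical_grad(loss, b)))),
    "dx": float(np.max(np.abs(dx - numerical_grad(loss, x)))),
}
print(max_err)
```

Because this loss is linear in each of x, W and b, the central difference has no truncation error, so any remaining gap is floating-point rounding.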