Implement the backward pass for a single linear layer y = xW + b using the chain rule.
Signature: def backprop_single_layer(x: np.ndarray, W: np.ndarray, delta: np.ndarray) -> tuple
Inputs:
x: input activations of shape (batch, in_features) (or any (..., in_features) for higher-rank inputs)
W: weight matrix of shape (in_features, out_features)
delta: upstream gradient dL/dy of shape matching the layer's output (..., out_features)
Returns the tuple (dW, db, dx) of gradients, with shapes:
dW: same shape as W, (in_features, out_features)
db: same shape as the bias, (out_features,)
dx: same shape as x
The bias b itself is not passed in (its gradient depends only on delta). Your implementation should handle both the standard 2D batch (B, in) and higher-rank inputs like (B, T, in); the test suite includes a 3D case.
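The shape requirements above can be met with a short NumPy routine. The following is a minimal sketch, not a reference solution: it assumes all leading dimensions of x and delta can be flattened into a single batch axis, and the intermediate names x2d and d2d are illustrative only.

```python
import numpy as np


def backprop_single_layer(x: np.ndarray, W: np.ndarray, delta: np.ndarray) -> tuple:
    """Backward pass for y = xW + b; returns (dW, db, dx)."""
    in_features, out_features = W.shape

    # Assumption: collapse every leading dimension (batch, time, ...) into
    # one axis so the 2D and higher-rank cases share the same code path.
    x2d = x.reshape(-1, in_features)
    d2d = delta.reshape(-1, out_features)

    # dL/dW sums the outer products x_n^T delta_n over all flattened rows.
    dW = x2d.T @ d2d
    # dL/db sums delta over every non-feature axis.
    db = d2d.sum(axis=0)
    # dL/dx keeps the input's shape; matmul broadcasts over leading axes.
    dx = delta @ W.T

    return dW, db, dx
```

Calling it with x of shape (2, 3, 4), W of shape (4, 5), and delta of shape (2, 3, 5) should give gradients of shapes (4, 5), (5,), and (2, 3, 4). Flattening first keeps dW and db correct for any input rank without a separate branch per case.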
Math
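One way to derive the three gradients from the chain rule is sketched below. The index notation is an assumption introduced for illustration: n runs over all leading (batch-like) positions after flattening, i over input features, j over output features, and \delta_{nj} denotes dL/dy at position (n, j).

```latex
y_{nj} = \sum_{i} x_{ni} W_{ij} + b_j

\frac{\partial L}{\partial W_{ij}} = \sum_{n} x_{ni}\,\delta_{nj}
\quad\Longrightarrow\quad dW = x^{\top}\delta

\frac{\partial L}{\partial b_j} = \sum_{n} \delta_{nj}
\quad\Longrightarrow\quad db = \textstyle\sum_{n} \delta_{n,:}

\frac{\partial L}{\partial x_{ni}} = \sum_{j} \delta_{nj} W_{ij}
\quad\Longrightarrow\quad dx = \delta\,W^{\top}
```

Each gradient sums over exactly the indices that do not appear in the parameter being differentiated, which is why dW and db lose the batch axes while dx retains them.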