Hand-derive the gradient of a linear layer's output (summed to a scalar) with respect to its input x.
Forward: y = x @ W.T + b, then L = sum(y)
Implement two functions:
linear_forward(x, W, b) -> y — produces y of shape (batch, out_features)linear_backward(x, W, b) -> dL_dx of shape (batch, in_features) — gradient of L = sum(y) w.r.t. xThe harness verifies your analytic linear_backward by central-differencing your linear_forward at every position of x. If they match within tolerance, you've gotten the chain rule right.
Hint: When L = sum(y) and y = x @ W.T + b, dL/dy = ones, so dL/dx = ones @ W = W.sum(axis=0) broadcast to every batch row.
Math
Related problems
Asked at
import numpy as np
def linear_forward(...):
pass
Premium problem
Free accounts include problems #1–20. Upgrade to unlock the editor, hidden test cases, and reference solutions for every problem.
Already premium?