201. Backprop: ReLU

Easy

Hand-derive the gradient of L = sum(ReLU(x)) w.r.t. x.

Forward: y = max(0, x) element-wise. L = sum(y).

Implement:

relu_forward(x) -> y
relu_backward(x) -> dL/dx of the same shape as x
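
A minimal sketch of the two functions, assuming NumPy arrays as input (the problem does not name a framework, so that choice is an assumption). The backward pass returns the indicator of x > 0, which already equals dL/dx because the upstream gradient of L = sum(y) is all ones.

```python
import numpy as np


def relu_forward(x):
    """Element-wise y = max(0, x)."""
    x = np.asarray(x, dtype=float)
    return np.maximum(0.0, x)


def relu_backward(x):
    """dL/dx for L = sum(ReLU(x)); same shape as x.

    Uses the convention stated in the problem: the gradient at x == 0 is 0.
    """
    x = np.asarray(x, dtype=float)
    # Strict inequality, so entries equal to zero get gradient 0.
    return (x > 0).astype(x.dtype)
```

Note that relu_backward takes only x: with L = sum(y) there is no separate upstream gradient to pass in.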

The harness verifies your analytic backward pass against central differences of the forward pass. Note: ReLU is not differentiable at x = 0, so the gradient there is undefined; by convention, treat it as 0.
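
The actual harness is not shown, so the following is only a sketch of what a central-difference check might look like. The helper name numeric_grad, the step size eps, and the sample input are all hypothetical; relu_forward and relu_backward are repeated so the block runs on its own.

```python
import numpy as np


def relu_forward(x):
    return np.maximum(0.0, x)


def relu_backward(x):
    return (x > 0).astype(x.dtype)


def numeric_grad(f, x, eps=1e-5):
    """Central-difference estimate of the gradient of a scalar-valued f at x.

    Hypothetical helper: perturbs one entry at a time by +/- eps.
    """
    x = np.array(x, dtype=float)  # work on a copy so the caller's array is untouched
    grad = np.zeros_like(x)
    for idx in np.ndindex(x.shape):
        orig = x[idx]
        x[idx] = orig + eps
        f_plus = f(x)
        x[idx] = orig - eps
        f_minus = f(x)
        x[idx] = orig  # restore before moving to the next entry
        grad[idx] = (f_plus - f_minus) / (2.0 * eps)
    return grad


# Sample input chosen so that no entry lies within eps of zero.
x = np.array([[-1.5, 0.3], [2.0, -0.7]])
analytic = relu_backward(x)
numeric = numeric_grad(lambda v: relu_forward(v).sum(), x)
print(np.max(np.abs(analytic - numeric)))
```

At exactly x = 0 the central difference evaluates to 0.5 rather than 0, because only the +eps side contributes. A check like this one therefore only agrees with the convention when test inputs stay away from zero.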

Math

\frac{\partial \, \mathrm{ReLU}(x)}{\partial x} = 1_{x > 0}
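
The step from this element-wise derivative to dL/dx can be written out with the chain rule. The indices i and j are introduced here for illustration; y_j depends only on x_j, so every cross term vanishes, and the sum contributes an upstream gradient of 1 for each element.

```latex
L = \sum_j y_j, \qquad y_j = \max(0, x_j)

\frac{\partial L}{\partial x_i}
  = \sum_j \frac{\partial L}{\partial y_j} \, \frac{\partial y_j}{\partial x_i}
  = 1 \cdot \frac{\partial y_i}{\partial x_i}
  = 1_{x_i > 0}
```

So dL/dx has the same shape as x, with entry 1 where x_i > 0 and 0 elsewhere (including x_i = 0, by the stated convention).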