Backprop: ReLU (PyTorch)

Implement ReLU as a custom torch.autograd.Function. Forward: y = max(0, x). Backward: pass grad_output through where x > 0, zero elsewhere.

The rule: you may NOT call F.relu, torch.relu, x.clamp, or nn.ReLU to do the forward — write x * (x > 0) (or equivalent primitive ops). The lesson is wiring backward to the saved mask.

Implement ReLUFunction with forward(ctx, x) and backward(ctx, grad_output). The driver relu_run(mode, x) (provided) dispatches:

'forward' returns ReLUFunction.apply(x).tolist()
'grad_x' returns x.grad.tolist() after .sum().backward()
'gradcheck' returns torch.autograd.gradcheck(ReLUFunction.apply, (x.double(),))

Note: at x = 0 the gradient is undefined; convention is 0. Test inputs avoid the discontinuity so gradcheck is well-defined.

Math

\frac{\partial ReLU ( x )}{\partial x} = 1_{x > 0}

265. Backprop: ReLU (PyTorch)

265. Backprop: ReLU (PyTorch)