Backprop: Linear (PyTorch)

Re-implement the linear layer's forward and backward as a custom torch.autograd.Function. Forward is y = x @ W.T + b. Backward returns (dL/dx, dL/dW, dL/db) from the upstream grad_output.

The rule: you may NOT call F.linear or torch.nn.Linear, and you may not let autograd differentiate a composition of existing ops to do the work for you. The point is to wire the chain rule by hand inside backward.

Implement:

class LinearFunction(torch.autograd.Function):
    @staticmethod
    def forward(ctx, x, W, b): ...
    @staticmethod
    def backward(ctx, grad_output): ...

Plus a driver linear_run(mode, x, W, b) (provided in starter) that the harness calls. mode is one of:

'forward' — return LinearFunction.apply(x, W, b).tolist()
'grad_x' — apply, .sum().backward(), return x.grad.tolist()
'grad_W' — same, return W.grad.tolist()
'grad_b' — same, return b.grad.tolist()
'gradcheck' — run torch.autograd.gradcheck(LinearFunction.apply, (x.double(), W.double(), b.double())) and return the bool

Why autograd.Function? Custom CUDA kernels and fused ops need explicit forward/backward. Autograd doesn't see through your kernel, so you must supply the chain rule yourself. gradcheck is your safety net.
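To make the "safety net" idea concrete, here is a minimal sketch of gradcheck catching a wrong backward. The Cube and BrokenCube classes are hypothetical toy ops invented for this illustration and are not part of the exercise. gradcheck compares the hand-written backward against finite differences, so it expects double-precision inputs with requires_grad=True.

```python
import torch


class Cube(torch.autograd.Function):
    """Hypothetical toy op y = x**3 with a correct hand-written backward."""

    @staticmethod
    def forward(ctx, x):
        ctx.save_for_backward(x)
        return x ** 3

    @staticmethod
    def backward(ctx, grad_output):
        (x,) = ctx.saved_tensors
        # Chain rule: dL/dx = dL/dy * 3 x^2
        return grad_output * 3 * x ** 2


class BrokenCube(torch.autograd.Function):
    """Same forward, but backward drops the factor of 3 (deliberate bug)."""

    @staticmethod
    def forward(ctx, x):
        ctx.save_for_backward(x)
        return x ** 3

    @staticmethod
    def backward(ctx, grad_output):
        (x,) = ctx.saved_tensors
        return grad_output * x ** 2


x = torch.tensor([0.5, -1.0, 2.0], dtype=torch.double, requires_grad=True)

# Passes: the analytical and numerical Jacobians agree.
ok = torch.autograd.gradcheck(Cube.apply, (x,))

# raise_exception=False makes gradcheck return False instead of raising.
bad = torch.autograd.gradcheck(BrokenCube.apply, (x,), raise_exception=False)

print(ok, bad)
```

The forward pass of both classes is identical, so a forward-only test would not tell them apart. Only the gradient check exposes the bug.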

Math

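y = x W^{\top} + b, \quad \frac{\partial L}{\partial x} = \frac{\partial L}{\partial y} W, \quad \frac{\partial L}{\partial W} = \left(\frac{\partial L}{\partial y}\right)^{\top} x, \quad \frac{\partial L}{\partial b} = \sum_{n} \left(\frac{\partial L}{\partial y}\right)_{n}

Here \partial L / \partial y is the upstream grad_output, and the sum for b runs over the batch index n (the rows of grad_output), so each gradient has the same shape as the tensor it belongs to.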
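One way these gradients might be wired into the class is sketched below, with the bias gradient taken as the sum of the upstream gradient over the batch dimension. The sketch assumes a 2-D input x of shape (N, in), W of shape (out, in), and b of shape (out,). These shapes are an assumption, since the problem statement does not fix them, and the sketch is not presented as the reference solution.

```python
import torch


class LinearFunction(torch.autograd.Function):
    @staticmethod
    def forward(ctx, x, W, b):
        # Assumed shapes: x (N, in), W (out, in), b (out,).
        # backward needs x and W; b itself is not needed for any gradient.
        ctx.save_for_backward(x, W)
        return x @ W.T + b

    @staticmethod
    def backward(ctx, grad_output):
        x, W = ctx.saved_tensors
        grad_x = grad_output @ W         # (N, out) @ (out, in) -> (N, in)
        grad_W = grad_output.T @ x       # (out, N) @ (N, in)   -> (out, in)
        grad_b = grad_output.sum(dim=0)  # sum over the batch   -> (out,)
        # One gradient per forward input, in the same order: x, W, b.
        return grad_x, grad_W, grad_b


torch.manual_seed(0)
x = torch.randn(4, 3, dtype=torch.double, requires_grad=True)
W = torch.randn(2, 3, dtype=torch.double, requires_grad=True)
b = torch.randn(2, dtype=torch.double, requires_grad=True)

y = LinearFunction.apply(x, W, b)
y.sum().backward()  # upstream gradient is all ones, shape (4, 2)
print(x.grad.shape, W.grad.shape, b.grad.shape)
```

A useful sanity check is that every returned gradient has the same shape as the input it corresponds to. A shape mismatch in backward usually means a transpose is missing or the batch sum was skipped.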

Related problems

Backprop: Linear (matmul + bias) (easy, NumPy)
