Re-implement the linear layer's forward and backward as a custom torch.autograd.Function. Forward is y = x @ W.T + b. Backward returns (dL/dx, dL/dW, dL/db) from the upstream grad_output.
The rule: you may NOT call F.linear or torch.nn.Linear, and you may not let autograd differentiate existing ops to do the work for you. The point is to wire the chain rule by hand inside backward.
Implement:
class LinearFunction(torch.autograd.Function):
    @staticmethod
    def forward(ctx, x, W, b): ...

    @staticmethod
    def backward(ctx, grad_output): ...
Plus a driver linear_run(mode, x, W, b) (provided in starter) that the harness calls. mode is one of:
'forward': return LinearFunction.apply(x, W, b).tolist()
'grad_x': apply, .sum().backward(), return x.grad.tolist()
'grad_W': same, return W.grad.tolist()
'grad_b': same, return b.grad.tolist()
'gradcheck': run torch.autograd.gradcheck(LinearFunction.apply, (x.double(), W.double(), b.double())) and return the bool

Why autograd.Function? Custom CUDA kernels and fused ops need explicit forward/backward. Autograd doesn't see through your kernel, so you must supply the chain rule yourself. gradcheck is your safety net.
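The pieces described above can be sketched as follows. This is one possible shape, not the reference solution. It assumes x is a 2-D batch of shape (N, in_features), W is (out_features, in_features), and b is (out_features,). The helper _leaf is hypothetical, and converting every input to float64 is an assumption made so that all modes, including gradcheck, share one code path. The real starter driver may handle dtypes differently.

```python
import torch


class LinearFunction(torch.autograd.Function):
    @staticmethod
    def forward(ctx, x, W, b):
        # Save only what backward needs; b does not appear in any gradient.
        ctx.save_for_backward(x, W)
        return x @ W.T + b

    @staticmethod
    def backward(ctx, grad_output):
        x, W = ctx.saved_tensors
        grad_x = grad_output @ W         # (N, out) @ (out, in) -> (N, in)
        grad_W = grad_output.T @ x       # (out, N) @ (N, in)   -> (out, in)
        grad_b = grad_output.sum(dim=0)  # sum over the batch   -> (out,)
        return grad_x, grad_W, grad_b


def _leaf(t):
    # Hypothetical helper: accept nested lists or tensors and return a
    # float64 leaf tensor that tracks gradients.
    return torch.as_tensor(t, dtype=torch.float64).detach().clone().requires_grad_(True)


def linear_run(mode, x, W, b):
    x, W, b = _leaf(x), _leaf(W), _leaf(b)
    if mode == "gradcheck":
        # Compares the hand-written backward against finite differences.
        return torch.autograd.gradcheck(LinearFunction.apply, (x, W, b))
    y = LinearFunction.apply(x, W, b)
    if mode == "forward":
        return y.detach().tolist()
    y.sum().backward()
    grads = {"grad_x": x.grad, "grad_W": W.grad, "grad_b": b.grad}
    return grads[mode].tolist()
```

Because the loss in the grad modes is y.sum(), the upstream grad_output is all ones. So grad_b equals the batch size in every entry, and each row of grad_W is the column sum of x, which makes the results easy to check by hand.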
Math
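The gradients that backward must return follow from the forward definition y = x @ W.T + b by the chain rule. The derivation below is a sketch under assumed shapes that the problem statement does not spell out: x is N × d_in, W is d_out × d_in, b has length d_out, and G denotes the upstream grad_output dL/dy of shape N × d_out.

```latex
\begin{aligned}
y_{nj} &= \sum_{k} x_{nk} W_{jk} + b_j \\
\frac{\partial L}{\partial x_{nk}} &= \sum_{j} G_{nj} W_{jk}
  &&\Longrightarrow\quad \frac{\partial L}{\partial x} = G\,W \\
\frac{\partial L}{\partial W_{jk}} &= \sum_{n} G_{nj}\, x_{nk}
  &&\Longrightarrow\quad \frac{\partial L}{\partial W} = G^{\top} x \\
\frac{\partial L}{\partial b_{j}} &= \sum_{n} G_{nj}
  &&\Longrightarrow\quad \frac{\partial L}{\partial b} = \mathbf{1}^{\top} G
\end{aligned}
```

Each result has the same shape as the tensor it differentiates with respect to, which is a quick sanity check before running gradcheck.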
import numpy as np

def linear_run(mode, x, W, b):
    pass