TorchedUp
LearnBetaProblemsSystem DesignSoonPremium

264. Backprop: Linear (PyTorch)

Medium

Re-implement the linear layer's forward and backward as a custom torch.autograd.Function. Forward is y = x @ W.T + b. Backward returns (dL/dx, dL/dW, dL/db) from the upstream grad_output.

The rule: you may NOT call F.linear or torch.nn.Linear, and you may not let autograd differentiate through existing ops to do the work for you. The point is to wire the chain rule by hand inside backward.

Implement:

class LinearFunction(torch.autograd.Function):
    @staticmethod
    def forward(ctx, x, W, b): ...
    @staticmethod
    def backward(ctx, grad_output): ...

Plus a driver linear_run(mode, x, W, b) (provided in starter) that the harness calls. mode is one of:

  • 'forward' — return LinearFunction.apply(x, W, b).tolist()
  • 'grad_x' — apply, .sum().backward(), return x.grad.tolist()
  • 'grad_W' — same, return W.grad.tolist()
  • 'grad_b' — same, return b.grad.tolist()
  • 'gradcheck' — run torch.autograd.gradcheck(LinearFunction.apply, (x.double(), W.double(), b.double())) and return the bool
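The five modes above could be dispatched roughly as follows. This is only a sketch of what the provided starter might look like, not the starter itself: `_apply` is a hypothetical stand-in so the block runs on its own (a real submission would use `LinearFunction.apply`), and the assumption that the harness passes nested Python lists is not stated in the problem.

```python
import torch


def _apply(x, W, b):
    # Hypothetical stand-in so this block runs on its own;
    # a real submission would call LinearFunction.apply here.
    return x @ W.T + b


def linear_run(mode, x, W, b):
    # Assumption: inputs arrive as nested Python lists.
    # Build leaf tensors that track gradients.
    x, W, b = (
        torch.tensor(t, dtype=torch.float32, requires_grad=True)
        for t in (x, W, b)
    )

    if mode == "forward":
        return _apply(x, W, b).tolist()

    if mode == "gradcheck":
        # gradcheck compares analytical and numerical Jacobians,
        # so it is run in double precision.
        return torch.autograd.gradcheck(
            _apply, (x.double(), W.double(), b.double())
        )

    grad_modes = ("grad_x", "grad_W", "grad_b")
    if mode not in grad_modes:
        raise ValueError(f"unknown mode: {mode!r}")

    # Reduce to a scalar so backward() needs no explicit upstream gradient.
    _apply(x, W, b).sum().backward()
    grads = {"grad_x": x.grad, "grad_W": W.grad, "grad_b": b.grad}
    return grads[mode].tolist()
```

Because the loss is `.sum()`, the upstream gradient is a tensor of ones, so each returned gradient is easy to verify by hand on small inputs.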

Why autograd.Function? Custom CUDA kernels and fused ops need explicit forward/backward. Autograd doesn't see through your kernel, so you must supply the chain rule yourself. gradcheck is your safety net.
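To illustrate the safety-net role of gradcheck, here is a small sketch using two hypothetical toy functions (`Square` and `BrokenSquare`, which are not part of this problem). Both compute x², but the second has a deliberately wrong backward. Passing `raise_exception=False` makes gradcheck return a bool instead of raising on mismatch.

```python
import torch


class Square(torch.autograd.Function):
    """Hypothetical toy op y = x * x with a correct backward."""

    @staticmethod
    def forward(ctx, x):
        ctx.save_for_backward(x)
        return x * x

    @staticmethod
    def backward(ctx, grad_output):
        (x,) = ctx.saved_tensors
        return grad_output * 2 * x  # d(x^2)/dx = 2x


class BrokenSquare(torch.autograd.Function):
    """Same forward, but backward drops the factor of 2."""

    @staticmethod
    def forward(ctx, x):
        ctx.save_for_backward(x)
        return x * x

    @staticmethod
    def backward(ctx, grad_output):
        (x,) = ctx.saved_tensors
        return grad_output * x  # bug: should be 2 * x


# gradcheck expects double-precision inputs that require grad.
x = torch.tensor([0.5, -1.5, 2.0], dtype=torch.double, requires_grad=True)

ok = torch.autograd.gradcheck(Square.apply, (x,))
bad = torch.autograd.gradcheck(BrokenSquare.apply, (x,), raise_exception=False)
```

Here `ok` is `True` and `bad` is `False`: the numerical Jacobian disagrees with the analytical one produced by the broken backward.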

Math

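$$
y = xW^\top + b, \qquad
\frac{\partial L}{\partial x} = \frac{\partial L}{\partial y}\, W, \qquad
\frac{\partial L}{\partial W} = \left(\frac{\partial L}{\partial y}\right)^{\top} x, \qquad
\frac{\partial L}{\partial b} = \sum_{\text{batch}} \frac{\partial L}{\partial y}
$$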
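The formulas above can be wired into an autograd.Function roughly as follows. This is one possible sketch, not the reference solution. It assumes shapes the problem does not state explicitly: x is (N, in_features), W is (out_features, in_features), and b is (out_features,). Here grad_output plays the role of ∂L/∂y.

```python
import torch


class LinearFunction(torch.autograd.Function):
    """Sketch of y = x @ W.T + b with a hand-written backward.

    Assumed shapes: x (N, in), W (out, in), b (out,).
    """

    @staticmethod
    def forward(ctx, x, W, b):
        # backward needs x and W; b does not appear in any gradient.
        ctx.save_for_backward(x, W)
        return x.matmul(W.t()) + b

    @staticmethod
    def backward(ctx, grad_output):
        x, W = ctx.saved_tensors
        grad_x = grad_output.matmul(W)      # (N, out) @ (out, in) -> (N, in)
        grad_W = grad_output.t().matmul(x)  # (out, N) @ (N, in) -> (out, in)
        grad_b = grad_output.sum(dim=0)     # sum over the batch -> (out,)
        # One gradient per forward input, in the same order: x, W, b.
        return grad_x, grad_W, grad_b


# Usage: small double-precision inputs so gradcheck can be applied directly.
torch.manual_seed(0)
x = torch.randn(4, 3, dtype=torch.double, requires_grad=True)
W = torch.randn(2, 3, dtype=torch.double, requires_grad=True)
b = torch.randn(2, dtype=torch.double, requires_grad=True)

y = LinearFunction.apply(x, W, b)  # shape (4, 2)
passed = torch.autograd.gradcheck(LinearFunction.apply, (x, W, b))
```

Each returned gradient has the same shape as the input it belongs to, which is a quick sanity check before running gradcheck.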

Related problems

  • Backprop: Linear (matmul + bias) (Easy, NumPy)

NumPy

import numpy as np


# Starter stub: the harness calls this driver with one of the modes listed above.
def linear_run(mode, x, W, b):
    pass
