TorchedUp

200. Backprop: Linear (matmul + bias)

Easy

Hand-derive the gradient of a linear layer's output (summed to a scalar) with respect to its input x.

Forward: y = x @ W.T + b, then L = sum(y)

Implement two functions:

  • linear_forward(x, W, b) -> y — produces y of shape (batch, out_features)
  • linear_backward(x, W, b) -> dL_dx of shape (batch, in_features) — gradient of L = sum(y) w.r.t. x

The harness verifies your analytic linear_backward by central-differencing your linear_forward at every position of x. If they match within tolerance, you've gotten the chain rule right.
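The harness itself is not shown, so the following is only a minimal sketch of what a central-difference check of this kind might look like. The function name `numerical_grad_x` and the step size `eps` are assumptions for illustration, not part of the problem statement.

```python
import numpy as np

def numerical_grad_x(forward, x, W, b, eps=1e-5):
    """Central-difference estimate of d(sum(forward(x, W, b))) / dx.

    Hypothetical helper: the real harness may use a different step or tolerance.
    """
    x = x.astype(np.float64)          # astype returns a copy, so the caller's x is untouched
    grad = np.zeros_like(x)
    for idx in np.ndindex(*x.shape):  # visit every position of x
        orig = x[idx]
        x[idx] = orig + eps
        loss_plus = forward(x, W, b).sum()
        x[idx] = orig - eps
        loss_minus = forward(x, W, b).sum()
        x[idx] = orig                 # restore before moving on
        grad[idx] = (loss_plus - loss_minus) / (2 * eps)
    return grad
```

An analytic gradient could then be compared against this estimate with something like `np.allclose(analytic, numeric, atol=1e-6)`; the tolerance here is also an assumption.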

Hint: When L = sum(y) and y = x @ W.T + b, dL/dy is an all-ones array with the shape of y, so dL/dx = ones @ W. Every row of that product equals W.sum(axis=0), so the same vector is repeated for each batch row.
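One way the hint could translate into NumPy is sketched below. It is not the site's reference solution, and it assumes x has shape (batch, in_features), W has shape (out_features, in_features), and b has shape (out_features,), which is what the forward formula implies.

```python
import numpy as np

def linear_forward(x, W, b):
    # (batch, in) @ (in, out) + (out,) -> (batch, out)
    return x @ W.T + b

def linear_backward(x, W, b):
    # L = sum(y), so dL/dy is all ones with the shape of y: (batch, out)
    dL_dy = np.ones((x.shape[0], W.shape[0]))
    # Chain rule through y = x @ W.T + b: (batch, out) @ (out, in) -> (batch, in)
    return dL_dy @ W

# Small illustrative inputs (assumed values, not from the problem's test cases)
x = np.zeros((2, 3))
W = np.array([[1.0, 2.0, 3.0],
              [4.0, 5.0, 6.0]])
b = np.zeros(2)
print(linear_backward(x, W, b))  # each row is W.sum(axis=0) = [5. 7. 9.]
```

Note that the gradient depends on neither x nor b: the layer is linear in x, and b only shifts y.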

Math

$$y = xW^\top + b, \qquad L = \sum_i y_i, \qquad \frac{\partial L}{\partial x} = \mathbf{1} \cdot W$$
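The same result can be checked entry by entry. The index names below (i for the batch row, j for the output feature, k for the input feature) are introduced here for illustration, and the sum over all entries of y is written as a double sum:

```latex
\begin{aligned}
y_{ij} &= \sum_{k} x_{ik} W_{jk} + b_j, \\
L &= \sum_{i} \sum_{j} y_{ij}, \\
\frac{\partial L}{\partial x_{ik}}
  &= \sum_{i'} \sum_{j} \frac{\partial y_{i'j}}{\partial x_{ik}}
   = \sum_{j} W_{jk}.
\end{aligned}
```

Only the terms with i' = i survive, so the gradient at row i is the column sum of W. It is independent of i, which is why the same vector appears in every batch row.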

Related problems

  • Backprop: Linear (PyTorch) · Medium · PyTorch

NumPy

import numpy as np

def linear_forward(x, W, b):
    # x: (batch, in_features), W: (out_features, in_features), b: (out_features,)
    pass
