TorchedUp

268. Backprop: Softmax (PyTorch)

Medium

Implement softmax (dim=-1) as a torch.autograd.Function. Forward returns the probability distribution; backward applies the Jacobian diag(y) - y y^T per row to grad_output.
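One possible shape for such a Function is sketched below. The class name SoftmaxFn and the test tensor are illustrative assumptions, not taken from the hidden starter code; the forward pass uses only .exp(), .sum(), and .max(), and the backward pass uses the per-row formula from the problem statement.

```python
import torch


class SoftmaxFn(torch.autograd.Function):
    """Softmax over the last dimension with a hand-written backward pass.

    The class name is hypothetical; the problem only requires a
    torch.autograd.Function subclass.
    """

    @staticmethod
    def forward(ctx, x):
        # Subtract the row-wise max before exponentiating for numerical stability.
        shifted = x - x.max(dim=-1, keepdim=True).values
        e = shifted.exp()
        y = e / e.sum(dim=-1, keepdim=True)
        # The backward pass only needs the output probabilities.
        ctx.save_for_backward(y)
        return y

    @staticmethod
    def backward(ctx, grad_output):
        (y,) = ctx.saved_tensors
        # Equivalent to (diag(y) - y y^T) @ g per row, without building the Jacobian.
        dot = (grad_output * y).sum(dim=-1, keepdim=True)
        return y * (grad_output - dot)


# Double precision so that the result can also be passed to torch.autograd.gradcheck.
x = torch.randn(3, 5, dtype=torch.double, requires_grad=True)
y = SoftmaxFn.apply(x)
```

Calling torch.autograd.gradcheck(SoftmaxFn.apply, (x,)) on a double-precision input compares this backward pass against finite differences.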

The rule: you may NOT call F.softmax, torch.softmax, nn.Softmax, or F.log_softmax. Use .exp(), .sum(), .max().
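The presence of .max() in the allowed list suggests the usual max-subtraction trick. The sketch below, with hypothetical helper names naive_softmax and stable_softmax, illustrates why it matters: large logits overflow in float32 unless each row is shifted by its maximum first.

```python
import torch


def naive_softmax(x):
    # Direct definition: exp can overflow to inf for large inputs.
    e = x.exp()
    return e / e.sum(dim=-1, keepdim=True)


def stable_softmax(x):
    # Shifting by the row max leaves the result unchanged mathematically,
    # but keeps every exponent <= 0 so exp stays in [0, 1].
    shifted = x - x.max(dim=-1, keepdim=True).values
    e = shifted.exp()
    return e / e.sum(dim=-1, keepdim=True)


logits = torch.tensor([[1000.0, 1001.0, 1002.0]])
naive = naive_softmax(logits)    # exp overflows to inf in float32, so inf / inf gives nan
stable = stable_softmax(logits)  # same distribution as softmax of [-2, -1, 0]
```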

Backward formula (per row, last dim): dL/dx_i = y_i * (grad_output_i - sum_j(grad_output_j * y_j)). Equivalent to y * (g - (g*y).sum(dim=-1, keepdim=True)).
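As a sanity check on the equivalence claimed above, the following sketch builds the explicit Jacobian diag(y) - y y^T for a single row and compares it with the vectorized expression. The tensors x and g are arbitrary stand-ins chosen for illustration.

```python
import torch

torch.manual_seed(0)
x = torch.randn(4, dtype=torch.double)
g = torch.randn(4, dtype=torch.double)  # stand-in for grad_output on one row

# Forward pass for a single row, using only exp / sum / max.
e = (x - x.max()).exp()
y = e / e.sum()

# Explicit Jacobian: diag(y) - y y^T. It is symmetric, so J @ g equals J^T @ g.
jacobian = torch.diag(y) - torch.outer(y, y)
grad_explicit = jacobian @ g

# Vectorized form from the problem statement.
grad_shortcut = y * (g - (g * y).sum(dim=-1, keepdim=True))
```

The vectorized form costs O(n) per row, whereas materializing the Jacobian costs O(n^2), which is presumably why the problem states the backward pass in that form.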

The driver softmax_run(mode, x) dispatches on mode: 'forward', 'grad_x', or 'gradcheck'. Note: for grad_x we use a non-uniform upstream gradient (a weighted sum), because each softmax row sums to 1 and a uniform upstream gradient would make the input gradient zero; see the starter code.
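The note about the upstream gradient can be illustrated with a small experiment. Since the starter code is not shown here, the helper softmax_last_dim and the weights below are assumptions made for this sketch; it relies on ordinary autograd rather than a custom Function.

```python
import torch


def softmax_last_dim(x):
    # Hypothetical helper: softmax built from exp / sum / max only.
    shifted = x - x.max(dim=-1, keepdim=True).values
    e = shifted.exp()
    return e / e.sum(dim=-1, keepdim=True)


torch.manual_seed(0)
x = torch.randn(2, 4, dtype=torch.double, requires_grad=True)

# Uniform upstream gradient: L = sum(y). Each row of y sums to 1,
# so L is constant in x and the gradient vanishes.
(grad_uniform,) = torch.autograd.grad(softmax_last_dim(x).sum(), x)

# Non-uniform upstream gradient: L = sum(w * y) with illustrative weights.
weights = torch.arange(1, 5, dtype=torch.double)
(grad_weighted,) = torch.autograd.grad((softmax_last_dim(x) * weights).sum(), x)
```

With the uniform loss, grad_uniform is zero up to rounding, so it cannot distinguish a correct backward pass from a broken one; the weighted loss produces a non-trivial gradient.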

Math

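$$y_i = \frac{e^{x_i}}{\sum_j e^{x_j}}, \qquad \frac{\partial L}{\partial x_i} = y_i \left( \frac{\partial L}{\partial y_i} - \sum_j y_j \frac{\partial L}{\partial y_j} \right)$$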
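For readers who want the intermediate step, the gradient formula follows from the softmax Jacobian and the chain rule. The derivation below is a standard one and uses the Kronecker delta, which the problem statement does not introduce.

```latex
% Softmax Jacobian, with \delta_{ij} the Kronecker delta:
\frac{\partial y_j}{\partial x_i} = y_j\,(\delta_{ij} - y_i)

% Chain rule over all outputs y_j:
\frac{\partial L}{\partial x_i}
  = \sum_j \frac{\partial L}{\partial y_j}\,\frac{\partial y_j}{\partial x_i}
  = \sum_j \frac{\partial L}{\partial y_j}\, y_j\,(\delta_{ij} - y_i)
  = y_i \frac{\partial L}{\partial y_i} - y_i \sum_j y_j \frac{\partial L}{\partial y_j}
  = y_i \left( \frac{\partial L}{\partial y_i} - \sum_j y_j \frac{\partial L}{\partial y_j} \right)
```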

Related problems

  • Backprop: Softmax (Medium, NumPy)

NumPy

import numpy as np

def softmax_run(...):
    pass
