TorchedUp

205. Backprop: Softmax + Cross-Entropy (fused)

Medium

Hand-derive the gradient of cross-entropy loss applied to softmax outputs, with respect to the logits x.

Forward: Compute y = softmax(logits) along the last axis, then per-sample losses l_b = -sum_c target_{b,c} * log(y_{b,c}). Return the vector of per-sample losses (shape (batch,)); the harness then sums to a scalar.
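The forward pass described above can be sketched as follows. This is a minimal illustration, not the site's reference solution: it assumes NumPy arrays of shape (batch, num_classes) and uses the log-sum-exp shift (subtracting the row-wise max) for numerical stability, which the problem statement does not require but which leaves the result unchanged.

```python
import numpy as np

def softmax_ce_forward(logits, target):
    """Per-sample cross-entropy of softmax(logits) against a one-hot target."""
    # Shift by the row-wise max so exp() cannot overflow; softmax is unchanged.
    shifted = logits - logits.max(axis=-1, keepdims=True)
    # log softmax = shifted - log(sum(exp(shifted)))
    log_y = shifted - np.log(np.exp(shifted).sum(axis=-1, keepdims=True))
    # l_b = -sum_c target_{b,c} * log(y_{b,c})
    return -(target * log_y).sum(axis=-1)

# Illustrative inputs (assumed): uniform logits and an extreme-valued row.
logits = np.array([[0.0, 0.0, 0.0], [1000.0, 0.0, -1000.0]])
target = np.array([[0.0, 1.0, 0.0], [1.0, 0.0, 0.0]])
losses = softmax_ce_forward(logits, target)
print(losses.shape)  # (2,)
```

With uniform logits over three classes the loss is log(3); the second row shows that very large logits do not overflow when the max is subtracted first.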

Implement:

  • softmax_ce_forward(logits, target) -> losses of shape (batch,)
  • softmax_ce_backward(logits, target) -> dL/dlogits of shape (batch, num_classes) where L = sum(losses)

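target is a one-hot matrix of the same shape as logits. The famous result: dL/dlogits = y - target. It relies on each row of target summing to 1, and there is no division by the batch size because L sums the per-sample losses rather than averaging them.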
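One way to sanity-check the y - target result is to compare it against a central finite-difference estimate of L = sum(losses). The sketch below is an assumption-laden illustration rather than the reference solution: total_loss is a hypothetical helper introduced only for the check, and the random inputs and eps value are arbitrary choices.

```python
import numpy as np

def softmax_ce_backward(logits, target):
    """Gradient of L = sum(losses) with respect to logits: y - target."""
    shifted = logits - logits.max(axis=-1, keepdims=True)
    exp = np.exp(shifted)
    y = exp / exp.sum(axis=-1, keepdims=True)
    return y - target

def total_loss(logits, target):
    # Hypothetical helper for the check: scalar L = sum of per-sample losses.
    shifted = logits - logits.max(axis=-1, keepdims=True)
    log_y = shifted - np.log(np.exp(shifted).sum(axis=-1, keepdims=True))
    return -(target * log_y).sum()

rng = np.random.default_rng(0)
logits = rng.normal(size=(4, 5))
target = np.eye(5)[rng.integers(0, 5, size=4)]  # one-hot rows

analytic = softmax_ce_backward(logits, target)

# Central differences: perturb one logit at a time and re-evaluate L.
numeric = np.zeros_like(logits)
eps = 1e-6
for idx in np.ndindex(*logits.shape):
    up = logits.copy()
    down = logits.copy()
    up[idx] += eps
    down[idx] -= eps
    numeric[idx] = (total_loss(up, target) - total_loss(down, target)) / (2 * eps)

max_err = np.abs(analytic - numeric).max()
print(max_err)
```

Each row of the analytic gradient sums to zero, since both y and target sum to 1 along the class axis.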

Math

$$
y = \mathrm{softmax}(x), \qquad
L = -\sum_{b} \sum_{c} t_{bc} \log y_{bc}, \qquad
\frac{\partial L}{\partial x_{bc}} = y_{bc} - t_{bc}
$$
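For reference, the gradient stated above can be derived in a few lines. The sketch below uses the same symbols (x for logits, y for softmax outputs, t for the one-hot target) and introduces j and k as extra class indices; it assumes each row of t sums to 1, which holds for one-hot targets.

```latex
\begin{aligned}
\log y_{bc} &= x_{bc} - \log \sum_{k} e^{x_{bk}} \\
l_b &= -\sum_{c} t_{bc} \log y_{bc}
     = -\sum_{c} t_{bc}\, x_{bc} + \Big(\sum_{c} t_{bc}\Big) \log \sum_{k} e^{x_{bk}} \\
\frac{\partial l_b}{\partial x_{bj}}
    &= -t_{bj} + \Big(\sum_{c} t_{bc}\Big) \frac{e^{x_{bj}}}{\sum_{k} e^{x_{bk}}}
     = -t_{bj} + y_{bj} \qquad \text{since } \sum_{c} t_{bc} = 1 \\
\frac{\partial L}{\partial x_{bj}}
    &= \frac{\partial l_b}{\partial x_{bj}} = y_{bj} - t_{bj}
\end{aligned}
```

The last step uses the fact that L is the sum of the l_b and that l_b depends only on row b of x, so no other sample contributes to that entry of the gradient.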

Related problems

  • Backprop: Softmax + Cross-Entropy fused (PyTorch) (Medium, PyTorch)

NumPy

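import numpy as np

def softmax_ce_forward(logits, target):
    # Starter stub: should return per-sample losses of shape (batch,).
    pass

def softmax_ce_backward(logits, target):
    # Starter stub: should return dL/dlogits of shape (batch, num_classes).
    pass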
