Hand-derive the gradient of cross-entropy loss applied to softmax outputs, with respect to the logits x.
Forward: Compute y = softmax(logits) along the last axis, then per-sample losses l_b = -sum_c target_{b,c} * log(y_{b,c}). Return the vector of per-sample losses (shape (batch,)); the harness then sums to a scalar.
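One possible way to write the forward pass is sketched below. It assumes NumPy arrays of shape (batch, num_classes) and computes log-softmax directly, subtracting the row-wise maximum first so that exp() cannot overflow. The max shift is an added stability measure that the problem statement does not require, and the sample inputs are made up for illustration.

```python
import numpy as np

def softmax_ce_forward(logits, target):
    # Shift each row by its max; softmax is invariant to this shift,
    # and it keeps exp() from overflowing on large logits.
    shifted = logits - logits.max(axis=-1, keepdims=True)
    # log y = shifted - log(sum_k exp(shifted_k))
    log_y = shifted - np.log(np.exp(shifted).sum(axis=-1, keepdims=True))
    # Per-sample loss: l_b = -sum_c target[b, c] * log y[b, c]
    return -(target * log_y).sum(axis=-1)

# Illustrative inputs (not from the problem's test cases).
logits = np.array([[1.0, 2.0, 3.0], [0.0, 0.0, 0.0]])
target = np.array([[0.0, 0.0, 1.0], [1.0, 0.0, 0.0]])
losses = softmax_ce_forward(logits, target)
print(losses.shape)  # (2,)
```

For the second row the logits are all equal, so y is uniform and the loss is log(3).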
Implement:
softmax_ce_forward(logits, target) -> losses of shape (batch,)
softmax_ce_backward(logits, target) -> dL/dlogits of shape (batch, num_classes), where L = sum(losses)
target is a one-hot matrix of the same shape as logits. The famous result: dL/dlogits = y - target (no batch division because we sum, not mean).
Math
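A short derivation of the stated result, written as a sketch. Here x denotes the logits and t is shorthand (introduced here) for target; the only assumption beyond the definitions above is that each row of t sums to 1, which holds for one-hot targets.

```latex
\begin{aligned}
y_{b,c} &= \frac{\exp(x_{b,c})}{\sum_{k} \exp(x_{b,k})} \\
l_b &= -\sum_{c} t_{b,c} \log y_{b,c}
     = -\sum_{c} t_{b,c}\, x_{b,c}
       + \Big(\sum_{c} t_{b,c}\Big) \log \sum_{k} \exp(x_{b,k}) \\
    &= -\sum_{c} t_{b,c}\, x_{b,c} + \log \sum_{k} \exp(x_{b,k})
       \qquad \text{since } \textstyle\sum_{c} t_{b,c} = 1 \\
\frac{\partial l_b}{\partial x_{b,j}}
    &= -t_{b,j} + \frac{\exp(x_{b,j})}{\sum_{k} \exp(x_{b,k})}
     = y_{b,j} - t_{b,j} \\
L &= \sum_{b} l_b
  \quad\Longrightarrow\quad
  \frac{\partial L}{\partial x_{b,j}} = y_{b,j} - t_{b,j}
\end{aligned}
```

Because l_b depends only on row b of the logits, summing over the batch adds no cross terms, and no 1/batch factor appears.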
import numpy as np

def softmax_ce_forward(logits, target):
    # Return per-sample losses of shape (batch,).
    pass
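The backward pass follows directly from dL/dlogits = y - target. The sketch below is one way to write it, together with a central finite-difference check against L = sum(losses). The helper names softmax, total_loss, and numerical_grad are hypothetical and not part of the problem's required interface, and the random inputs are illustrative only.

```python
import numpy as np

def softmax(logits):
    # Hypothetical helper: row-wise softmax with a max shift for stability.
    shifted = logits - logits.max(axis=-1, keepdims=True)
    e = np.exp(shifted)
    return e / e.sum(axis=-1, keepdims=True)

def softmax_ce_backward(logits, target):
    # dL/dlogits = y - target, where L is the SUM of per-sample losses,
    # so there is no division by the batch size.
    return softmax(logits) - target

def total_loss(logits, target):
    # Hypothetical helper: scalar L = sum_b l_b, used only for the check.
    return -(target * np.log(softmax(logits))).sum()

def numerical_grad(logits, target, eps=1e-6):
    # Hypothetical helper: central finite differences, one entry at a time.
    grad = np.zeros_like(logits)
    for idx in np.ndindex(*logits.shape):
        plus = logits.copy()
        plus[idx] += eps
        minus = logits.copy()
        minus[idx] -= eps
        grad[idx] = (total_loss(plus, target) - total_loss(minus, target)) / (2 * eps)
    return grad

rng = np.random.default_rng(0)
logits = rng.normal(size=(4, 3))
target = np.eye(3)[rng.integers(0, 3, size=4)]  # random one-hot rows

analytic = softmax_ce_backward(logits, target)
numeric = numerical_grad(logits, target)
max_err = np.abs(analytic - numeric).max()
print(analytic.shape, max_err < 1e-6)
```

Each row of the gradient sums to zero, since both y and a one-hot target sum to 1 along the class axis; that makes a cheap extra sanity check.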