Backprop: Embedding (PyTorch)

Implement a token embedding lookup as a torch.autograd.Function. Forward indexes W by integer ids. Backward scatters incoming gradients back to those rows, summing where the same id appears multiple times.

The rule: you may NOT call F.embedding or nn.Embedding. Use indexing for forward and index_add_ (or equivalent scatter) for backward.

Forward: y = W[idx] where W: (V, D) and idx is integer-typed of arbitrary shape S. Output shape: S + (D,).

Backward: grad_W[v] is the sum of grad_output[s] over every position s where idx[s] == v. Rows whose id never appears in idx are zero.

The driver emb_run(mode, W, idx) dispatches on mode: 'forward' | 'grad_W' | 'gradcheck'.
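
The pieces above can be put together roughly as follows. This is a minimal sketch, not a reference solution: the class name EmbeddingFn is hypothetical, and the problem statement does not say what loss the 'grad_W' mode differentiates, so the sketch assumes the loss is the sum of all outputs. The 'gradcheck' mode is assumed to return the boolean result of torch.autograd.gradcheck on a float64 copy of W.

```python
import torch


class EmbeddingFn(torch.autograd.Function):
    """Embedding lookup with a hand-written backward (hypothetical name)."""

    @staticmethod
    def forward(ctx, W, idx):
        idx = idx.long()  # normalize the integer dtype for indexing
        ctx.save_for_backward(idx)
        ctx.num_rows = W.shape[0]  # V, needed to allocate grad_W
        return W[idx]  # shape: idx.shape + (D,)

    @staticmethod
    def backward(ctx, grad_output):
        (idx,) = ctx.saved_tensors
        D = grad_output.shape[-1]
        grad_W = torch.zeros(
            ctx.num_rows, D, dtype=grad_output.dtype, device=grad_output.device
        )
        # Flatten positions so each row of grad_output pairs with one id;
        # index_add_ accumulates, so repeated ids are summed.
        grad_W.index_add_(0, idx.reshape(-1), grad_output.reshape(-1, D))
        return grad_W, None  # idx is integer-typed and gets no gradient


def emb_run(mode, W, idx):
    if mode == "forward":
        return EmbeddingFn.apply(W, idx)
    if mode == "grad_W":
        # Assumption: the loss is the sum of every output element.
        W = W.detach().clone().requires_grad_(True)
        EmbeddingFn.apply(W, idx).sum().backward()
        return W.grad
    if mode == "gradcheck":
        # gradcheck compares against finite differences and needs float64.
        W64 = W.detach().double().clone().requires_grad_(True)
        return torch.autograd.gradcheck(
            lambda w: EmbeddingFn.apply(w, idx), (W64,)
        )
    raise ValueError(f"unknown mode: {mode!r}")
```

Under the sum-of-outputs assumption, each row of the returned gradient is filled with the number of times its id occurs in idx, which gives a quick sanity check on the accumulation.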

Math

y_{s, *} = W_{\mathrm{idx}_s, *}, \qquad \frac{\partial L}{\partial W_v} = \sum_{s \,:\, \mathrm{idx}_s = v} \frac{\partial L}{\partial y_s}
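
As a small illustration of the summation, take an assumed case with V = 3 and idx = (0, 2, 0), so positions s = 0, 1, 2. Id 0 appears twice, id 2 once, and id 1 never:

\frac{\partial L}{\partial W_0} = \frac{\partial L}{\partial y_0} + \frac{\partial L}{\partial y_2}, \qquad \frac{\partial L}{\partial W_1} = 0, \qquad \frac{\partial L}{\partial W_2} = \frac{\partial L}{\partial y_1}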
