TorchedUp
LearnBetaProblemsSystem DesignSoonPremium
TorchedUp
LearnBetaProblemsSystem DesignSoonPremium
←

208. Backprop: Embedding lookup

Medium

Hand-derive the gradient of L = sum(W[indices]) w.r.t. the embedding table W.

Forward: y = W[indices] — gather rows of W according to the integer index array indices. L = sum(y).

Implement:

  • embedding_forward(W, indices) -> y of shape indices.shape + (embed_dim,)
  • embedding_backward(W, indices) -> dL/dW of shape (vocab_size, embed_dim)

The trick: indices may repeat. If token i appears k times in indices, then W[i] contributes k copies to the sum, so dL/dW[i] = k * ones(embed_dim). Indices that don't appear get zero rows.

This is also called scatter-add in autograd land — the backward of a gather is a scatter that accumulates duplicates.

Math

yn,d​=Win​,d​,∂Wv,d​∂L​=#{n:in​=v}

Related problems

  • Backprop: Embedding (PyTorch)mediumPyTorch

Asked at

NumPy

import numpy as np

 

def embedding_forward(...):

    pass

🔒

Premium problem

Free accounts include problems #1–20. Upgrade to unlock the editor, hidden test cases, and reference solutions for every problem.

Upgrade to PremiumBack to problems

Already premium?