Debug Decoder I (PyTorch)

The mini-decoder forward pass below was working at one point, but a refactor broke something. On the very first forward pass the function fails: either the embedding layer raises an index error or, further downstream, adding the position vectors to the token vectors raises a shape mismatch.
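
The two symptoms named above can be reproduced in isolation. The snippet below is only an illustration of the two error classes, not the exercise's broken code: the variable names and the deliberately wrong tensor width are assumptions chosen for the demonstration.

```python
import torch
import torch.nn as nn

vocab_size, max_seq_len, d_model = 8, 4, 8

index_error = None
shape_error = None

# Symptom 1: a lookup index outside [0, num_embeddings) raises IndexError.
pos_emb = nn.Embedding(max_seq_len, d_model)  # (num_embeddings, embedding_dim)
try:
    pos_emb(torch.tensor([max_seq_len]))  # valid indices are 0..max_seq_len-1
except IndexError as exc:
    index_error = str(exc)

# Symptom 2: adding token vectors of width d_model to position vectors
# of a different width raises RuntimeError (the shapes do not broadcast).
tok_vecs = torch.zeros(3, d_model)
bad_pos_vecs = torch.zeros(3, max_seq_len)  # hypothetical wrong width: 4 instead of 8
try:
    tok_vecs + bad_pos_vecs
except RuntimeError as exc:
    shape_error = str(exc)

print("index error:", index_error)
print("shape error:", shape_error)
```

Both symptoms usually trace back to a size being passed in the wrong place, so checking the arguments of each nn.Embedding constructor against the weight shapes is a reasonable first step.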

Find and fix the bug(s) so the function correctly maps token ids to logits.

Signature: def decoder_forward(token_ids, tok_emb_w, pos_emb_w, W_out, b_out)

token_ids: list of length T, integer token ids in [0, vocab_size)
tok_emb_w: token-embedding weight, nested list of shape (vocab_size, d_model)
pos_emb_w: positional-embedding weight, nested list of shape (max_seq_len, d_model)
W_out: output projection, nested list of shape (d_model, vocab_size)
b_out: output bias, list of length vocab_size
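
When debugging, it can help to confirm that the arguments really have the shapes listed above before any layer is built. The helper below is a hypothetical addition (check_shapes is not part of the exercise). It also assumes T <= max_seq_len, which the statement does not spell out but which the position lookup implies.

```python
def check_shapes(token_ids, tok_emb_w, pos_emb_w, W_out, b_out):
    """Hypothetical helper: infer the sizes from the arguments and assert they agree."""
    vocab_size, d_model = len(tok_emb_w), len(tok_emb_w[0])
    max_seq_len = len(pos_emb_w)

    # Every embedding row must have width d_model.
    assert all(len(row) == d_model for row in tok_emb_w)
    assert all(len(row) == d_model for row in pos_emb_w)

    # W_out maps d_model -> vocab_size; b_out has one entry per vocabulary item.
    assert len(W_out) == d_model
    assert all(len(row) == vocab_size for row in W_out)
    assert len(b_out) == vocab_size

    # Assumption: the sequence must fit in the positional table.
    assert len(token_ids) <= max_seq_len
    assert all(0 <= tok < vocab_size for tok in token_ids)

    return vocab_size, max_seq_len, d_model
```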

The model uses vocab_size = 8, max_seq_len = 4, d_model = 8. The pipeline is:

1. Build nn.Embedding layers for tokens and positions, copy in the provided weights
2. Look up token + position embeddings and add them
3. Apply nn.LayerNorm across the model dimension
4. Project to vocabulary logits via a linear map

Return the logits as a nested list of shape (T, vocab_size).
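
For orientation, here is a minimal sketch of what a working version of the pipeline might look like. It is one possible implementation, not the exercise's reference solution. It assumes nn.LayerNorm keeps its default parameters (scale 1, shift 0), since the signature passes none, and it reads the sizes from the weight shapes rather than hard-coding 8, 4, 8.

```python
import torch
import torch.nn as nn

def decoder_forward(token_ids, tok_emb_w, pos_emb_w, W_out, b_out):
    tok_w = torch.tensor(tok_emb_w, dtype=torch.float32)  # (vocab_size, d_model)
    pos_w = torch.tensor(pos_emb_w, dtype=torch.float32)  # (max_seq_len, d_model)
    vocab_size, d_model = tok_w.shape
    max_seq_len = pos_w.shape[0]

    # nn.Embedding takes (num_embeddings, embedding_dim), in that order.
    tok_emb = nn.Embedding(vocab_size, d_model)
    pos_emb = nn.Embedding(max_seq_len, d_model)
    ln = nn.LayerNorm(d_model)  # assumption: default scale and shift

    with torch.no_grad():
        tok_emb.weight.copy_(tok_w)
        pos_emb.weight.copy_(pos_w)

        ids = torch.tensor(token_ids, dtype=torch.long)  # (T,)
        positions = torch.arange(len(token_ids))         # 0, 1, ..., T-1

        # Both lookups give (T, d_model), so the sum is well defined.
        h = ln(tok_emb(ids) + pos_emb(positions))

        W = torch.tensor(W_out, dtype=torch.float32)      # (d_model, vocab_size)
        b = torch.tensor(b_out, dtype=torch.float32)      # (vocab_size,)
        logits = h @ W + b                                # (T, vocab_size)

    return logits.tolist()
```

Because W_out is stored as (d_model, vocab_size), the sketch multiplies by it directly. Loading it into nn.Linear instead would require a transpose, since that layer stores its weight as (out_features, in_features).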

Math

$$h_t = \mathrm{LN}\left(E_{\mathrm{tok}}[x_t] + E_{\mathrm{pos}}[t]\right), \qquad \ell_t = h_t W_{\mathrm{out}} + b_{\mathrm{out}}$$
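
The formula can also be checked by hand for a single position, which gives a PyTorch-free value to compare a fixed implementation against. The sketch below uses plain Python lists. It assumes LN has no learned scale or shift, uses the biased variance, and takes eps = 1e-5, matching the nn.LayerNorm defaults. The function names are illustrative.

```python
import math

def layer_norm(v, eps=1e-5):
    """Normalize one vector to zero mean and unit variance (biased estimate)."""
    mean = sum(v) / len(v)
    var = sum((x - mean) ** 2 for x in v) / len(v)
    return [(x - mean) / math.sqrt(var + eps) for x in v]

def logits_at(t, x_t, tok_emb_w, pos_emb_w, W_out, b_out):
    """Compute l_t = LN(E_tok[x_t] + E_pos[t]) W_out + b_out for one position t."""
    summed = [a + b for a, b in zip(tok_emb_w[x_t], pos_emb_w[t])]
    h = layer_norm(summed)
    return [
        sum(h[i] * W_out[i][j] for i in range(len(h))) + b_out[j]
        for j in range(len(b_out))
    ]
```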
