
Debug: Attention Scale (Easy)

The function below is supposed to compute scaled dot-product attention, but it has a bug. Find and fix it.

Signature: def buggy_attention(Q, K, V)

  • Q: (N, d_k) — query matrix
  • K: (N, d_k) — key matrix
  • V: (N, d_v) — value matrix
  • Returns: (N, d_v)

Hint: The scaling factor is critical for numerical stability and proper gradient flow. Look carefully at how the attention scores are divided before the softmax.
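The buggy implementation itself is not included in this extract, so the following is only a minimal correct numpy sketch to compare against (the function name attention and the row-max subtraction are illustrative choices, not part of the original problem). The classic form of this bug is dividing the scores by d_k, or skipping the scaling entirely, instead of dividing by sqrt(d_k).

import numpy as np

def attention(Q, K, V):
    # Scaled dot-product attention: softmax(Q K^T / sqrt(d_k)) V
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                            # (N, N) raw attention scores
    scores = scores - scores.max(axis=-1, keepdims=True)       # subtract row max for numerical stability
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=-1, keepdims=True)    # row-wise softmax
    return weights @ V                                         # (N, d_v)

If the bug is indeed a division by d_k, the identity test with d_k=2 is a good discriminator: correct code scales the diagonal scores to 1/sqrt(2) ≈ 0.707, while dividing by d_k gives 0.5, producing visibly different softmax weights.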

Math
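The formula intended here is presumably the standard scaled dot-product attention:

\mathrm{Attention}(Q, K, V) = \mathrm{softmax}\!\left(\frac{Q K^{\top}}{\sqrt{d_k}}\right) V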


Language: Python (numpy)

Test Results

  • random Q, K, V (seed 42, d_k=8)
  • identity Q, K, V (d_k=2, bug is most visible)
  • random Q, K, V (seed 7, d_k=16) (Premium)