TorchedUp

59. Debug: Attention Scale

Easy

The function below is supposed to compute scaled dot-product attention, but it has a bug. Find and fix it.

Signature: def buggy_attention(Q, K, V)

  • Q: (N, d_k) — query matrix
  • K: (N, d_k) — key matrix
  • V: (N, d_v) — value matrix
  • Returns: (N, d_v)

Hint: Compare the implementation to the standard scaled dot-product attention formula and check the normalization step before the softmax.
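One way to act on this hint is to write a small reference version directly from the formula and compare its output with the function under test. The sketch below is a minimal NumPy version under stated assumptions: `softmax`, `reference_attention`, and `unscaled_attention` are hypothetical names, and the unscaled variant only illustrates one kind of normalization mistake, not necessarily the bug in this problem.

```python
import numpy as np

def softmax(x, axis=-1):
    # Subtract the per-row max for numerical stability; the result is unchanged.
    x = x - np.max(x, axis=axis, keepdims=True)
    e = np.exp(x)
    return e / np.sum(e, axis=axis, keepdims=True)

def reference_attention(Q, K, V):
    # Hypothetical reference: softmax(Q K^T / sqrt(d_k)) V
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)      # (N, N) scaled similarity scores
    weights = softmax(scores, axis=-1)   # each row sums to 1
    return weights @ V                   # (N, d_v)

def unscaled_attention(Q, K, V):
    # Assumed example of a normalization mistake: the 1/sqrt(d_k) factor is missing.
    return softmax(Q @ K.T, axis=-1) @ V

rng = np.random.default_rng(0)
Q = rng.standard_normal((4, 8))
K = rng.standard_normal((4, 8))
V = rng.standard_normal((4, 3))

out = reference_attention(Q, K, V)
print(out.shape)
# Compare a candidate implementation against the reference.
print(np.allclose(out, unscaled_attention(Q, K, V)))
```

Comparing with `np.allclose` on random inputs is usually enough to reveal a wrong or missing scale factor, because the softmax weights change whenever the scores are multiplied by a constant other than 1.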

Math

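Scaled dot-product attention:

$$\mathrm{Attention}(Q, K, V) = \mathrm{softmax}\!\left(\frac{Q K^{\top}}{\sqrt{d_k}}\right) V$$

For unit-normal $q, k \in \mathbb{R}^{d_k}$:

$$\mathbb{E}[q \cdot k] = 0, \qquad \mathrm{Var}[q \cdot k] = d_k$$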
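The variance statement can be checked numerically. The dot product is a sum of d_k independent products, each with mean 0 and variance 1, so its variance grows linearly with d_k, and dividing by sqrt(d_k) brings it back to about 1. The following is a rough simulation sketch, assuming independent standard-normal entries; the sample count and `d_k = 64` are arbitrary choices for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
d_k = 64            # illustrative dimension, not taken from the problem
n_samples = 20_000  # number of random (q, k) pairs

# Each row is one sample of q or k with i.i.d. standard-normal entries.
q = rng.standard_normal((n_samples, d_k))
k = rng.standard_normal((n_samples, d_k))

dots = np.sum(q * k, axis=1)    # one dot product per sample
scaled = dots / np.sqrt(d_k)    # the scaling used inside the softmax

print("mean of q.k:        ", dots.mean())    # expected to be close to 0
print("variance of q.k:    ", dots.var())     # expected to be close to d_k
print("variance after scale:", scaled.var())  # expected to be close to 1
```

Without the scaling, scores with variance d_k push the softmax toward near one-hot outputs as d_k grows, which is the usual motivation for the 1/sqrt(d_k) factor.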

NumPy

import numpy as np

def buggy_attention(Q, K, V):
    pass
