The function below is supposed to compute scaled dot-product attention, but it has a bug. Find and fix it.
Signature: def buggy_attention(Q, K, V)
Q: query matrix, shape (N, d_k)
K: key matrix, shape (N, d_k)
V: value matrix, shape (N, d_v)
Hint: Compare the implementation to the standard scaled dot-product attention formula and check the normalization step before the softmax.
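For reference, the standard scaled dot-product attention formula (from "Attention Is All You Need") is written out below. The softmax is applied row by row, so each query's weights over the N keys sum to 1. The buggy implementation itself is not shown on this page, so exactly which term it gets wrong is not known here; the hint points at the 1/sqrt(d_k) scaling applied to the scores before the softmax.

```latex
\operatorname{Attention}(Q, K, V) = \operatorname{softmax}\!\left(\frac{Q K^{\top}}{\sqrt{d_k}}\right) V,
\qquad Q K^{\top} \in \mathbb{R}^{N \times N},\quad \text{output} \in \mathbb{R}^{N \times d_v}
```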
import numpy as np

def buggy_attention(Q, K, V):
    # Q: (N, d_k), K: (N, d_k), V: (N, d_v); should return an (N, d_v) array.
    pass
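Because the buggy body is hidden, the sketch below is written directly from the standard formula softmax(Q K^T / sqrt(d_k)) V rather than as a patch of the original code. It keeps the required name `buggy_attention` so it matches the signature above. Subtracting each row's maximum before `np.exp` is an added numerical-stability choice; it does not change the softmax result.

```python
import numpy as np

def buggy_attention(Q, K, V):
    """Scaled dot-product attention: softmax(Q K^T / sqrt(d_k)) V."""
    d_k = Q.shape[-1]
    # Similarity scores, shape (N, N), scaled by sqrt(d_k) (not d_k).
    scores = Q @ K.T / np.sqrt(d_k)
    # Row-wise softmax; subtracting the row max avoids overflow in exp.
    scores = scores - scores.max(axis=-1, keepdims=True)
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)
    # Each output row is a weighted average of the rows of V, shape (N, d_v).
    return weights @ V

# Usage: with identity inputs and d_k = 2, each query's own key scores 1/sqrt(2).
Q = K = V = np.eye(2)
out = buggy_attention(Q, K, V)  # approximately [[0.670, 0.330], [0.330, 0.670]]
```

A scaling bug such as dividing by d_k instead of sqrt(d_k), or skipping the division entirely, would still produce rows of valid probabilities, so shape checks alone will not catch it; comparing against a small hand-computed case like the one above does.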