
Scaled Dot-Product Attention (Medium)

Implement scaled dot-product attention.

Signature: def scaled_dot_product_attention(Q: np.ndarray, K: np.ndarray, V: np.ndarray) -> np.ndarray

Compute softmax(Q K^T / sqrt(d_k)) @ V, where d_k is the key dimension (the size of the last axis of K) and the softmax is taken along the key axis of the score matrix.
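
A minimal reference sketch, assuming Q, K, and V are 2-D arrays of shapes (n_q, d_k), (n_k, d_k), and (n_k, d_v); the max-subtraction is a standard numerical-stability trick, not something the problem statement requires:

import numpy as np

def scaled_dot_product_attention(Q: np.ndarray, K: np.ndarray, V: np.ndarray) -> np.ndarray:
    d_k = K.shape[-1]                                  # key dimension
    scores = Q @ K.T / np.sqrt(d_k)                    # (n_q, n_k) scaled similarity scores
    scores = scores - scores.max(axis=-1, keepdims=True)  # subtract row max for numerical stability
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=-1, keepdims=True)  # softmax over the key axis
    return weights @ V                                 # (n_q, d_v) weighted sum of values

For batched inputs one would replace K.T with np.swapaxes(K, -1, -2); the 2-D form above matches the signature as stated.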

Language: Python (numpy)

Test Results

- identity QKV
- uniform keys
- peaked scores (Premium)
- output non-negative when V is non-negative
- single-query attention: output row sums to 1 when V = I (see the check below)
- shift-invariant on Q when keys are uniform across rows
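
For example, the single-query case can be checked directly with the sketch above (hypothetical shapes; with V = np.eye(3), the output row equals the attention-weight row, which must sum to 1):

Q = np.random.randn(1, 4)   # single query of dimension 4
K = np.random.randn(3, 4)   # three keys
V = np.eye(3)               # identity values expose the attention weights
out = scaled_dot_product_attention(Q, K, V)
print(np.allclose(out.sum(axis=-1), 1.0))   # True: the attention row sums to 1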