Implement scaled dot-product attention.
Signature: def scaled_dot_product_attention(Q: np.ndarray, K: np.ndarray, V: np.ndarray) -> np.ndarray
Each query attends to all keys with similarity scaled by sqrt(d_k), the result is softmax-normalized across keys, then used to take a weighted sum over the values. d_k is the key dimension. See the math reference below.
Math
Related problems
Asked at
Output
Anything you print() in your code will show up here after you click Run.
Test Results