Implement scaled dot-product attention.
Signature: def scaled_dot_product_attention(Q: np.ndarray, K: np.ndarray, V: np.ndarray) -> np.ndarray
Compute softmax(QK^T / sqrt(d_k)) @ V, where d_k is the dimensionality of the key vectors (the last axis of K).