Implement multi-head attention (without learned projection matrices).
Signature: def multi_head_attention(Q: np.ndarray, K: np.ndarray, V: np.ndarray, num_heads: int) -> np.ndarray
Split Q, K, and V along the feature dimension into num_heads heads. Assume d_model % num_heads == 0.
Math

For each head, compute scaled dot-product attention with d_k = d_model / num_heads:

Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V

then concatenate the per-head outputs along the feature dimension.
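A minimal NumPy sketch, assuming unbatched 2-D inputs of shape (seq_len, d_model), no attention mask, and a hypothetical `softmax` helper defined here:

```python
import numpy as np

def softmax(x: np.ndarray, axis: int = -1) -> np.ndarray:
    # Subtract the row max for numerical stability before exponentiating.
    x = x - np.max(x, axis=axis, keepdims=True)
    e = np.exp(x)
    return e / np.sum(e, axis=axis, keepdims=True)

def multi_head_attention(Q: np.ndarray, K: np.ndarray, V: np.ndarray,
                         num_heads: int) -> np.ndarray:
    seq_len, d_model = Q.shape
    assert d_model % num_heads == 0, "d_model must be divisible by num_heads"
    d_k = d_model // num_heads

    def split(x: np.ndarray) -> np.ndarray:
        # (seq_len, d_model) -> (num_heads, seq_len, d_k)
        return x.reshape(seq_len, num_heads, d_k).transpose(1, 0, 2)

    q, k, v = split(Q), split(K), split(V)

    # Scaled dot-product attention, batched over the head axis.
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(d_k)  # (num_heads, seq_len, seq_len)
    weights = softmax(scores, axis=-1)
    out = weights @ v                                 # (num_heads, seq_len, d_k)

    # Concatenate heads: (num_heads, seq_len, d_k) -> (seq_len, d_model)
    return out.transpose(1, 0, 2).reshape(seq_len, d_model)
```

With num_heads=1 this reduces to ordinary single-head scaled dot-product attention, which is a useful sanity check.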