Implement Rotary Position Embedding (RoPE), used in LLaMA, GPT-NeoX, and most modern LLMs. RoPE encodes position information by rotating query/key vectors by position-dependent angles — unlike additive sinusoidal embeddings, RoPE directly modifies the dot product between queries and keys.
Signature: def rope(x: np.ndarray, positions: np.ndarray) -> np.ndarray
Arguments:
- x: shape (seq_len, d_model) — query or key vectors
- positions: shape (seq_len,) — integer position indices

Returns: shape (seq_len, d_model) — rotated vectors

For a d-dimensional vector x at position m, split it into pairs (x_{2i}, x_{2i+1}). The rotation angle for dimension pair i is θ_i = 1 / 10000^(2i/d). Apply:
x_rot[2i] = x[2i] * cos(m * θ_i) - x[2i+1] * sin(m * θ_i)
x_rot[2i+1] = x[2i] * sin(m * θ_i) + x[2i+1] * cos(m * θ_i)
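A direct NumPy sketch of the formula above, vectorized over all positions and dimension pairs (even/odd interleaving and the angle layout follow the pairing described here; real implementations sometimes use a rotate-half layout instead):

```python
import numpy as np

def rope(x: np.ndarray, positions: np.ndarray) -> np.ndarray:
    """Apply Rotary Position Embedding to each row of x.

    x:         (seq_len, d_model), d_model must be even
    positions: (seq_len,) integer position indices
    returns:   (seq_len, d_model) rotated vectors
    """
    seq_len, d_model = x.shape
    # theta_i = 1 / 10000^(2i/d) for each dimension pair i
    i = np.arange(d_model // 2)
    theta = 1.0 / 10000.0 ** (2.0 * i / d_model)        # (d_model/2,)
    # m * theta_i for every (position, pair) combination
    angles = positions[:, None] * theta[None, :]        # (seq_len, d_model/2)
    cos, sin = np.cos(angles), np.sin(angles)
    x_even, x_odd = x[:, 0::2], x[:, 1::2]              # x[2i], x[2i+1]
    out = np.empty_like(x, dtype=float)
    out[:, 0::2] = x_even * cos - x_odd * sin           # x_rot[2i]
    out[:, 1::2] = x_even * sin + x_odd * cos           # x_rot[2i+1]
    return out
```

Each pair is a 2D rotation, so the transform preserves vector norms, leaves position-0 vectors unchanged, and makes q·k depend only on the relative offset between positions — which is the property that lets RoPE extrapolate where additive embeddings cannot.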