Given the LoRA branch y_lora = (alpha/r) * x @ A.T @ B.T, compute the gradients of the loss w.r.t. A and B (the only trainable matrices — W is frozen).
Signature: def lora_backward(x: np.ndarray, A: np.ndarray, B: np.ndarray, dL_dy: np.ndarray, alpha: float, r: int) -> tuple
Shapes:
x: (batch, in)
A: (r, in)
B: (out, r)
dL_dy: (batch, out), the upstream gradient w.r.t. the LoRA output
Returns: (dA, dB) with shapes (r, in) and (out, r).
Hint: let h = x @ A.T (shape (batch, r)). Then
dB = (alpha/r) * dL_dy.T @ h
dA = (alpha/r) * (dL_dy @ B).T @ x
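A minimal NumPy sketch of one possible solution, following the hint above. The finite-difference check and the specific dimensions in the example (batch=4, in=8, out=6, r=2, alpha=16) are illustrative assumptions, not part of the problem statement.

```python
import numpy as np

def lora_backward(x: np.ndarray, A: np.ndarray, B: np.ndarray,
                  dL_dy: np.ndarray, alpha: float, r: int) -> tuple:
    """Gradients of the loss w.r.t. the LoRA factors A and B.

    Forward: y_lora = (alpha / r) * x @ A.T @ B.T
    """
    scale = alpha / r
    h = x @ A.T                      # (batch, r), intermediate activation
    dB = scale * dL_dy.T @ h         # (out, batch) @ (batch, r) -> (out, r)
    dA = scale * (dL_dy @ B).T @ x   # (r, batch) @ (batch, in) -> (r, in)
    return dA, dB


if __name__ == "__main__":
    # Illustrative sizes, chosen only for this sanity check.
    rng = np.random.default_rng(0)
    batch, d_in, d_out, r, alpha = 4, 8, 6, 2, 16.0
    x = rng.standard_normal((batch, d_in))
    A = rng.standard_normal((r, d_in))
    B = rng.standard_normal((d_out, r))
    dL_dy = rng.standard_normal((batch, d_out))   # stand-in upstream gradient

    dA, dB = lora_backward(x, A, B, dL_dy, alpha, r)

    # Finite-difference check on one entry of A, using the surrogate loss
    # sum(dL_dy * y_lora), whose gradient w.r.t. y_lora is exactly dL_dy.
    def loss(A_, B_):
        y = (alpha / r) * x @ A_.T @ B_.T
        return np.sum(dL_dy * y)

    eps = 1e-6
    A_pert = A.copy()
    A_pert[0, 0] += eps
    numeric = (loss(A_pert, B) - loss(A, B)) / eps
    print(np.allclose(numeric, dA[0, 0], atol=1e-4))  # expect True
```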