In QLoRA, base weights are stored as 4-bit NF4 indices. To use them in a forward pass we must dequantize them: look up each index's float value in the 16-entry NF4 codebook and multiply it by the block's scale.
Signature: def nf4_dequantize(quantized: np.ndarray, scale: float, codebook: np.ndarray) -> np.ndarray
quantized: integer indices in [0, 16) (the 4-bit codes)
scale: per-block float scale
codebook: the 16-entry NF4 lookup table
Returns: codebook[quantized] * scale, a float array with the same shape as quantized.
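One way to implement the signature above is sketched below. It is a minimal version under some assumptions: the codes arrive one per array element (not packed two per byte), a single scale applies to the whole input, and the cast to int64 and float32 is a choice made here, not something the problem statement requires. The toy_codebook in the usage lines is a placeholder of 16 evenly spaced values, not the real NF4 table.

```python
import numpy as np


def nf4_dequantize(quantized: np.ndarray, scale: float, codebook: np.ndarray) -> np.ndarray:
    """Map 4-bit NF4 indices back to floats and apply the per-block scale."""
    # Indexing needs an integer dtype; cast in case the codes arrive as uint8.
    indices = np.asarray(quantized).astype(np.int64)
    # Assumption: keep the table in float32 for the lookup.
    table = np.asarray(codebook, dtype=np.float32)
    # table[indices] has the same shape as indices; the scalar scale broadcasts.
    return table[indices] * scale


# Placeholder codebook (hypothetical): 16 evenly spaced values, NOT the real NF4 table.
toy_codebook = np.linspace(-1.0, 1.0, 16, dtype=np.float32)
codes = np.array([[0, 15], [7, 8]], dtype=np.uint8)
weights = nf4_dequantize(codes, scale=0.5, codebook=toy_codebook)
```

Because the lookup is plain integer-array indexing, the output shape follows the shape of quantized regardless of how many dimensions it has.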