Compute MFU — the fraction of peak hardware FLOPs your training run is actually using.
Signature: def mfu(achieved_tokens_per_sec: float, n_params: int, peak_flops: float) -> float
Using the 6N rule for FLOPs per token:
MFU = (6 * n_params * tokens_per_sec) / peak_flops
Return a float in [0, 1].
Math
Asked at
Test Results