Model FLOPs Utilization (MFU)

Compute MFU: the fraction of the hardware's peak FLOP/s that your training run actually achieves.

Signature: def mfu(achieved_tokens_per_sec: float, n_params: int, peak_flops: float) -> float

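Using the 6N rule, training costs roughly 6 * n_params FLOPs per token (forward plus backward pass), so:

MFU = (6 * n_params * achieved_tokens_per_sec) / peak_flops

Here peak_flops is the hardware's peak throughput in FLOPs per second.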

Return a float in [0, 1].
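
The formula above can be sketched directly in Python. This is a minimal sketch rather than a reference solution: the positive-peak check is an added assumption, the result is not clamped (so inputs that exceed the hardware peak would return a value above 1), and the numbers in the usage lines are illustrative, not measurements.

```python
def mfu(achieved_tokens_per_sec: float, n_params: int, peak_flops: float) -> float:
    """Model FLOPs Utilization under the 6N rule."""
    # Assumption (not stated in the problem): reject a non-positive peak
    # instead of dividing by zero.
    if peak_flops <= 0:
        raise ValueError("peak_flops must be positive")

    # 6N rule: about 6 FLOPs per parameter per training token.
    flops_per_token = 6 * n_params

    # FLOPs per second the run actually achieves.
    achieved_flops = flops_per_token * achieved_tokens_per_sec

    return achieved_flops / peak_flops


# Illustrative values: a 1B-parameter model at 26,000 tokens/s
# on hardware with a 312 TFLOP/s peak.
utilization = mfu(26_000.0, 1_000_000_000, 312e12)
print(utilization)  # 0.5
```

In this example the run achieves 6 * 1e9 * 26,000 = 1.56e14 FLOP/s against a peak of 3.12e14 FLOP/s, which is half utilization.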

Test Results

half utilization

low MFU

high MFU