Classify a kernel as compute-bound or memory-bound using the roofline model.
Signature: def roofline_classify(flops_per_byte: float, machine_balance: float) -> str
Where machine_balance = peak_flops / peak_bandwidth is the FLOPs-per-byte the hardware can sustain.
Return 'compute_bound' if flops_per_byte >= machine_balance, else 'memory_bound'.
Math
Asked at
Test Results