Combine the inner product (#221) and outer product (#222) patterns to build a 2-D compute-vs-memory boundedness map across a model catalog and a hardware catalog.
Implement def arithmetic_intensity_grid(model_flops, model_bytes, hw_intensity), where:
- model_flops has shape (N,): total FLOPs for each model.
- model_bytes has shape (N,): total bytes accessed for each model (paired with model_flops).
- hw_intensity has shape (M,): the break-even arithmetic intensity for each hardware target (peak FLOPS / peak memory bandwidth, in FLOPs per byte).
Return an array of shape (N, M) and dtype int64. Entry [n, m] is:
- 1 if model n is compute-bound on hardware m (model intensity ≥ hw intensity).
- 0 if model n is memory-bound on hardware m (model intensity < hw intensity).
The recipe, two patterns layered:
1. Divide model_flops by model_bytes elementwise to get each model's arithmetic intensity, shape (N,).
2. Compare that (N,) vector against the (M,) hardware vector using broadcasting (e.g. [:, None] vs [None, :]) to land on (N, M).

Why this matters: the roofline model says a kernel achieves the lower of (1) the compute peak and (2) bandwidth × intensity. When intensity ≥ peak/bandwidth you can saturate the math units; below it, the kernel is starved for data. This grid tells you, for each model and each hardware target, which side of the roofline you're on.
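To make the roofline statement concrete, here is a small sketch of the attainable-throughput bound. The helper name attainable_flops and the hardware numbers (100 TFLOP/s peak, 2 TB/s bandwidth) are hypothetical, chosen only for illustration; they are not part of the problem.

```python
import numpy as np

def attainable_flops(intensity, peak_flops, peak_bandwidth):
    """Roofline bound: the lower of compute peak and bandwidth * intensity."""
    intensity = np.asarray(intensity, dtype=np.float64)
    return np.minimum(peak_flops, peak_bandwidth * intensity)

# Hypothetical hardware target (illustrative numbers only).
peak_flops = 100e12      # FLOPs per second
peak_bandwidth = 2e12    # bytes per second
break_even = peak_flops / peak_bandwidth  # 50 FLOPs per byte

# Below break-even the bandwidth term is the limit (memory-bound).
low = attainable_flops(10.0, peak_flops, peak_bandwidth)
# At or above break-even the compute peak is the limit (compute-bound).
high = attainable_flops(200.0, peak_flops, peak_bandwidth)
```

With these numbers, an intensity of 10 FLOPs per byte is capped by bandwidth at 2e13 FLOP/s, while an intensity of 200 reaches the 1e14 FLOP/s compute peak. The break-even value computed here is the quantity the problem passes in as hw_intensity.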
import numpy as np

def arithmetic_intensity_grid(model_flops, model_bytes, hw_intensity):
    # Your implementation here.
    pass
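One possible way to fill in the stub, following the two-step recipe, is sketched below. This is not the reference solution; it assumes every entry of model_bytes is nonzero (the problem statement does not say how to treat zero bytes), and the sample inputs are made up for illustration.

```python
import numpy as np

def arithmetic_intensity_grid(model_flops, model_bytes, hw_intensity):
    # Convert to float arrays so list inputs and integer inputs both work.
    flops = np.asarray(model_flops, dtype=np.float64)
    nbytes = np.asarray(model_bytes, dtype=np.float64)
    hw = np.asarray(hw_intensity, dtype=np.float64)

    # Step 1: elementwise FLOPs per byte for each model, shape (N,).
    # Assumption: nbytes has no zeros.
    model_intensity = flops / nbytes

    # Step 2: broadcast (N, 1) against (1, M) to get an (N, M) boolean grid.
    compute_bound = model_intensity[:, None] >= hw[None, :]

    # Return 1 for compute-bound, 0 for memory-bound.
    return compute_bound.astype(np.int64)

# Made-up example: three models, two hardware targets.
grid = arithmetic_intensity_grid(
    model_flops=[1e12, 4e12, 8e9],
    model_bytes=[1e10, 1e10, 1e9],
    hw_intensity=[50.0, 200.0],
)
```

Here the model intensities are 100, 400, and 8 FLOPs per byte, so the first model is compute-bound only on the 50 FLOPs/byte target, the second on both, and the third on neither. The explicit [:, None] and [None, :] indexing keeps the broadcast axes visible; reshape or np.greater_equal.outer would be equivalent alternatives.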