Arithmetic Intensity Grid

Combine the inner product (#221) and outer product (#222) patterns to build a 2-D compute-vs-memory boundedness map across a model catalog and a hardware catalog.

Implement: def arithmetic_intensity_grid(model_flops, model_bytes, hw_intensity) where:

model_flops is shape (N,) — total FLOPs for each model.
model_bytes is shape (N,) — total bytes accessed for each model (paired with model_flops).
hw_intensity is shape (M,) — the break-even arithmetic intensity for each hardware target (peak FLOPS / peak memory bandwidth, in FLOPs per byte).

Return shape (N, M) of int64. Entry [n, m] is:

1 if model n is compute-bound on hardware m (model intensity ≥ hw intensity).
0 if model n is memory-bound on hardware m (model intensity < hw intensity).

The recipe — two patterns layered:

Inner-product / paired sweep: compute each model's intensity as FLOPs over bytes — a vector of shape (N,).
Outer-product / grid: compare that (N,) vector against the (M,) hardware vector using broadcasting (e.g. [:, None] vs [None, :]) to land on (N, M).
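
A minimal sketch of the two-step recipe above, assuming NumPy and that every entry of model_bytes is strictly positive (the statement does not say how to handle zero bytes, so this sketch does not either):

```python
import numpy as np

def arithmetic_intensity_grid(model_flops, model_bytes, hw_intensity):
    # Convert inputs to float arrays so division and comparison are elementwise.
    flops = np.asarray(model_flops, dtype=np.float64)    # shape (N,)
    nbytes = np.asarray(model_bytes, dtype=np.float64)   # shape (N,), assumed > 0
    hw = np.asarray(hw_intensity, dtype=np.float64)      # shape (M,)

    # Step 1 (paired sweep): per-model arithmetic intensity, FLOPs per byte.
    intensity = flops / nbytes                           # shape (N,)

    # Step 2 (grid): broadcast (N, 1) against (1, M) to get an (N, M) boolean map.
    compute_bound = intensity[:, None] >= hw[None, :]

    # The statement asks for int64: 1 = compute-bound, 0 = memory-bound.
    return compute_bound.astype(np.int64)

# Two models with intensities 10 and 1, against break-even points 5, 10, 20.
grid = arithmetic_intensity_grid([100.0, 10.0], [10.0, 10.0], [5.0, 10.0, 20.0])
print(grid.tolist())  # [[1, 1, 0], [0, 0, 0]]
```

The comparison uses ≥, so a model whose intensity exactly equals the hardware break-even point counts as compute-bound, matching the definition of entry [n, m].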

Why this matters: under the roofline model, a kernel's attainable throughput is the lower of (1) the compute peak and (2) memory bandwidth × arithmetic intensity. When intensity ≥ peak / bandwidth, the compute peak is the binding limit and the math units can be saturated; below that threshold, throughput is capped at bandwidth × intensity and the kernel is starved for data. The grid tells you, for each model and each hardware target, which side of that break-even point you are on.
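
To make the roofline statement concrete, here is a small sketch that evaluates min(peak, bandwidth × intensity). The function name attainable_flops and the hardware numbers are hypothetical, chosen only for illustration:

```python
import numpy as np

def attainable_flops(intensity, peak_flops, peak_bandwidth):
    # Roofline bound: the lower of the compute peak and bandwidth * intensity.
    intensity = np.asarray(intensity, dtype=np.float64)
    return np.minimum(peak_flops, peak_bandwidth * intensity)

# Hypothetical device: 100 TFLOP/s peak, 1 TB/s bandwidth,
# so the break-even intensity is 100 FLOPs per byte.
peak, bandwidth = 100e12, 1e12
bound = attainable_flops([50.0, 200.0], peak, bandwidth)
# Intensity 50 sits below break-even: capped by bandwidth at 50 TFLOP/s.
# Intensity 200 sits above break-even: capped by the compute peak at 100 TFLOP/s.
```

The grid entry is 1 exactly when the compute peak is the binding term in this minimum.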

Math

I_n = \frac{F_n}{B_n}, \qquad G_{n,m} = \mathbb{1}\left[ I_n \geq H_m \right]
