TorchedUp

231. Arithmetic Intensity Grid

Hard

Combine the inner product (#221) and outer product (#222) patterns to build a 2-D compute-vs-memory boundedness map across a model catalog and a hardware catalog.

Implement: def arithmetic_intensity_grid(model_flops, model_bytes, hw_intensity) where:

  • model_flops is shape (N,) — total FLOPs for each model.
  • model_bytes is shape (N,) — total bytes accessed for each model (paired with model_flops).
  • hw_intensity is shape (M,) — the break-even arithmetic intensity for each hardware target (peak FLOPS / peak memory bandwidth, in FLOPs per byte).

Return shape (N, M) of int64. Entry [n, m] is:

  • 1 if model n is compute-bound on hardware m (model intensity ≥ hw intensity).
  • 0 if model n is memory-bound on hardware m (model intensity < hw intensity).

The recipe — two patterns layered:

  1. Inner-product / paired sweep: compute each model's intensity as FLOPs over bytes — a vector of shape (N,).
  2. Outer-product / grid: compare that (N,) vector against the (M,) hardware vector using broadcasting (e.g. [:, None] vs [None, :]) to land on (N, M).
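The two steps above can be sketched in NumPy as follows. This is one possible reading of the recipe, not the site's reference solution; the float64 cast and the sample numbers are assumptions made for illustration, and the problem statement does not say how zero-byte models should be handled.

```python
import numpy as np


def arithmetic_intensity_grid(model_flops, model_bytes, hw_intensity):
    # Cast to float64 so integer inputs divide as real numbers.
    flops = np.asarray(model_flops, dtype=np.float64)
    nbytes = np.asarray(model_bytes, dtype=np.float64)
    hw = np.asarray(hw_intensity, dtype=np.float64)

    # Step 1 (paired sweep): per-model intensity, shape (N,).
    # Assumption: model_bytes contains no zeros.
    intensity = flops / nbytes

    # Step 2 (grid): (N, 1) compared against (1, M) broadcasts to (N, M).
    return (intensity[:, None] >= hw[None, :]).astype(np.int64)


# Hypothetical inputs: three models (intensities 10, 1, 4) and two
# hardware targets (break-even intensities 5 and 2).
grid = arithmetic_intensity_grid(
    [100.0, 10.0, 40.0], [10.0, 10.0, 10.0], [5.0, 2.0]
)
print(grid)
```

With these inputs the first model is compute-bound on both targets, the second is memory-bound on both, and the third is compute-bound only on the target with the lower break-even intensity.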

Why this matters: in the roofline model, a kernel's attainable throughput is the lower of (1) peak compute and (2) memory bandwidth × arithmetic intensity. When intensity ≥ peak compute / bandwidth, the math units can be saturated; below that threshold, the kernel is starved for data. This grid tells you, for each model and each hardware target, which side of the roofline you're on.
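To make the roofline statement concrete, here is a small sketch of the attainable-throughput formula. The helper name `attainable_flops` and the device numbers are hypothetical, chosen only so the break-even point is easy to read off.

```python
import numpy as np


def attainable_flops(intensity, peak_flops, peak_bandwidth):
    # Roofline: throughput is capped by whichever limit is lower,
    # peak compute or bandwidth * intensity.
    intensity = np.asarray(intensity, dtype=np.float64)
    return np.minimum(peak_flops, peak_bandwidth * intensity)


# Hypothetical device: 100 TFLOP/s peak compute, 2 TB/s peak bandwidth.
peak_flops = 100e12
peak_bandwidth = 2e12
break_even = peak_flops / peak_bandwidth  # FLOPs per byte

# Intensities below, at, and above the break-even point.
perf = attainable_flops([10.0, 50.0, 200.0], peak_flops, peak_bandwidth)
print(break_even, perf)
```

Below the break-even intensity the result grows linearly with intensity (memory-bound); at or above it the result is pinned at peak compute, which is the case the grid marks with 1.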

Math

\[
I_n = \frac{F_n}{B_n}, \qquad G_{n,m} = \mathbb{1}\left[ I_n \ge H_m \right]
\]
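As a hand-checkable instance of the definition, take two models and two hardware targets, writing F for model_flops, B for model_bytes, H for hw_intensity, I for per-model intensity, and G for the output grid. The numbers are made up for illustration.

```latex
\[
F = (600,\ 80), \qquad B = (20,\ 40), \qquad H = (10,\ 25)
\]
\[
I = \left( \tfrac{600}{20},\ \tfrac{80}{40} \right) = (30,\ 2)
\]
\[
G =
\begin{pmatrix}
\mathbb{1}[30 \ge 10] & \mathbb{1}[30 \ge 25] \\
\mathbb{1}[2 \ge 10] & \mathbb{1}[2 \ge 25]
\end{pmatrix}
=
\begin{pmatrix}
1 & 1 \\
0 & 0
\end{pmatrix}
\]
```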


NumPy

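import numpy as np


def arithmetic_intensity_grid(model_flops, model_bytes, hw_intensity):
    # Starter stub: return an (N, M) int64 grid (1 = compute-bound, 0 = memory-bound).
    pass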
