Min GPUs Grid Search

Given an array of model byte-sizes and an array of GPU VRAM byte-sizes, return a 2-D grid where entry [m, n] is the minimum number of GPUs needed to hold model m on GPUs that each have gpu_bytes[n] bytes of VRAM.

Implement: def min_gpus_grid(model_bytes, gpu_bytes) where:

model_bytes is shape (M,) — total bytes per model.
gpu_bytes is shape (N,) — VRAM per GPU SKU.

Return an int64 array of shape (M, N). Entry [m, n] = ceil(model_bytes[m] / gpu_bytes[n]).

The recipe (the all-pairs pattern from problem #222):

return np.ceil(model_bytes[:, None] / gpu_bytes[None, :]).astype(np.int64)

model_bytes[:, None] is shape (M, 1).
gpu_bytes[None, :] is shape (1, N).
(M, 1) / (1, N) broadcasts to (M, N).
np.ceil rounds each quotient up to the next whole number (the result is still a float); .astype(np.int64) converts it to int64.
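
The recipe above can be wrapped into a complete function. This is a minimal sketch, assuming NumPy is available and that callers may pass plain lists; the np.asarray calls and the sample sizes are illustrative additions, not part of the problem statement.

```python
import numpy as np

def min_gpus_grid(model_bytes, gpu_bytes):
    # np.asarray is an added convenience so plain Python lists also work;
    # the one-line recipe itself assumes NumPy arrays.
    model_bytes = np.asarray(model_bytes)
    gpu_bytes = np.asarray(gpu_bytes)
    # (M, 1) / (1, N) broadcasts to (M, N); ceil rounds up, astype makes it int64.
    return np.ceil(model_bytes[:, None] / gpu_bytes[None, :]).astype(np.int64)

# Hypothetical catalog: three models and two GPU SKUs (sizes chosen for illustration).
GIB = 1024**3
models = np.array([14 * GIB, 140 * GIB, 810 * GIB])
gpus = np.array([24 * GIB, 80 * GIB])

grid = min_gpus_grid(models, gpus)
print(grid.shape)     # (3, 2)
print(grid.tolist())  # [[1, 1], [6, 2], [34, 11]]
```

Row m of the grid reads across GPU SKUs for one model; column n reads down the catalog for one SKU.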

This is the kind of vectorized expression an SRE might write when sizing a heterogeneous GPU fleet against a model catalog: one expression produces the whole grid, ready to plot as a heatmap.

Math

G_{m,n} = \left\lceil \frac{M_m}{V_n} \right\rceil

where M_m = model_bytes[m] and V_n = gpu_bytes[n].
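
For positive integer sizes, the ceiling can also be computed without going through floating point, using the identity ceil(a / b) = -((-a) // b). The sketch below is an alternative to the np.ceil recipe, not a replacement for it; the function name min_gpus_grid_exact is hypothetical, and it assumes both inputs are integer byte counts with every GPU size greater than zero.

```python
import numpy as np

def min_gpus_grid_exact(model_bytes, gpu_bytes):
    # Hypothetical integer-only variant; assumes integer inputs and gpu_bytes > 0.
    m = np.asarray(model_bytes, dtype=np.int64)[:, None]  # shape (M, 1)
    v = np.asarray(gpu_bytes, dtype=np.int64)[None, :]    # shape (1, N)
    # Floor division of the negated numerator rounds toward -inf,
    # so negating the result again yields the ceiling.
    return -(-m // v)

grid_exact = min_gpus_grid_exact([10, 16, 17], [8, 16])
print(grid_exact.tolist())  # [[2, 1], [2, 1], [3, 2]]
```

float64 represents integers exactly only up to 2^53, so for realistic byte counts both versions should agree; the integer form simply avoids the float round trip and the final cast.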
