Inference Memory Sweep

Vectorize the total inference memory formula (inference-memory-full, problem #112) over a batch of configs.

Implement: def inference_memory_sweep(n_params, kv_cache_bytes, dtype_bytes, activation_buffer_gb=1.0) where:

n_params, kv_cache_bytes, dtype_bytes are 1-D arrays of shape (N,) — paired per config.
activation_buffer_gb is a scalar float (constant across the sweep).

Return a float64 array of shape (N,): total inference memory in GB (10^9 bytes) per config.

Per config: weight bytes are n_params · dtype_bytes; total bytes are weight bytes plus the KV-cache bytes; convert to GB and add the constant activation buffer. See the math reference below.

The vectorization: the scalar formula is just elementwise arithmetic — no reductions, no transposes. Cast the three input arrays to float, evaluate the per-config expression, and return a (N,) array.
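
The description above can be sketched as follows. This is one minimal way to write it, assuming NumPy is the array library (the problem statement does not name one); the example configs at the bottom are made up for illustration.

```python
import numpy as np

def inference_memory_sweep(n_params, kv_cache_bytes, dtype_bytes, activation_buffer_gb=1.0):
    # Cast each paired input to float64 before doing any arithmetic.
    n = np.asarray(n_params, dtype=np.float64)
    kv = np.asarray(kv_cache_bytes, dtype=np.float64)
    b = np.asarray(dtype_bytes, dtype=np.float64)

    # Elementwise: weight bytes plus KV-cache bytes, per config.
    total_bytes = n * b + kv

    # Convert bytes to GB (10^9 bytes) and add the constant buffer.
    return total_bytes / 1e9 + activation_buffer_gb

# Illustrative configs: a 7B model at 2 bytes/param and a 70B model at 1 byte/param.
result = inference_memory_sweep(
    n_params=[7e9, 70e9],
    kv_cache_bytes=[2e9, 10e9],
    dtype_bytes=[2, 1],
)
print(result)  # expected: 17.0 and 81.0 GB
```

Because every step is elementwise, the output keeps the (N,) shape of the inputs without any explicit loop.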

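Why float upfront? n_params of 70_000_000_000 and dtype_bytes of 1 give a product that fits in int64, but not in int32, and fixed-width integer arrays wrap around silently on overflow. Adding kv_cache_bytes and dividing by 1e9 yields a floating-point result either way, so casting all three inputs to float before the arithmetic loses nothing and avoids silent integer-overflow surprises.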
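
Here is one concrete surprise the upfront cast guards against. It assumes the inputs might arrive as int32 NumPy arrays, which the problem statement does not specify; the values are illustrative.

```python
import numpy as np

# Both values fit in int32 individually (assumed input dtype, for illustration).
n_params = np.array([2_000_000_000], dtype=np.int32)
dtype_bytes = np.array([2], dtype=np.int32)

# The int32 product overflows and wraps around without raising an error.
wrapped = n_params * dtype_bytes

# Casting to float64 first gives the intended 4e9 weight bytes.
safe = n_params.astype(np.float64) * dtype_bytes.astype(np.float64)

print(wrapped[0] == 4_000_000_000)  # False: the integer product wrapped
print(safe[0] == 4e9)               # True
```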

Math

M_i = \frac{N_i \cdot b_i + KV_i}{10^9} + M_{buf}
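
As a worked instance of the formula, take a hypothetical config (values chosen for illustration, not taken from the problem) with N_i = 13 × 10^9 parameters, b_i = 2 bytes per parameter, KV_i = 4 × 10^9 bytes, and the default M_{buf} = 1 GB:

M_i = \frac{13 \times 10^9 \cdot 2 + 4 \times 10^9}{10^9} + 1 = \frac{30 \times 10^9}{10^9} + 1 = 31 \text{ GB}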
