Vectorize the total inference memory formula (inference-memory-full, problem #112) over a batch of configs.
Implement: def inference_memory_sweep(n_params, kv_cache_bytes, dtype_bytes, activation_buffer_gb=1.0) where:
n_params, kv_cache_bytes, dtype_bytes are 1-D arrays of shape (N,), paired per config.
activation_buffer_gb is a scalar float (constant across the sweep).
Return an array of shape (N,), dtype float64: total inference memory in GB per config.
Per config: weight bytes are n_params · dtype_bytes; total bytes are weight bytes plus the KV-cache bytes; convert to GB and add the constant activation buffer. See the math reference below.
The vectorization: the scalar formula is just elementwise arithmetic — no reductions, no transposes. Cast the three input arrays to float, evaluate the per-config expression, and return a (N,) array.
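Why float upfront? n_params of 70_000_000_000 and dtype_bytes of 1 give a product that fits in int64, but the division by 1e9 produces a float regardless. Casting upfront keeps every step in float64, so the result does not depend on the integer dtype of the inputs (70_000_000_000 already exceeds the int32 maximum of 2_147_483_647), and it avoids silent integer overflow or rounding surprises.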
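One possible sketch of the vectorized version described above, assuming GB means 1e9 bytes (as the division by 1e9 suggests) and that inputs may arrive as lists or integer arrays. The sample values in the sweep are hypothetical and chosen only to make the arithmetic easy to check by hand.

```python
import numpy as np

def inference_memory_sweep(n_params, kv_cache_bytes, dtype_bytes, activation_buffer_gb=1.0):
    # Cast upfront so every step below runs in float64.
    n_params = np.asarray(n_params, dtype=np.float64)
    kv_cache_bytes = np.asarray(kv_cache_bytes, dtype=np.float64)
    dtype_bytes = np.asarray(dtype_bytes, dtype=np.float64)

    # Elementwise per-config arithmetic: no reductions, no transposes.
    weight_bytes = n_params * dtype_bytes
    total_bytes = weight_bytes + kv_cache_bytes

    # Convert bytes to GB (assumed 1e9 bytes) and add the constant buffer.
    return total_bytes / 1e9 + activation_buffer_gb

# Hypothetical two-config sweep: a 7B model at 2 bytes/param and a 70B model at 1 byte/param.
sweep = inference_memory_sweep(
    n_params=[7_000_000_000, 70_000_000_000],
    kv_cache_bytes=[2_000_000_000, 10_000_000_000],
    dtype_bytes=[2, 1],
)
print(sweep)  # 17 GB and 81 GB for the two configs
```

The scalar activation_buffer_gb broadcasts against the (N,) array, so no explicit tiling is needed.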
Math
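Reconstructed from the per-config description above, with symbols chosen here for illustration: P_i is n_params, b_i is dtype_bytes, K_i is kv_cache_bytes, A is activation_buffer_gb, and M_i is the total memory in GB for config i, taking 1 GB as 10^9 bytes.

$$
M_i = \frac{P_i \, b_i + K_i}{10^9} + A, \qquad i = 1, \dots, N
$$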
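import numpy as np

def inference_memory_sweep(n_params, kv_cache_bytes, dtype_bytes, activation_buffer_gb=1.0):
    # Return total inference memory in GB per config: shape (N,), dtype float64.
    pass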