TorchedUp

228. Inference Memory Sweep

Medium

Vectorize the total inference memory formula (inference-memory-full, problem #112) over a batch of configs.

Implement: def inference_memory_sweep(n_params, kv_cache_bytes, dtype_bytes, activation_buffer_gb=1.0) where:

  • n_params, kv_cache_bytes, dtype_bytes are 1-D arrays of shape (N,) — paired per config.
  • activation_buffer_gb is a scalar float (constant across the sweep).

Return a float64 array of shape (N,): the total inference memory in GB for each config.

Per config: weight bytes are n_params · dtype_bytes; total bytes are weight bytes plus the KV-cache bytes; convert to GB and add the constant activation buffer. See the math reference below.

The vectorization: the scalar formula is just elementwise arithmetic, with no reductions and no transposes. Cast the three input arrays to float, evaluate the per-config expression, and return an (N,) array.
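The elementwise recipe described above can be sketched as follows. This is a minimal illustration rather than the site's reference solution; the sample inputs in the usage lines are hypothetical values chosen for easy arithmetic.

```python
import numpy as np


def inference_memory_sweep(n_params, kv_cache_bytes, dtype_bytes, activation_buffer_gb=1.0):
    # Cast every per-config input to float64 before doing any arithmetic.
    n = np.asarray(n_params, dtype=np.float64)
    kv = np.asarray(kv_cache_bytes, dtype=np.float64)
    b = np.asarray(dtype_bytes, dtype=np.float64)

    # Weight bytes plus KV-cache bytes, elementwise over the N configs.
    total_bytes = n * b + kv

    # Convert bytes to GB (1 GB = 1e9 bytes here) and add the scalar buffer,
    # which broadcasts across the whole (N,) array.
    return total_bytes / 1e9 + activation_buffer_gb


# Hypothetical sweep: a 7B model at 2 bytes/param and a 70B model at 1 byte/param.
mem = inference_memory_sweep([7e9, 70e9], [2e9, 10e9], [2, 1])
print(mem)
```

Because every operation is elementwise, the scalar activation_buffer_gb needs no special handling; NumPy broadcasting applies it to each config.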

Why float upfront? With n_params of 70_000_000_000 and dtype_bytes of 1, the product still fits in int64, but as soon as you add kv_cache_bytes and divide by 1e9 the result is floating-point anyway. Casting all three arrays to float64 first keeps every intermediate in one dtype and avoids silent integer surprises, such as overflow when the inputs arrive in a narrower integer dtype.
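One concrete way integer arithmetic can go wrong is sketched below. It assumes the inputs arrive as int32 arrays, which the problem statement does not specify; the values are hypothetical and chosen only so that the product exceeds the int32 range.

```python
import numpy as np

# Hypothetical inputs stored as int32 (an assumption for illustration).
n_params = np.array([2_000_000_000], dtype=np.int32)
dtype_bytes = np.array([2], dtype=np.int32)

# Integer array multiplication wraps around silently on overflow.
wrapped = n_params * dtype_bytes

# Casting to float64 first gives the intended product.
safe = n_params.astype(np.float64) * dtype_bytes.astype(np.float64)

print(wrapped[0], safe[0])
```

float64 represents integers exactly up to 2^53, so the cast loses nothing for realistic parameter and byte counts.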

Math

$$M_i = \frac{N_i \cdot b_i + \mathrm{KV}_i}{10^9} + M_{\mathrm{buf}}$$
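As a worked instance of the formula, take a hypothetical config with 13 billion parameters, 2 bytes per parameter, a 4 GB KV cache, and the default 1.0 GB activation buffer (all values are illustrative, not taken from the problem's test cases). Here N is n_params, b is dtype_bytes, KV is kv_cache_bytes, and M_buf is activation_buffer_gb:

```latex
M = \frac{13 \times 10^{9} \cdot 2 + 4 \times 10^{9}}{10^{9}} + 1.0
  = \frac{30 \times 10^{9}}{10^{9}} + 1.0
  = 31.0 \ \text{GB}
```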


NumPy

import numpy as np

def inference_memory_sweep(n_params, kv_cache_bytes, dtype_bytes, activation_buffer_gb=1.0):
    pass
