TorchedUp

112. Total Inference Memory

Medium

Given the three main components of inference memory, return the total in gigabytes (using the GB = 1e9 bytes convention).

Signature: def inference_memory_gb(n_params: int, dtype_bytes: int, kv_cache_bytes: int, activation_buffer_gb: float = 1.0) -> float

Formula:

total_gb = (n_params * dtype_bytes + kv_cache_bytes) / 1e9 + activation_buffer_gb

The three terms:

  1. Weights: n_params * dtype_bytes
  2. KV cache: precomputed as kv_cache_bytes
  3. Activation/runtime buffer: a fixed overhead (CUDA context, framework workspace, attention scratch).
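The problem takes kv_cache_bytes as a given input. As a rough sketch of where such a number might come from, the helper below uses the common estimate for a standard transformer decoder: two tensors (keys and values) per layer, each of size batch × sequence length × KV heads × head dimension. The function name estimate_kv_cache_bytes and the configuration values are hypothetical and are not part of the problem statement.

```python
def estimate_kv_cache_bytes(
    n_layers: int,
    n_kv_heads: int,
    head_dim: int,
    seq_len: int,
    batch_size: int,
    dtype_bytes: int,
) -> int:
    """Hypothetical helper: approximate KV cache size in bytes.

    Assumes a standard decoder that stores one key tensor and one value
    tensor per layer, with no paging, quantization, or other compression.
    """
    tensors_per_layer = 2  # one for keys, one for values
    elements_per_tensor = batch_size * seq_len * n_kv_heads * head_dim
    return tensors_per_layer * n_layers * elements_per_tensor * dtype_bytes


# Illustrative configuration (assumed values, not taken from the problem).
kv_bytes = estimate_kv_cache_bytes(
    n_layers=32, n_kv_heads=32, head_dim=128,
    seq_len=4096, batch_size=1, dtype_bytes=2,
)
print(kv_bytes)  # 2147483648
```

The result can then be passed as the kv_cache_bytes argument of inference_memory_gb.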

Example: 7B parameters in fp16 (2 bytes each) with an empty KV cache (0 bytes) and a 1 GB buffer -> (7e9 * 2 + 0) / 1e9 + 1.0 = 15.0 GB.
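The formula and example above can be sketched directly in Python. This is one possible implementation that follows the stated signature and formula; it is not the site's reference solution, and it does no input validation.

```python
def inference_memory_gb(
    n_params: int,
    dtype_bytes: int,
    kv_cache_bytes: int,
    activation_buffer_gb: float = 1.0,
) -> float:
    """Total inference memory in GB, using 1 GB = 1e9 bytes."""
    weights_bytes = n_params * dtype_bytes  # term 1: model weights
    # Terms 1 and 2 are in bytes, so convert them to GB before adding
    # the activation/runtime buffer, which is already given in GB.
    return (weights_bytes + kv_cache_bytes) / 1e9 + activation_buffer_gb


# The worked example: 7B parameters, fp16, empty KV cache, 1 GB buffer.
total = inference_memory_gb(7_000_000_000, 2, 0, 1.0)
print(total)  # 15.0
```

Note the unit mismatch the formula handles: only the weights and KV cache terms are divided by 1e9, because the buffer is supplied in gigabytes.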

Math

$$M_{\text{infer}} = \frac{N \cdot b + M_{\text{KV}}}{10^{9}} + M_{\text{buf}}$$


NumPy

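import numpy as np

def inference_memory_gb(n_params: int, dtype_bytes: int, kv_cache_bytes: int, activation_buffer_gb: float = 1.0) -> float:
    # Return the total inference memory in GB (1 GB = 1e9 bytes).
    pass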
