Given the three main components of inference memory, return the total in gigabytes (using the GB = 1e9 bytes convention).
Signature: def inference_memory_gb(n_params: int, dtype_bytes: int, kv_cache_bytes: int, activation_buffer_gb: float = 1.0) -> float
Formula:
total_gb = (n_params * dtype_bytes + kv_cache_bytes) / 1e9 + activation_buffer_gb
The three terms:
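- n_params * dtype_bytes: model weights, in bytes
- kv_cache_bytes: KV cache, in bytes
- activation_buffer_gb: activation buffer, already in gigabytes, so it is added after the division by 1e9
Example: 7B params in fp16 (2 bytes) with 0 KV and 1 GB buffer -> (7e9 * 2 + 0) / 1e9 + 1.0 = 15.0 GB.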
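One possible implementation of the signature above, as a minimal sketch that applies the formula directly. The intermediate name weights_bytes is only illustrative, and the function assumes all inputs are non-negative and does no validation.

```python
def inference_memory_gb(
    n_params: int,
    dtype_bytes: int,
    kv_cache_bytes: int,
    activation_buffer_gb: float = 1.0,
) -> float:
    """Estimate total inference memory in GB (1 GB = 1e9 bytes)."""
    # Weights: one dtype-sized value per parameter, in bytes.
    weights_bytes = n_params * dtype_bytes
    # Weights and KV cache are in bytes; the buffer is already in GB,
    # so only the first two are divided by 1e9.
    return (weights_bytes + kv_cache_bytes) / 1e9 + activation_buffer_gb


# Worked example from the problem statement: 7B params, fp16, no KV cache.
print(inference_memory_gb(7_000_000_000, 2, 0))  # 15.0
```

Keeping the byte-valued terms together before dividing keeps the units explicit: everything inside the parentheses is in bytes, and everything outside is in gigabytes.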