TorchedUp · Problems

KV Cache Memory Budget (Medium)

Given a VRAM budget for the KV cache, compute the maximum number of concurrent sequences that can be served at full sequence length.

Signature: def max_concurrent_seqs(vram_for_kv_bytes: int, seq_len: int, n_layers: int, n_kv_heads: int, head_dim: int, dtype_bytes: int) -> int

Per-sequence KV bytes: 2 * seq_len * n_layers * n_kv_heads * head_dim * dtype_bytes (factor 2 for both K and V).

Return: vram_for_kv_bytes // per_seq_bytes.

Example:

  • 40 GB for KV, seq_len=4096, 32 layers, 8 KV heads (GQA), head_dim=128, fp16 (2 bytes) → per-seq = 2 × 4096 × 32 × 8 × 128 × 2 = 536,870,912 bytes = 512 MB → 40 GB / 512 MB = 80 concurrent seqs.
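A direct translation of the formula above into code (a sketch only; the site's reference solution is not shown):

```python
def max_concurrent_seqs(vram_for_kv_bytes: int, seq_len: int, n_layers: int,
                        n_kv_heads: int, head_dim: int, dtype_bytes: int) -> int:
    # K and V each hold seq_len * n_layers * n_kv_heads * head_dim elements,
    # hence the factor of 2.
    per_seq_bytes = 2 * seq_len * n_layers * n_kv_heads * head_dim * dtype_bytes
    # Floor division: a partially fitting sequence cannot be served.
    return vram_for_kv_bytes // per_seq_bytes

# The worked example: 40 GB for KV, seq_len=4096, 32 layers, 8 KV heads,
# head_dim=128, fp16 (2 bytes).
print(max_concurrent_seqs(40 * 1024**3, 4096, 32, 8, 128, 2))  # → 80
```

Note that the fp8-cache test presumably just passes `dtype_bytes=1`, doubling the count relative to fp16 for the same budget.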

Math

Python (numpy)

Test Results

  • llama-style GQA
  • short ctx, small model
  • fp8 cache (Premium)