PagedAttention allocates KV cache in fixed-size blocks. Internal fragmentation occurs when a sequence's length isn't a multiple of block_size. Compute the fraction of allocated block-slots that actually hold tokens.
Signature: def block_utilization(seq_lens: list, block_size: int) -> float
seq_lens is a list of sequence lengths (positive ints). block_size is the page size in tokens.
Let total_tokens = sum(seq_lens) and total_blocks = sum(ceil(s / block_size) for s in seq_lens). The utilization is total_tokens / (total_blocks * block_size). Return a float in (0, 1]. If seq_lens is empty, return 0.0.
Example:
seq_lens=[18, 33], block_size=16 → blocks = ceil(18/16) + ceil(33/16) = 2 + 3 = 5; tokens = 18 + 33 = 51; util = 51 / 80 = 0.6375.
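A minimal reference sketch that follows the formula above directly, using math.ceil for the per-sequence block counts:

```python
import math

def block_utilization(seq_lens: list, block_size: int) -> float:
    """Fraction of allocated block-slots that actually hold tokens."""
    if not seq_lens:
        return 0.0
    total_tokens = sum(seq_lens)
    # Each sequence occupies ceil(s / block_size) whole blocks.
    total_blocks = sum(math.ceil(s / block_size) for s in seq_lens)
    return total_tokens / (total_blocks * block_size)

# Matches the worked example: 51 tokens across 5 blocks of 16 slots.
assert math.isclose(block_utilization([18, 33], 16), 0.6375)
assert block_utilization([], 16) == 0.0
```

An integer-only alternative for the block count is (s + block_size - 1) // block_size, which avoids float rounding in math.ceil for very large lengths.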