PagedAttention allocates KV cache in fixed-size blocks. Internal fragmentation occurs when a sequence's length isn't a multiple of block_size. Compute the fraction of allocated block-slots that actually hold tokens.
Signature: def block_utilization(seq_lens: list, block_size: int) -> float
seq_lens is a list of sequence lengths (positive ints). block_size is the page size in tokens.
Let total_tokens = sum(seq_lens) and total_blocks = sum(ceil(s / block_size) for s in seq_lens). The utilization is total_tokens / (total_blocks * block_size). Return a float in (0, 1]. If seq_lens is empty, return 0.0.
Example:
seq_lens=[18, 33], block_size=16 → blocks = ceil(18/16) + ceil(33/16) = 2 + 3 = 5; tokens = 18 + 33 = 51; util = 51 / 80 = 0.6375.
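A minimal reference sketch that follows the formula above directly, using math.ceil for the per-sequence block counts:

```python
import math

def block_utilization(seq_lens: list, block_size: int) -> float:
    """Fraction of allocated block-slots that actually hold tokens."""
    if not seq_lens:
        return 0.0
    total_tokens = sum(seq_lens)
    # Each sequence occupies ceil(s / block_size) whole blocks.
    total_blocks = sum(math.ceil(s / block_size) for s in seq_lens)
    return total_tokens / (total_blocks * block_size)

# Matches the worked example: 51 tokens across 5 blocks of 16 slots.
assert math.isclose(block_utilization([18, 33], 16), 0.6375)
assert block_utilization([], 16) == 0.0
```

An integer-only alternative for the block count is (s + block_size - 1) // block_size, which avoids float rounding in math.ceil for very large lengths.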