A prefix cache stores KV blocks keyed by token-prefix. When a new prompt arrives, contiguous block-aligned prefix matches are reused. Compute the block-level cache hit rate across a list of prompts processed in order.
Signature: def prefix_cache_hit_rate(prompts: list, block_size: int = 16) -> float
prompts is a list of token-id lists. Tokenize each prompt into blocks of size block_size (drop the trailing partial block). Walk blocks left-to-right; a block is a hit if its prefix tuple (all blocks up to and including this one, as a tuple of tuples) was already seen during a previous prompt.
Return: total_hits / total_blocks (0.0 if total_blocks is 0).
Example:
[[1]*32, [1]*32 + [2]*16], block_size=16. First prompt: 2 blocks, no hits. Second: first 2 blocks match → 2 hits, third doesn't. 2 hits / 5 blocks = 0.4.Math
Asked at
import numpy as np
def prefix_cache_hit_rate(...):
pass
Premium problem
Free accounts include problems #1–20. Upgrade to unlock the editor, hidden test cases, and reference solutions for every problem.
Already premium?