Prefix Caching (Prompt KV Reuse)

Implement a prefix cache lookup that finds the longest cached prefix for a new request, enabling KV cache reuse across requests with shared system prompts.

Signature: def prefix_cache_lookup(new_tokens, cache)

new_tokens: list of token IDs for the new request
cache: list of cached entries — each entry is a dict {"tokens": [int, ...]} (variable-length token sequences)
Returns: (cache_hit_length, tokens_to_compute)
- cache_hit_length: int — how many tokens can be reused from cache
- tokens_to_compute: list — the remaining tokens that need KV computation

Algorithm: Longest Prefix Match

For each cached sequence, count how many tokens match new_tokens starting from index 0 and stopping at the first mismatch (a maximal common prefix, not a substring or LCS). Take the maximum match length over all cache entries — that's the cache hit length. The remaining tokens that still need KV computation are the suffix of new_tokens beyond that hit length. If the cache is empty or nothing matches, the hit length is 0 and tokens_to_compute equals the full input.

Real-World Impact

With a 500-token system prompt shared across 1000 requests/minute, prefix caching reuses those 500 tokens' KV cache for every request — reducing prefill compute by 500× for the shared portion. vLLM implements this via RadixAttention (a radix tree over token sequences).

Asked at

Implement a prefix cache lookup that finds the longest cached prefix for a new request, enabling KV cache reuse across requests with shared system prompts.

Signature: def prefix_cache_lookup(new_tokens, cache)

new_tokens: list of token IDs for the new request
cache: list of cached entries — each entry is a dict {"tokens": [int, ...]} (variable-length token sequences)
Returns: (cache_hit_length, tokens_to_compute)
- cache_hit_length: int — how many tokens can be reused from cache
- tokens_to_compute: list — the remaining tokens that need KV computation

Algorithm: Longest Prefix Match

Real-World Impact

Asked at

197. Prefix Caching (Prompt KV Reuse)

Algorithm: Longest Prefix Match

Real-World Impact

197. Prefix Caching (Prompt KV Reuse)

Algorithm: Longest Prefix Match

Real-World Impact