TorchedUp
LearnBetaProblemsSystem DesignSoonPremium
TorchedUp
LearnBetaProblemsSystem DesignSoonPremium
←

145. Prefix Cache Hit Rate

Medium

A prefix cache stores KV blocks keyed by token-prefix. When a new prompt arrives, contiguous block-aligned prefix matches are reused. Compute the block-level cache hit rate across a list of prompts processed in order.

Signature: def prefix_cache_hit_rate(prompts: list, block_size: int = 16) -> float

prompts is a list of token-id lists. Tokenize each prompt into blocks of size block_size (drop the trailing partial block). Walk blocks left-to-right; a block is a hit if its prefix tuple (all blocks up to and including this one, as a tuple of tuples) was already seen during a previous prompt.

Return: total_hits / total_blocks (0.0 if total_blocks is 0).

Example:

  • prompts = [[1]*32, [1]*32 + [2]*16], block_size=16. First prompt: 2 blocks, no hits. Second: first 2 blocks match → 2 hits, third doesn't. 2 hits / 5 blocks = 0.4.

Math

hit rate=∑p​total blocks(p)∑p​cached blocks(p)​

Asked at

NumPy

import numpy as np

 

def prefix_cache_hit_rate(...):

    pass

🔒

Premium problem

Free accounts include problems #1–20. Upgrade to unlock the editor, hidden test cases, and reference solutions for every problem.

Upgrade to PremiumBack to problems

Already premium?