Implement the per-layer KV cache that a Transformer decoder uses to support
autoregressive generation. For each step, every layer calls
cache.append(layer_idx, k, v) to extend its running K/V tensors and read
them back for attention.
This is a Build-in-Context problem. Your append is exercised by a
hidden test runner that simulates a decoder loop — multiple layers, many
steps, varied per-layer K/V values — and compares the resulting cache state
against the reference. The signature contract has to hold for the loop to
work; wrong shape, dtype, axis, or per-layer mixing all surface as a
diverged cache value.
Contract
KVCache.append(layer_idx, k, v) should append k and v (each shape
(d_model,)) to the running cache for layer_idx and return the full
(seq_len + 1, d_model) tensors. Cross-layer state must stay independent —
appending to layer 0 doesn't touch layer 1's cache.
import numpy as np
def run_kv_cache(...):
pass
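One way to satisfy the contract is a dict keyed by layer index, holding an independent (K, V) pair per layer and concatenating one new row per step. This is only a minimal sketch under stated assumptions: the class name KVCache and the dict-of-arrays layout are illustrative choices, and the hidden runner's exact harness is not shown.

```python
import numpy as np

class KVCache:
    """Per-layer K/V cache for autoregressive decoding (illustrative sketch).

    Each layer keeps its own running K and V tensors of shape
    (seq_len, d_model); append extends both by one step and returns them.
    """

    def __init__(self):
        # layer_idx -> (K, V), each of shape (seq_len, d_model).
        # Separate entries per layer keep cross-layer state independent.
        self._store = {}

    def append(self, layer_idx, k, v):
        # Promote the per-step vectors (d_model,) to rows (1, d_model)
        # so they concatenate along the sequence axis.
        k = np.asarray(k)[None, :]
        v = np.asarray(v)[None, :]
        if layer_idx not in self._store:
            self._store[layer_idx] = (k, v)
        else:
            K, V = self._store[layer_idx]
            self._store[layer_idx] = (
                np.concatenate([K, k], axis=0),
                np.concatenate([V, v], axis=0),
            )
        # Return the full (seq_len + 1, d_model) tensors for this layer.
        return self._store[layer_idx]
```

For example, after two appends to layer 0 and one to layer 1, layer 0's tensors have shape (2, d_model) while layer 1's have shape (1, d_model); appending to one layer never changes the other's state.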