Vectorize the standard transformer parameter-count formula (param-count-transformer, problem #109) over arrays of architectural configs.
Implement: def param_count_sweep(d_model, n_layers, vocab, n_heads=None) where:
d_model, n_layers, vocab are 1-D integer arrays of shape (N,) — paired per config.n_heads is unused (kept for API compatibility with the scalar version; the formula doesn't actually need it).Return shape (N,) of int64 — parameter count per config.
Per config (GPT-style, untied embeddings): the per-layer block (attention + FFN with 4× expansion) contributes 12 · d² parameters; embeddings plus a separate lm_head contribute 2 · V · d. See the math reference below for the closed form.
Why int64 inputs? d * d * L for a 7B-class config is 4096^2 * 32 ≈ 5e8 — fits in int32, but d^2 * L for a 70B-class config (d=8192, L=80) is 5.4e9 — overflows int32 silently. Cast to int64 upfront.
Math
Asked at
import numpy as np
def param_count_sweep(...):
pass
Premium problem
Free accounts include problems #1–20. Upgrade to unlock the editor, hidden test cases, and reference solutions for every problem.
Already premium?