TorchedUp
LearnBetaProblemsSystem DesignSoonPremium
TorchedUp
LearnBetaProblemsSystem DesignSoonPremium
←

229. Parameter Count Sweep

Medium

Vectorize the standard transformer parameter-count formula (param-count-transformer, problem #109) over arrays of architectural configs.

Implement: def param_count_sweep(d_model, n_layers, vocab, n_heads=None) where:

  • d_model, n_layers, vocab are 1-D integer arrays of shape (N,) — paired per config.
  • n_heads is unused (kept for API compatibility with the scalar version; the formula doesn't actually need it).

Return shape (N,) of int64 — parameter count per config.

Per config (GPT-style, untied embeddings): the per-layer block (attention + FFN with 4× expansion) contributes 12 · d² parameters; embeddings plus a separate lm_head contribute 2 · V · d. See the math reference below for the closed form.

Why int64 inputs? d * d * L for a 7B-class config is 4096^2 * 32 ≈ 5e8 — fits in int32, but d^2 * L for a 70B-class config (d=8192, L=80) is 5.4e9 — overflows int32 silently. Cast to int64 upfront.

Math

Ni​=12⋅di2​⋅Li​+2⋅Vi​⋅di​

Asked at

NumPy

import numpy as np

 

def param_count_sweep(...):

    pass

🔒

Premium problem

Free accounts include problems #1–20. Upgrade to unlock the editor, hidden test cases, and reference solutions for every problem.

Upgrade to PremiumBack to problems

Already premium?