TorchedUp

62. Debug: Gradient Accumulation

Medium

Gradient accumulation simulates a large effective batch size by accumulating gradients over K micro-batches before updating weights. The function below has a bug in how the final loss is normalized — find and fix it.

Signature: def buggy_grad_accumulation(losses, accumulation_steps)

  • losses: flat list of per-sample losses, length = accumulation_steps * batch_size
  • accumulation_steps: K, the number of micro-batches to accumulate
  • Returns: scalar — the correctly normalized mean loss over all samples

Contract: the returned value must equal the global mean of losses (i.e. what you'd get by training on the full batch in one step). Equivalently, scaling accumulation_steps while keeping the same total samples must not change the result.
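The contract can be checked numerically. The sketch below is not the site's reference solution: `accumulated_mean_loss` is a hypothetical name, and it assumes `losses` splits evenly into `accumulation_steps` equally sized micro-batches. Each micro-batch mean is scaled by 1/K before it is accumulated, so the total equals the global mean regardless of K.

```python
import numpy as np


def accumulated_mean_loss(losses, accumulation_steps):
    """Hypothetical reference: mean loss accumulated over K equal micro-batches."""
    losses = np.asarray(losses, dtype=np.float64)
    if losses.size % accumulation_steps != 0:
        raise ValueError("len(losses) must be divisible by accumulation_steps")

    # Rows are micro-batches: shape (K, N).
    micro_batches = losses.reshape(accumulation_steps, -1)

    total = 0.0
    for micro_batch in micro_batches:
        # Scale each micro-batch mean by 1/K before accumulating.
        total += micro_batch.mean() / accumulation_steps
    return float(total)


# Same six samples, split into 1, 2, 3, or 6 micro-batches.
losses = [0.5, 1.5, 2.0, 4.0, 1.0, 3.0]
results = [accumulated_mean_loss(losses, k) for k in (1, 2, 3, 6)]
```

Every entry of `results` should match `np.mean(losses)`, which is the invariance the contract asks for.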

Math

$$
L = \frac{1}{KN} \sum_{i=1}^{KN} \ell_i = \bar{\ell}, \qquad K = \text{accumulation\_steps}, \quad N = \text{batch size per step}
$$
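As a worked step, the global mean can be split by micro-batch. Here $\ell_{k,j}$ denotes sample $j$ of micro-batch $k$, an indexing introduced only for this illustration. The first identity shows that averaging the K micro-batch means recovers the global mean. The second shows one plausible normalization mistake, summing the micro-batch means without the 1/K factor, which inflates the result by a factor of K. The function body is not shown on this page, so this is an assumed example of such a bug, not necessarily the one in the problem.

$$
\frac{1}{K}\sum_{k=1}^{K}\left(\frac{1}{N}\sum_{j=1}^{N}\ell_{k,j}\right) = \frac{1}{KN}\sum_{i=1}^{KN}\ell_i = \bar{\ell},
\qquad
\sum_{k=1}^{K}\left(\frac{1}{N}\sum_{j=1}^{N}\ell_{k,j}\right) = K\,\bar{\ell}
$$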


NumPy

import numpy as np

def buggy_grad_accumulation(losses, accumulation_steps):
    pass
