Gradient accumulation simulates a large effective batch size by accumulating gradients over K micro-batches before each weight update. The function below has a bug in how the final loss is normalized; find and fix it.
Signature: def buggy_grad_accumulation(losses, accumulation_steps)
losses: flat list of per-sample losses, length = accumulation_steps * batch_size
accumulation_steps: K, the number of micro-batches to accumulate
Contract: the returned value must equal the global mean of losses (i.e., what you would get by training on the full batch in one step). Equivalently, changing accumulation_steps while keeping the same total number of samples must not change the result.
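The contract ties accumulation to the full-batch result. The sketch below illustrates that equivalence on gradients, using a toy linear least-squares model. The model, the data, and the helpers `mse_grad` and `accumulated_grad` are assumptions for illustration and are not part of the problem. Each micro-batch's mean gradient is scaled by 1/K before being added, and the accumulated sum matches the full-batch gradient for any K that divides the batch size.

```python
import numpy as np

def mse_grad(w, X, y):
    # Gradient of mean((X @ w - y) ** 2) with respect to w.
    residual = X @ w - y
    return 2.0 * X.T @ residual / len(y)

def accumulated_grad(w, X, y, accumulation_steps):
    # Hypothetical helper: accumulate each micro-batch's mean gradient,
    # scaled by 1/K, as would happen before a single weight update.
    grad = np.zeros_like(w)
    X_parts = np.split(X, accumulation_steps)  # requires equal-size micro-batches
    y_parts = np.split(y, accumulation_steps)
    for X_mb, y_mb in zip(X_parts, y_parts):
        grad += mse_grad(w, X_mb, y_mb) / accumulation_steps
    return grad

# Toy data: 12 samples, 3 features.
rng = np.random.default_rng(0)
X = rng.normal(size=(12, 3))
y = rng.normal(size=12)
w = rng.normal(size=3)

full_batch_grad = mse_grad(w, X, y)
for k in (1, 2, 3, 4, 6, 12):
    print(k, np.allclose(accumulated_grad(w, X, y, k), full_batch_grad))
```

The 1/K factor works because every micro-batch holds the same number of samples. With unequal micro-batches, each one would need to be weighted by its share of the total sample count.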
import numpy as np

def buggy_grad_accumulation(losses, accumulation_steps):
    # Stub: implement the accumulation with the corrected loss normalization.
    pass
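One way to satisfy the contract is sketched below. It assumes every micro-batch has the same size, as the length constraint `accumulation_steps * batch_size` implies. `fixed_grad_accumulation` is a hypothetical name. This is not the site's reference solution, and the hidden buggy body is not reproduced here. Each micro-batch contributes its own mean loss divided by K, so the K contributions add up to the global mean for any valid K.

```python
import numpy as np

def fixed_grad_accumulation(losses, accumulation_steps):
    """Hypothetical corrected version: returns the global mean of `losses`
    by accumulating over K equal-size micro-batches."""
    losses = np.asarray(losses, dtype=float)
    k = accumulation_steps
    if k <= 0 or losses.size % k != 0:
        raise ValueError("len(losses) must be a positive multiple of accumulation_steps")
    micro_batches = losses.reshape(k, -1)  # shape (K, batch_size)

    total = 0.0
    for mb in micro_batches:
        # Scale each micro-batch mean by 1/K; dropping this factor
        # would return K times the global mean.
        total += mb.mean() / k
    return float(total)

losses = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
for k in (1, 2, 3, 6):
    print(k, fixed_grad_accumulation(losses, k), np.mean(losses))
```

For example, with K = 2 the micro-batch means are 2.0 and 5.0. Halving each and summing gives 3.5, which is the mean of all six losses. Without the division by K, the result would be 7.0, and it would keep growing as accumulation_steps increases, which breaks the invariance requirement.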