TorchedUp
LearnBetaProblemsSystem DesignSoonPremium
TorchedUp
LearnBetaProblemsSystem DesignSoonPremium
←

132. Gradient Accumulation

Medium

When a batch is too large for memory, you split it into accumulation_steps micro-batches, accumulate their gradients, and apply one optimizer step. To match the gradient of one big batch you must average — not just sum — the per-step gradients.

Signature: def accumulated_gradient(grads: list, accumulation_steps: int) -> list

grads is a list of accumulation_steps numpy arrays of identical shape. Return the elementwise mean as a numpy array (or nested list).

Math

geff​=S1​s=1∑S​gs​

Asked at

NumPy

import numpy as np

 

def accumulated_gradient(...):

    pass

🔒

Premium problem

Free accounts include problems #1–20. Upgrade to unlock the editor, hidden test cases, and reference solutions for every problem.

Upgrade to PremiumBack to problems

Already premium?