TorchedUp — Gradient Accumulation (Medium)

Correct Gradient Accumulation

When a batch is too large for memory, you split it into accumulation_steps micro-batches, accumulate their gradients, and apply one optimizer step. To match the gradient of one big batch you must average — not just sum — the per-step gradients.

Signature: def accumulated_gradient(grads: list, accumulation_steps: int) -> list

grads is a list of accumulation_steps numpy arrays of identical shape. Return the elementwise mean as a numpy array (or nested list).
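One straightforward way to satisfy this signature (a sketch, not the site's reference solution) is to stack the micro-batch gradients and divide the elementwise sum by accumulation_steps, which is exactly the elementwise mean:

```python
import numpy as np

def accumulated_gradient(grads: list, accumulation_steps: int) -> list:
    # Stack the per-micro-batch gradients into one array of shape
    # (accumulation_steps, *grad_shape), then average over the step axis
    # so the result matches the gradient of one full-size batch.
    stacked = np.stack([np.asarray(g, dtype=float) for g in grads])
    return stacked.sum(axis=0) / accumulation_steps

# Example: two micro-batch gradients of shape (2,)
print(accumulated_gradient([np.array([1.0, 3.0]), np.array([3.0, 5.0])], 2))
```

With a single step the function reduces to the identity (dividing one gradient by 1), which is what the third test case below checks.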


Test Results

- 1D, 2 steps
- 2D, 4 steps
- single-step identity (Premium)