When a batch is too large for memory, you split it into accumulation_steps micro-batches, accumulate their gradients, and apply one optimizer step. To match the gradient of one big batch you must average — not just sum — the per-step gradients.
Signature: def accumulated_gradient(grads: list, accumulation_steps: int) -> list
grads is a list of accumulation_steps numpy arrays of identical shape. Return the elementwise mean as a numpy array (or nested list).
Math
Asked at
import numpy as np
def accumulated_gradient(...):
pass
Premium problem
Free accounts include problems #1–20. Upgrade to unlock the editor, hidden test cases, and reference solutions for every problem.
Already premium?