When you increase the batch size by a factor k, the linear scaling rule says to multiply the learning rate by k as well, so that each example contributes roughly the same amount to the weight update (assuming the gradient is averaged over the batch).
Signature: def scaled_lr(base_lr: float, base_batch: int, new_batch: int) -> float
Return base_lr * (new_batch / base_batch).
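The reasoning behind the rule can be checked numerically. The sketch below is an illustration, not part of the problem: the toy least-squares data, the helper `mean_grad`, and all constants are hypothetical. It assumes the gradient is averaged over the batch and that every small-batch gradient is evaluated at the same weights. Under that assumption, k small-batch steps at `base_lr` add up to one large-batch step at `k * base_lr`.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy problem: least-squares regression on random data.
n_features, base_batch, k = 3, 8, 4
X = rng.normal(size=(base_batch * k, n_features))
y = rng.normal(size=base_batch * k)
w = rng.normal(size=n_features)
base_lr = 0.1


def mean_grad(Xb, yb, w):
    """Gradient of 0.5 * mean((Xb @ w - yb) ** 2) with respect to w."""
    return Xb.T @ (Xb @ w - yb) / len(yb)


# Sum of k small-batch updates, each gradient evaluated at the same w
# (the simplifying assumption behind the rule).
small_update = sum(
    base_lr * mean_grad(
        X[i * base_batch:(i + 1) * base_batch],
        y[i * base_batch:(i + 1) * base_batch],
        w,
    )
    for i in range(k)
)

# One large-batch update with the learning rate multiplied by k.
large_update = (k * base_lr) * mean_grad(X, y, w)

updates_match = np.allclose(small_update, large_update)
```

In real training the weights change between the k small steps, so the two updates agree only approximately. That is why the rule is a heuristic and not an exact equivalence.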
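import numpy as np

def scaled_lr(base_lr: float, base_batch: int, new_batch: int) -> float:
    # Linear scaling rule: multiply the learning rate by the batch-size ratio.
    return base_lr * (new_batch / base_batch)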