In a pipelined training step, gradient AllReduce can run concurrently with backward compute. Compute the per-step time with and without overlap.
Signature: def pipeline_step_time(compute_ms: float, comm_ms: float, overlap: bool) -> float
max(compute_ms, comm_ms) — whichever is the bottleneck.compute_ms + comm_ms — strictly sequential.Example:
Math
Asked at
import numpy as np
def pipeline_step_time(...):
pass
Premium problem
Free accounts include problems #1–20. Upgrade to unlock the editor, hidden test cases, and reference solutions for every problem.
Already premium?