
Overlap Compute and Comm (Medium)

In a pipelined training step, gradient AllReduce can run concurrently with backward compute. Compute the per-step time with and without overlap.

Signature: def pipeline_step_time(compute_ms: float, comm_ms: float, overlap: bool) -> float

  • With overlap: max(compute_ms, comm_ms) — whichever is the bottleneck.
  • Without overlap: compute_ms + comm_ms — strictly sequential.

Example:

  • compute=100ms, comm=40ms, overlap=True → 100ms
  • compute=100ms, comm=40ms, overlap=False → 140ms
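A minimal sketch of the stated signature, directly encoding the two rules above (concurrent phases are bounded by the slower one; sequential phases add):

```python
def pipeline_step_time(compute_ms: float, comm_ms: float, overlap: bool) -> float:
    if overlap:
        # Compute and AllReduce run concurrently; the slower one is the bottleneck.
        return max(compute_ms, comm_ms)
    # Strictly sequential: the phases run back to back.
    return compute_ms + comm_ms


# Matches the worked examples:
# pipeline_step_time(100, 40, True)  → 100
# pipeline_step_time(100, 40, False) → 140
```

Note that with overlap, communication is "free" only while it is shorter than the backward compute; once comm_ms exceeds compute_ms, the step becomes communication-bound.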

Math

Python (numpy)

Test Results

○ overlap, compute-bound
○ no overlap
○ overlap, comm-bound