TorchedUp
LearnBetaProblemsSystem DesignSoonPremium
TorchedUp
LearnBetaProblemsSystem DesignSoonPremium
←

23. All-Gather

Easy

All-Gather is a collective communication primitive where every worker broadcasts its local shard and receives all other workers' shards, so that each worker ends up with the complete tensor.

It is used in:

  • FSDP — each worker holds a shard of the model parameters; before a forward pass the full parameter tensor is reconstructed via all-gather.
  • Tensor parallelism — output activations split across GPUs are reassembled.

Signature: def all_gather(shards)

  • shards: 2-D array of shape (num_workers, shard_size) — row i is worker i's local shard
  • Returns: 1-D array of shape (num_workers * shard_size,) — shards concatenated in rank order

Math

all_gather([s0​,s1​,…,sN−1​])=[s0​∥s1​∥⋯∥sN−1​]

Asked at

NumPy

import numpy as np

 

def all_gather(...):

    pass

🔒

Premium problem

Free accounts include problems #1–20. Upgrade to unlock the editor, hidden test cases, and reference solutions for every problem.

Upgrade to PremiumBack to problems

Already premium?