
All-Gather (Easy)

All-Gather is a collective communication primitive where every worker broadcasts its local shard and receives all other workers' shards, so that each worker ends up with the complete tensor.

It is used in:

  • FSDP — each worker holds a shard of the model parameters; before a forward pass the full parameter tensor is reconstructed via all-gather (see the sketch after this list).
  • Tensor parallelism — output activations split across GPUs are reassembled.
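In practice, all-gather is usually a single call into a communication library. The following is a minimal illustrative sketch using torch.distributed (not part of this problem, which only asks for the NumPy simulation below); it assumes the default process group has already been initialized, e.g. via dist.init_process_group when launched with torchrun, with one process per worker. The helper name gather_full_tensor is made up for illustration.

    import torch
    import torch.distributed as dist

    def gather_full_tensor(local_shard: torch.Tensor) -> torch.Tensor:
        # One receive buffer per rank; all_gather fills them in rank order.
        world_size = dist.get_world_size()
        buffers = [torch.empty_like(local_shard) for _ in range(world_size)]
        dist.all_gather(buffers, local_shard)
        # Concatenating in rank order reconstructs the full tensor on every worker.
        return torch.cat(buffers, dim=0)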

Signature: def all_gather(shards) — a NumPy sketch follows the parameter list below.

  • shards: 2-D array of shape (num_workers, shard_size) — row i is worker i's local shard
  • Returns: 1-D array of shape (num_workers * shard_size,) — shards concatenated in rank order
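One reasonable NumPy sketch of that signature (an illustrative implementation, not necessarily the site's official solution): because row i of shards is worker i's shard, flattening the 2-D array in row-major order is exactly a concatenation in rank order.

    import numpy as np

    def all_gather(shards):
        shards = np.asarray(shards)   # shape (num_workers, shard_size)
        # Row-major flattening concatenates the rows in rank order, which is
        # what every worker holds after the collective completes.
        return shards.reshape(-1)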

Language: Python (numpy)

Test Results

  • 3 workers × 2 elements (worked through in the example after this list)
  • 4 workers × 1 element
  • single worker is a no-op (Premium)
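For intuition, here is how the first case might run; the concrete values are made up for illustration and are not the site's hidden test inputs.

    import numpy as np

    shards = np.array([[0, 1],
                       [2, 3],
                       [4, 5]])   # 3 workers × 2 elements (illustrative data)
    assert np.array_equal(all_gather(shards), np.array([0, 1, 2, 3, 4, 5]))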