
All-Gather (Easy)

All-Gather is a collective communication primitive where every worker broadcasts its local shard and receives all other workers' shards, so that each worker ends up with the complete tensor.

It is used in:

  • FSDP — each worker holds a shard of the model parameters; before a forward pass the full parameter tensor is reconstructed via all-gather (see the sketch after this list).
  • Tensor parallelism — output activations split across GPUs are reassembled.
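In practice, all-gather is usually a single call into a communication library. The following is a minimal illustrative sketch using torch.distributed (not part of this problem, which only asks for the NumPy simulation below); it assumes the default process group has already been initialized, e.g. via dist.init_process_group when launched with torchrun, with one process per worker. The helper name gather_full_tensor is made up for illustration.

    import torch
    import torch.distributed as dist

    def gather_full_tensor(local_shard: torch.Tensor) -> torch.Tensor:
        # One receive buffer per rank; all_gather fills them in rank order.
        world_size = dist.get_world_size()
        buffers = [torch.empty_like(local_shard) for _ in range(world_size)]
        dist.all_gather(buffers, local_shard)
        # Concatenating in rank order reconstructs the full tensor on every worker.
        return torch.cat(buffers, dim=0)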

Signature: def all_gather(shards) — a NumPy sketch follows the parameter list below.

  • shards: 2-D array of shape (num_workers, shard_size) — row i is worker i's local shard
  • Returns: 1-D array of shape (num_workers * shard_size,) — shards concatenated in rank order
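One reasonable NumPy sketch of that signature (an illustrative implementation, not necessarily the site's official solution): because row i of shards is worker i's shard, flattening the 2-D array in row-major order is exactly a concatenation in rank order.

    import numpy as np

    def all_gather(shards):
        shards = np.asarray(shards)   # shape (num_workers, shard_size)
        # Row-major flattening concatenates the rows in rank order, which is
        # what every worker holds after the collective completes.
        return shards.reshape(-1)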

Language: Python (numpy)

Test Results

  • 3 workers × 2 elements (worked through in the example after this list)
  • 4 workers × 1 element
  • single worker is a no-op (Premium)
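For intuition, here is how the first case might run; the concrete values are made up for illustration and are not the site's hidden test inputs.

    import numpy as np

    shards = np.array([[0, 1],
                       [2, 3],
                       [4, 5]])   # 3 workers × 2 elements (illustrative data)
    assert np.array_equal(all_gather(shards), np.array([0, 1, 2, 3, 4, 5]))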