Pipeline parallelism splits model layers across devices. GPipe's approach: split the mini-batch into M micro-batches and pipeline them through stages. Each stage processes one micro-batch at a time while the next stage processes the previous one.
Simulate GPipe forward: given K pipeline stages (each a single linear+tanh layer), process M micro-batches through all stages sequentially.
Signature: def pipeline_forward(micro_batches, stage_weights, stage_biases)
micro_batches: (M, batch_size, d) — M micro-batches of datastage_weights: (K, d, d) — K stage weight matricesstage_biases: (K, d) — K stage biases(M, batch_size, d) — output for each micro-batch after all K stagesEach stage applies: h = tanh(h @ W_k.T + b_k)
Math
Asked at
import numpy as np
def pipeline_forward(...):
pass
Premium problem
Free accounts include problems #1–20. Upgrade to unlock the editor, hidden test cases, and reference solutions for every problem.
Already premium?