TorchedUp

80. Pipeline Parallelism (GPipe)

Hard

Pipeline parallelism splits a model's layers into consecutive stages, each placed on its own device. GPipe splits the mini-batch into M micro-batches and pipelines them through the stages: each stage processes one micro-batch at a time, and while stage k works on micro-batch m, stage k+1 works on micro-batch m-1.
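To make the overlap concrete, here is a minimal sketch of the forward schedule. It assumes every stage takes exactly one clock tick per micro-batch and ignores the backward pass; the function name `gpipe_forward_schedule` is hypothetical and not part of the problem.

```python
def gpipe_forward_schedule(num_micro_batches, num_stages):
    """For each clock tick, list the (stage, micro_batch) pairs that are busy."""
    # The last micro-batch enters stage 0 at tick M-1 and needs K-1 more ticks
    # to reach the last stage, so the forward pass spans M + K - 1 ticks.
    total_ticks = num_micro_batches + num_stages - 1
    schedule = []
    for tick in range(total_ticks):
        active = []
        for stage in range(num_stages):
            micro_batch = tick - stage  # stage k lags stage 0 by k ticks
            if 0 <= micro_batch < num_micro_batches:
                active.append((stage, micro_batch))
        schedule.append(active)
    return schedule


# Illustrative sizes: 4 micro-batches, 3 stages.
schedule = gpipe_forward_schedule(num_micro_batches=4, num_stages=3)
for tick, active in enumerate(schedule):
    print(tick, active)
```

With 4 micro-batches and 3 stages the schedule has 6 ticks; only ticks 2 and 3 keep all three stages busy, and the partially idle ticks at the start and end are the pipeline "bubble".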

Simulate the GPipe forward pass: given K pipeline stages (each a single linear + tanh layer), pass each of the M micro-batches through all K stages in order, from stage 1 to stage K.

Signature: def pipeline_forward(micro_batches, stage_weights, stage_biases)

  • micro_batches: (M, batch_size, d) — M micro-batches of data
  • stage_weights: (K, d, d) — K stage weight matrices
  • stage_biases: (K, d) — K stage biases
  • Returns: (M, batch_size, d) — output for each micro-batch after all K stages

Each stage applies: h = tanh(h @ W_k.T + b_k)
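One possible way to simulate this, sketched under the assumption that a plain nested loop (micro-batches outside, stages inside) is acceptable, since a single process has no real device overlap. The conversion of inputs with `np.asarray` is a convenience added here, not something the problem statement requires.

```python
import numpy as np


def pipeline_forward(micro_batches, stage_weights, stage_biases):
    """Run every micro-batch through all K stages; returns shape (M, batch_size, d)."""
    micro_batches = np.asarray(micro_batches, dtype=float)
    stage_weights = np.asarray(stage_weights, dtype=float)
    stage_biases = np.asarray(stage_biases, dtype=float)

    outputs = np.empty_like(micro_batches)
    for m, h in enumerate(micro_batches):            # h has shape (batch_size, d)
        for W_k, b_k in zip(stage_weights, stage_biases):
            h = np.tanh(h @ W_k.T + b_k)             # one stage: linear + tanh
        outputs[m] = h
    return outputs


# Illustrative sizes only: M=4 micro-batches, batch_size=2, d=3, K=5 stages.
rng = np.random.default_rng(0)
x = rng.standard_normal((4, 2, 3))
W = rng.standard_normal((5, 3, 3))
b = rng.standard_normal((5, 3))
out = pipeline_forward(x, W, b)
print(out.shape)  # (4, 2, 3)
```

Because the last operation of every stage is tanh, all outputs lie in [-1, 1], which is a quick sanity check on any implementation.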

Math

$h_k = \tanh\left(h_{k-1} W_k^{\top} + b_k\right), \quad k = 1, \dots, K$
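Since the recurrence treats every micro-batch independently, it can also be applied to the whole (M, batch_size, d) array at once, taking h_0 to be the stacked micro-batches. The following is a sketch of that idea; the name `pipeline_forward_vectorized` is hypothetical, and it relies on NumPy's `@` broadcasting a (d, d) matrix over the leading axis.

```python
import numpy as np


def pipeline_forward_vectorized(micro_batches, stage_weights, stage_biases):
    """Apply h_k = tanh(h_{k-1} W_k^T + b_k) to all micro-batches simultaneously."""
    h = np.asarray(micro_batches, dtype=float)       # h_0, shape (M, batch_size, d)
    stage_weights = np.asarray(stage_weights, dtype=float)
    stage_biases = np.asarray(stage_biases, dtype=float)
    for W_k, b_k in zip(stage_weights, stage_biases):
        # (M, B, d) @ (d, d) -> (M, B, d); b_k of shape (d,) broadcasts over rows.
        h = np.tanh(h @ W_k.T + b_k)
    return h


# Illustrative sizes only: M=3, batch_size=2, d=4, K=2.
rng = np.random.default_rng(1)
x = rng.standard_normal((3, 2, 4))
W = rng.standard_normal((2, 4, 4))
b = rng.standard_normal((2, 4))
out = pipeline_forward_vectorized(x, W, b)
```

This gives the same numbers as looping over micro-batches one at a time; it only changes how the work is batched, not the math.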

NumPy

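import numpy as np


def pipeline_forward(micro_batches, stage_weights, stage_biases):
    # Apply h = tanh(h @ W_k.T + b_k) for each of the K stages, per micro-batch.
    pass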
