
Transformer MLP Block (Easy)

Implement the FFN (feed-forward network) sublayer used in every Transformer layer.

It applies two linear layers with a GELU activation, adds a residual connection, and normalizes with LayerNorm:

FFN(x) = LayerNorm(x + W2 * GELU(W1 * x + b1) + b2)
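Equivalently, split into two steps (the same formula regrouped; h is the hidden activation):

  h = GELU(W1 x + b1)                  # shape (d_ff,)
  FFN(x) = LayerNorm(x + W2 h + b2)    # shape (d_model,)

Note that b2 sits inside the residual sum, since the second linear layer is applied before the residual addition.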

Signature: def transformer_mlp(x, W1, b1, W2, b2, gamma, beta)

  • x: (d_model,) — input vector
  • W1: (d_ff, d_model), b1: (d_ff,) — first linear layer
  • W2: (d_model, d_ff), b2: (d_model,) — second linear layer
  • gamma, beta: (d_model,) — LayerNorm scale and shift
  • Returns: (d_model,)

Use the exact GELU formula: 0.5 * h * (1 + erf(h / sqrt(2))). LayerNorm epsilon: 1e-5.
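One possible reference sketch in NumPy (not the official solution; it uses scipy.special.erf for the exact error function, since NumPy itself does not ship erf — math.erf applied elementwise would work equally well if SciPy is unavailable):

    import numpy as np
    from scipy.special import erf  # exact erf; NumPy has no built-in erf

    def transformer_mlp(x, W1, b1, W2, b2, gamma, beta):
        # First linear layer, then the exact GELU from the statement.
        h = W1 @ x + b1                                  # (d_ff,)
        h = 0.5 * h * (1.0 + erf(h / np.sqrt(2.0)))      # GELU, elementwise
        # Second linear layer plus the residual connection.
        y = x + W2 @ h + b2                              # (d_model,)
        # LayerNorm over the feature dimension, eps = 1e-5.
        mu = y.mean()
        var = y.var()                                    # population variance (ddof=0)
        return gamma * (y - mu) / np.sqrt(var + 1e-5) + beta

Note that LayerNorm uses the biased (population) variance, which is NumPy's default; passing ddof=1 would fail the checks.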

Tags: Math

Language: Python (numpy)

Test Results

  • seed 42 — d_model=4, d_ff=8
  • seed 1 — d_model=4, d_ff=8
  • zero weights — reduces to LayerNorm(x) (Premium; see the check below)
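The premium case can be sanity-checked by hand: with all weights and biases zero, GELU(0) = 0, so the FFN branch contributes nothing and the output collapses to LayerNorm(x). A quick self-test, assuming the transformer_mlp sketch above:

    import numpy as np

    d_model, d_ff = 4, 8
    rng = np.random.default_rng(0)            # arbitrary seed, not the graders'
    x = rng.standard_normal(d_model)
    gamma, beta = np.ones(d_model), np.zeros(d_model)

    out = transformer_mlp(x,
                          np.zeros((d_ff, d_model)), np.zeros(d_ff),
                          np.zeros((d_model, d_ff)), np.zeros(d_model),
                          gamma, beta)
    expected = (x - x.mean()) / np.sqrt(x.var() + 1e-5)   # plain LayerNorm(x)
    assert np.allclose(out, expected)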