TorchedUp

73. Transformer MLP Block

Easy

Implement the FFN (feed-forward network) sublayer used in every Transformer layer.

It applies two linear layers with a GELU activation, adds a residual connection, and normalizes with LayerNorm:

FFN(x) = LayerNorm(x + W2 * GELU(W1 * x + b1) + b2)

Signature: def transformer_mlp(x, W1, b1, W2, b2, gamma, beta)

  • x: (d_model,) — input vector
  • W1: (d_ff, d_model), b1: (d_ff,) — first linear layer
  • W2: (d_model, d_ff), b2: (d_model,) — second linear layer
  • gamma, beta: (d_model,) — LayerNorm scale and shift
  • Returns: (d_model,)
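To make the shape contract concrete, here is a small sketch that builds inputs with the listed shapes and checks the two matrix-vector products. The sizes `d_model = 4` and `d_ff = 16` are hypothetical values chosen only for illustration, and the activation is left out because it does not change shapes.

```python
import numpy as np

# Hypothetical sizes, chosen only to illustrate the shape contract.
d_model, d_ff = 4, 16
rng = np.random.default_rng(0)

x = rng.standard_normal(d_model)            # (d_model,)
W1 = rng.standard_normal((d_ff, d_model))   # (d_ff, d_model)
b1 = rng.standard_normal(d_ff)              # (d_ff,)
W2 = rng.standard_normal((d_model, d_ff))   # (d_model, d_ff)
b2 = rng.standard_normal(d_model)           # (d_model,)

hidden = W1 @ x + b1            # expands to the hidden width d_ff
projected = W2 @ hidden + b2    # projects back to d_model (GELU omitted here)

print(hidden.shape, projected.shape)  # (16,) (4,)
```

Because the second layer returns to `d_model`, the result can be added to `x` for the residual connection without any reshaping.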

Use the exact GELU formula: 0.5 * h * (1 + erf(h / sqrt(2))). LayerNorm epsilon: 1e-5.
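One possible way to put these pieces together is sketched below. It is not the site's reference solution. The helper names `gelu_exact` and `layer_norm` are hypothetical, and the sketch assumes LayerNorm uses the population variance (divide by `d_model`, not `d_model - 1`), which the problem statement does not spell out.

```python
import math
import numpy as np

# NumPy has no built-in erf, so vectorize the standard-library version.
_erf = np.vectorize(math.erf, otypes=[float])


def gelu_exact(h):
    # Exact GELU from the statement: 0.5 * h * (1 + erf(h / sqrt(2))).
    return 0.5 * h * (1.0 + _erf(h / math.sqrt(2.0)))


def layer_norm(z, gamma, beta, eps=1e-5):
    # Assumption: population variance (ddof=0) over the d_model entries.
    mean = z.mean()
    var = z.var()
    return gamma * (z - mean) / np.sqrt(var + eps) + beta


def transformer_mlp(x, W1, b1, W2, b2, gamma, beta):
    h = W1 @ x + b1                 # first linear layer, shape (d_ff,)
    y = W2 @ gelu_exact(h) + b2     # second linear layer, shape (d_model,)
    return layer_norm(x + y, gamma, beta)  # residual add, then LayerNorm
```

`np.vectorize(math.erf)` keeps the sketch free of extra dependencies. If SciPy is available, `scipy.special.erf` computes the same function directly on arrays.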

Math

$$\mathrm{FFN}(x) = \mathrm{LayerNorm}\big(x + W_2\,\mathrm{GELU}(W_1 x + b_1) + b_2\big)$$
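The formula above uses GELU and LayerNorm without expanding them. As a hedged reference, the two pieces can be written out as follows, where $z = x + W_2\,\mathrm{GELU}(W_1 x + b_1) + b_2$, $d$ stands for `d_model`, and $\epsilon = 10^{-5}$. The use of the population variance (dividing by $d$) is an assumption; the problem statement does not specify it.

```latex
\begin{aligned}
\mathrm{GELU}(h) &= \tfrac{1}{2}\, h \left(1 + \operatorname{erf}\!\left(\frac{h}{\sqrt{2}}\right)\right) \\
\mu &= \frac{1}{d} \sum_{i=1}^{d} z_i, \qquad
\sigma^2 = \frac{1}{d} \sum_{i=1}^{d} (z_i - \mu)^2 \\
\mathrm{LayerNorm}(z)_i &= \gamma_i \, \frac{z_i - \mu}{\sqrt{\sigma^2 + \epsilon}} + \beta_i
\end{aligned}
```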

Related problems

  • Transformer MLP Block (PyTorch) · Easy · PyTorch

NumPy

import numpy as np


def transformer_mlp(x, W1, b1, W2, b2, gamma, beta):
    # Starter stub: return LayerNorm(x + W2 @ GELU(W1 @ x + b1) + b2).
    pass
