
SwiGLU (Gated Linear Unit)

SwiGLU is the feed-forward network (FFN) activation used in LLaMA, PaLM, and Mistral. It replaces the standard FFN:

FFN(x) = W2 * ReLU(W1 * x)

with a gated variant:

SwiGLU-FFN(x) = W2 * (SiLU(W1 * x) ⊙ (W3 * x))

where ⊙ is elementwise multiplication, SiLU(z) = z · sigmoid(z), and W3 is an extra learned up projection whose output is gated by the SiLU branch.
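
For contrast, here is a small sketch of both FFN forms (hypothetical weight shapes, biases omitted as in LLaMA-style blocks; illustrative only, not the required solution):

import numpy as np

def ffn_relu(x, W1, W2):
    # Standard FFN: W2 @ ReLU(W1 @ x)
    return W2 @ np.maximum(W1 @ x, 0.0)

def ffn_swiglu(x, W1, W2, W3):
    # SwiGLU FFN: W2 @ (SiLU(W1 @ x) * (W3 @ x)), elementwise product
    h = W1 @ x
    return W2 @ ((h / (1.0 + np.exp(-h))) * (W3 @ x))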

In this problem you implement the gating operation (without the output projection W2):

swiglu(x, W1, W3) = SiLU(W1 @ x) * (W3 @ x)

Signature: def swiglu(x: np.ndarray, W1: np.ndarray, W3: np.ndarray) -> np.ndarray

  • x: (d_model,) — input vector
  • W1: (d_ff, d_model) — gate projection weight
  • W3: (d_ff, d_model) — up projection weight (its output is multiplied elementwise by the gated branch)
  • Returns: (d_ff,) — gated intermediate activations
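
A minimal numpy sketch of the function (one possible implementation, not necessarily the reference solution):

import numpy as np

def swiglu(x: np.ndarray, W1: np.ndarray, W3: np.ndarray) -> np.ndarray:
    """SwiGLU gating: SiLU(W1 @ x) * (W3 @ x), returning shape (d_ff,)."""
    gate = W1 @ x                          # (d_ff,) gate pre-activation
    up = W3 @ x                            # (d_ff,) up projection
    silu = gate / (1.0 + np.exp(-gate))    # SiLU(z) = z * sigmoid(z)
    return silu * up                       # elementwise gating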

Test Results

  • identity-like weights (d_model=4, d_ff=3)
  • random weights (d_model=4, d_ff=4, seed=42)
  • non-negative output for positive x and positive W1, W3
  • zero input gives zero output
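
A quick local check mirroring the listed properties (illustrative inputs chosen here, not the actual test data; assumes the swiglu sketch above is in scope):

import numpy as np

rng = np.random.default_rng(0)          # arbitrary seed for this illustration
d_model, d_ff = 4, 3

x = rng.standard_normal(d_model)
W1 = rng.standard_normal((d_ff, d_model))
W3 = rng.standard_normal((d_ff, d_model))

assert swiglu(x, W1, W3).shape == (d_ff,)

# Zero input gives zero output: both projections are zero vectors.
assert np.allclose(swiglu(np.zeros(d_model), W1, W3), 0.0)

# Positive x with positive W1, W3 keeps both branches positive,
# so the gated product is non-negative.
x_pos, W1_pos, W3_pos = np.abs(x) + 0.1, np.abs(W1) + 0.1, np.abs(W3) + 0.1
assert np.all(swiglu(x_pos, W1_pos, W3_pos) >= 0.0)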