TorchedUp

58. SwiGLU (Gated Linear Unit)

Medium

SwiGLU is the feed-forward network (FFN) activation used in LLaMA, PaLM, and Mistral. The standard FFN computes W2 · ReLU(W1 · x); SwiGLU replaces the ReLU with a gated linear unit that elementwise-multiplies a SiLU-activated gate branch (W1) by a separate linear branch, which requires a third weight matrix W3.
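To make the contrast concrete, here is a minimal NumPy sketch of both feed-forward variants. The helper names (`silu`, `ffn_relu`, `ffn_swiglu`), the toy dimensions, the W2 shape of (d_model, d_ff), and the absence of bias terms are all illustrative assumptions, not taken from any particular model's code.

```python
import numpy as np


def silu(z: np.ndarray) -> np.ndarray:
    # SiLU(z) = z * sigmoid(z), written as z / (1 + exp(-z)).
    return z / (1.0 + np.exp(-z))


def ffn_relu(x, W1, W2):
    # Standard FFN: W2 · ReLU(W1 · x)
    return W2 @ np.maximum(W1 @ x, 0.0)


def ffn_swiglu(x, W1, W3, W2):
    # SwiGLU FFN: W2 · (SiLU(W1 · x) ⊙ (W3 · x))
    return W2 @ (silu(W1 @ x) * (W3 @ x))


# Toy sizes, assumed for illustration only.
d_model, d_ff = 4, 8
rng = np.random.default_rng(0)
x = rng.standard_normal(d_model)
W1 = rng.standard_normal((d_ff, d_model))
W3 = rng.standard_normal((d_ff, d_model))
W2 = rng.standard_normal((d_model, d_ff))

print(ffn_relu(x, W1, W2).shape)        # (4,)
print(ffn_swiglu(x, W1, W3, W2).shape)  # (4,)
```

Both variants map a (d_model,) vector back to (d_model,); the SwiGLU version simply carries one extra (d_ff, d_model) matrix for the linear branch.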

In this problem you implement the gating operation (without the output projection W2): combine the SiLU activation of the W1 branch with the linear W3 branch via elementwise multiplication. See the math reference below.

Signature: def swiglu(x: np.ndarray, W1: np.ndarray, W3: np.ndarray) -> np.ndarray

  • x: shape (d_model,), the input vector
  • W1: shape (d_ff, d_model), the gate projection weight (SiLU-activated branch)
  • W3: shape (d_ff, d_model), the up projection weight (linear branch)
  • Returns: shape (d_ff,), the gated intermediate activations
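One way to satisfy this signature is sketched below. It is an illustrative attempt, not the site's reference solution. The shape assertions and the hand-picked test values are additions for clarity, and the sigmoid is written out directly, so `np.exp` may emit an overflow warning for very large negative pre-activations.

```python
import numpy as np


def swiglu(x: np.ndarray, W1: np.ndarray, W3: np.ndarray) -> np.ndarray:
    # Shape checks are an addition for illustration, not part of the problem.
    assert x.ndim == 1, "x should be a vector of shape (d_model,)"
    assert W1.shape == W3.shape and W1.shape[1] == x.shape[0]

    gate = W1 @ x                             # (d_ff,) input to the SiLU branch
    up = W3 @ x                               # (d_ff,) linear branch
    silu_gate = gate / (1.0 + np.exp(-gate))  # SiLU(z) = z * sigmoid(z)
    return silu_gate * up                     # elementwise product, (d_ff,)


# Usage with hand-picked values (d_model = 2, d_ff = 3).
x = np.array([1.0, 2.0])
W1 = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
W3 = np.array([[1.0, 1.0], [2.0, 0.0], [0.0, -1.0]])
out = swiglu(x, W1, W3)
print(out.shape)  # (3,)
```

Here W1 · x = (0, 1, 2) and W3 · x = (3, 2, -2), so the first output entry is exactly 0 because SiLU(0) = 0 regardless of the linear branch.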

Math

$$\mathrm{SwiGLU}(x, W_1, W_3) = \mathrm{SiLU}(W_1 x) \odot (W_3 x)$$

$$\mathrm{SiLU}(z) = z \cdot \sigma(z)$$
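As a small numerical check of the SiLU definition, the sketch below evaluates it using the identity σ(z) = (1 + tanh(z/2)) / 2, which avoids calling `np.exp` on large arguments. The function name `silu_stable` is hypothetical, and the tanh form is one possible choice rather than something the problem requires.

```python
import numpy as np


def silu_stable(z: np.ndarray) -> np.ndarray:
    # sigmoid(z) == 0.5 * (1 + tanh(z / 2)); tanh saturates instead of overflowing.
    z = np.asarray(z, dtype=np.float64)
    return z * 0.5 * (1.0 + np.tanh(0.5 * z))


z = np.array([-1000.0, -1.0, 0.0, 1.0, 1000.0])
print(silu_stable(z))
```

For large positive inputs SiLU approaches the identity, and for large negative inputs it approaches zero, so the extreme entries above stay finite.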

Related problems

  • SwiGLU (PyTorch) (medium, PyTorch)

NumPy

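import numpy as np


def swiglu(x: np.ndarray, W1: np.ndarray, W3: np.ndarray) -> np.ndarray:
    # Starter stub: compute SiLU(W1 @ x) * (W3 @ x) here.
    pass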
