SwiGLU is the feed-forward network (FFN) activation used in LLaMA, PaLM, and Mistral. The standard FFN computes W2 · ReLU(W1 · x). SwiGLU replaces the ReLU with a Gated Linear Unit (GLU) variant. It elementwise-multiplies a SiLU-activated "gate" branch (W1 · x) by a separate linear "up" branch, which is computed with a third weight matrix W3.
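To make the contrast concrete, here is a minimal NumPy sketch of both full FFN blocks. Several details are assumptions for illustration: the helper names `silu`, `relu_ffn`, and `swiglu_ffn`, the random sizes, and the W2 shape (d_model, d_ff). The problem itself only asks for the gated intermediate and leaves out the W2 projection.

```python
import numpy as np


def silu(z: np.ndarray) -> np.ndarray:
    # SiLU(z) = z * sigmoid(z) = z / (1 + exp(-z))
    return z / (1.0 + np.exp(-z))


def relu_ffn(x: np.ndarray, W1: np.ndarray, W2: np.ndarray) -> np.ndarray:
    # Standard FFN: W2 · ReLU(W1 · x); two weight matrices
    return W2 @ np.maximum(W1 @ x, 0.0)


def swiglu_ffn(x: np.ndarray, W1: np.ndarray, W3: np.ndarray, W2: np.ndarray) -> np.ndarray:
    # SwiGLU FFN: W2 · (SiLU(W1 · x) ⊙ (W3 · x)); the third matrix W3 supplies the linear branch
    return W2 @ (silu(W1 @ x) * (W3 @ x))


# Hypothetical sizes and random weights, used only to check shapes
rng = np.random.default_rng(0)
d_model, d_ff = 4, 8
x = rng.standard_normal(d_model)
W1 = rng.standard_normal((d_ff, d_model))
W3 = rng.standard_normal((d_ff, d_model))
W2 = rng.standard_normal((d_model, d_ff))  # assumed shape (d_model, d_ff)

print(relu_ffn(x, W1, W2).shape)        # (4,)
print(swiglu_ffn(x, W1, W3, W2).shape)  # (4,)
```

Both blocks map a (d_model,) input back to (d_model,). The ReLU FFN uses 2 · d_ff · d_model weights, and the SwiGLU FFN uses 3 · d_ff · d_model. At equal d_ff, the SwiGLU version therefore carries 50% more FFN parameters.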
In this problem you implement the gating operation (without the output projection W2): combine the SiLU activation of the W1 branch with the linear W3 branch via elementwise multiplication. See the math reference below.
Signature: def swiglu(x: np.ndarray, W1: np.ndarray, W3: np.ndarray) -> np.ndarray
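x: shape (d_model,), the input vector
W1: shape (d_ff, d_model), gate projection weight (the SiLU-activated branch)
W3: shape (d_ff, d_model), up projection weight (the linear branch)
Returns: shape (d_ff,), the gated intermediate activations
Math
SwiGLU(x) = SiLU(W1 · x) ⊙ (W3 · x), where SiLU(z) = z · σ(z) = z / (1 + e^(-z)), σ is the logistic sigmoid, and ⊙ denotes elementwise multiplication.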
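A small worked example uses hand-picked values that are hypothetical and not taken from the problem's tests. It shows that SiLU, unlike ReLU, lets a negative pre-activation through, so both entries of the result are nonzero. ReLU would have zeroed the second entry.

```latex
x = \begin{pmatrix} 1 \\ -1 \end{pmatrix}, \quad
W_1 = I_2, \quad
W_3 = \begin{pmatrix} 2 & 0 \\ 0 & 3 \end{pmatrix}
\\[4pt]
W_1 x = \begin{pmatrix} 1 \\ -1 \end{pmatrix}, \quad
W_3 x = \begin{pmatrix} 2 \\ -3 \end{pmatrix}
\\[4pt]
\operatorname{SiLU}(1) = \frac{1}{1 + e^{-1}} \approx 0.7311, \quad
\operatorname{SiLU}(-1) = \frac{-1}{1 + e^{1}} \approx -0.2689
\\[4pt]
\operatorname{SwiGLU}(x)
  = \begin{pmatrix} 2 \cdot \operatorname{SiLU}(1) \\ -3 \cdot \operatorname{SiLU}(-1) \end{pmatrix}
  \approx \begin{pmatrix} 1.4621 \\ 0.8068 \end{pmatrix}
```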
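import numpy as np


def swiglu(x: np.ndarray, W1: np.ndarray, W3: np.ndarray) -> np.ndarray:
    # TODO: return SiLU(W1 @ x) * (W3 @ x), a vector of shape (d_ff,)
    pass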
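Here is one possible way to fill in the stub. It is a minimal sketch, not the site's reference solution. It assumes unbatched inputs with the shapes listed above, and the local names `gate`, `up`, `e`, and `sig` are illustrative.

```python
import numpy as np


def swiglu(x: np.ndarray, W1: np.ndarray, W3: np.ndarray) -> np.ndarray:
    """Return SiLU(W1 @ x) * (W3 @ x), a vector of shape (d_ff,)."""
    gate = W1 @ x  # (d_ff,) pre-activation of the SiLU ("gate") branch
    up = W3 @ x    # (d_ff,) linear ("up") branch
    # Logistic sigmoid via exp(-|z|): the exponent is never positive, so it cannot overflow
    e = np.exp(-np.abs(gate))
    sig = np.where(gate >= 0, 1.0 / (1.0 + e), e / (1.0 + e))
    # SiLU(z) = z * sigmoid(z), then multiply elementwise by the linear branch
    return gate * sig * up


# Hand-picked example: W1 = identity, W3 = diag(2, 3)
x = np.array([1.0, -1.0])
print(swiglu(x, np.eye(2), np.diag([2.0, 3.0])))  # approximately [1.4621 0.8068]
```

Computing the sigmoid from exp(-|z|) keeps every exponent non-positive. As a result, large-magnitude pre-activations (for example ±1000) produce finite outputs without overflow warnings. For moderate inputs, the plain form z / (1 + np.exp(-z)) gives the same result.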