GeLU Activation (Easy)

GeLU (Gaussian Error Linear Unit) is the activation function used in BERT, GPT-2, and most modern transformers. Unlike ReLU, which hard-gates inputs at zero, GeLU weights each input by the probability that it is positive under a standard Gaussian distribution.

Exact form: GeLU(x) = x * Φ(x), where Φ is the standard Gaussian CDF, which can be written via the error function as Φ(x) = 0.5 * (1 + erf(x / √2)).

Since computing Φ exactly is expensive, PyTorch also provides a tanh approximation:

GeLU(x) ≈ 0.5 * x * (1 + tanh(sqrt(2/π) * (x + 0.044715 * x³)))

Implement both forms. When approximate=False (default), use the exact form via scipy.special.erf. When approximate=True, use the tanh approximation.

Signature: def gelu(x: np.ndarray, approximate: bool = False) -> np.ndarray
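
A minimal reference sketch that follows the formulas above directly (it assumes scipy is available, as the problem statement implies via scipy.special.erf):

import numpy as np
from scipy.special import erf

def gelu(x: np.ndarray, approximate: bool = False) -> np.ndarray:
    if approximate:
        # Tanh approximation (what PyTorch exposes as nn.GELU(approximate="tanh")).
        c = np.sqrt(2.0 / np.pi)
        return 0.5 * x * (1.0 + np.tanh(c * (x + 0.044715 * x**3)))
    # Exact form: x * Phi(x), with Phi(x) = 0.5 * (1 + erf(x / sqrt(2))).
    return x * 0.5 * (1.0 + erf(x / np.sqrt(2.0)))

For example, gelu(np.array([-1.0, 0.0, 1.0])) is approximately [-0.1587, 0.0, 0.8413], since Φ(1) ≈ 0.8413 and Φ(-1) ≈ 0.1587.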

Test Results

- exact: mixed values 1D
- exact: 2D input
- approximate: mixed values 1D (Premium)
- preserves argmax for positive inputs
- gelu(0) = 0
- gelu(x) ≈ x for large positive x
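
A quick local sanity check against the properties listed above (a sketch; it assumes the gelu from the previous snippet is in scope):

import numpy as np

# gelu(0) = 0 exactly, because 0 * Phi(0) = 0.
assert gelu(np.array([0.0]))[0] == 0.0

# For large positive x, Phi(x) -> 1, so gelu(x) -> x.
assert np.allclose(gelu(np.array([10.0])), np.array([10.0]), atol=1e-4)

# GeLU is monotonic for x >= 0, so it preserves the argmax of positive inputs.
pos = np.array([0.1, 3.0, 1.5])
assert np.argmax(gelu(pos)) == np.argmax(pos)

# The exact and tanh forms agree closely over a typical input range.
x = np.linspace(-5.0, 5.0, 101)
assert np.max(np.abs(gelu(x) - gelu(x, approximate=True))) < 1e-2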