TorchedUp

54. GeLU Activation

Easy

GeLU (Gaussian Error Linear Unit) is the activation function used in BERT, GPT-2, and many modern transformers. Unlike ReLU, which hard-gates inputs by their sign, GeLU weights each input x by Φ(x), the probability that a standard normal random variable is less than or equal to x.
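To make the contrast with ReLU concrete, here is a minimal scalar sketch that uses only the Python standard library. The helper names relu_scalar and gelu_scalar are hypothetical and chosen for illustration; they are not part of the problem's required interface.

```python
from statistics import NormalDist

# Standard normal distribution; its cdf plays the role of Φ.
_STD_NORMAL = NormalDist(mu=0.0, sigma=1.0)


def relu_scalar(x: float) -> float:
    """Hard gate: pass x through unchanged if positive, otherwise output 0."""
    return x if x > 0.0 else 0.0


def gelu_scalar(x: float) -> float:
    """Soft gate: scale x by Φ(x), the standard normal CDF evaluated at x."""
    return x * _STD_NORMAL.cdf(x)


# Print both activations side by side for a few sample inputs.
for x in (-3.0, -1.0, 0.0, 1.0, 3.0):
    print(f"x={x:+.1f}  relu={relu_scalar(x):+.4f}  gelu={gelu_scalar(x):+.4f}")
```

For negative inputs ReLU returns exactly zero, while GeLU returns a small negative value that shrinks toward zero as x becomes more negative. For large positive inputs both approach x.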

Exact form: GeLU(x) = x * Φ(x) where Φ is the Gaussian CDF.

Since computing Φ exactly is expensive, PyTorch also provides a tanh approximation:

GeLU(x) ≈ 0.5 * x * (1 + tanh(sqrt(2/π) * (x + 0.044715 * x³)))
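To get a feel for how close the tanh approximation is, the sketch below compares it with the erf-based exact form on a grid of scalar inputs. The function names, the grid range [-6, 6], and the step size are assumptions made for illustration only.

```python
import math


def gelu_exact_scalar(x: float) -> float:
    """Exact form: x * Φ(x), with Φ written in terms of erf."""
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))


def gelu_tanh_scalar(x: float) -> float:
    """Tanh approximation, using the constants from the formula above."""
    inner = math.sqrt(2.0 / math.pi) * (x + 0.044715 * x**3)
    return 0.5 * x * (1.0 + math.tanh(inner))


# Evaluate both forms on an evenly spaced grid over [-6, 6] with step 0.01.
grid = [i / 100.0 for i in range(-600, 601)]
max_abs_diff = max(abs(gelu_exact_scalar(x) - gelu_tanh_scalar(x)) for x in grid)

# The largest gap on this grid should be well under 1e-3.
print(f"max |exact - tanh| on [-6, 6]: {max_abs_diff:.2e}")
```

When writing tests that compare the two forms, this suggests an absolute tolerance around 1e-3 rather than an exact-equality check.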

Implement both forms. When approximate=False (default), use the exact form via scipy.special.erf. When approximate=True, use the tanh approximation.

Signature: def gelu(x: np.ndarray, approximate: bool = False) -> np.ndarray
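One possible implementation matching this signature is sketched below. It is not the site's reference solution. Two choices here are assumptions: the input is cast to float64, and a vectorized math.erf is used as a fallback when SciPy is not installed.

```python
import math

import numpy as np

try:
    # Vectorized error function, as the problem statement suggests.
    from scipy.special import erf
except ImportError:
    # Assumed fallback for environments without SciPy: wrap the scalar math.erf.
    erf = np.vectorize(math.erf, otypes=[float])


def gelu(x: np.ndarray, approximate: bool = False) -> np.ndarray:
    """Apply GeLU elementwise, in exact (erf) or tanh-approximate form."""
    # Assumption: compute in float64 regardless of the input dtype.
    x = np.asarray(x, dtype=np.float64)
    if approximate:
        # 0.5 * x * (1 + tanh(sqrt(2/pi) * (x + 0.044715 * x^3)))
        inner = np.sqrt(2.0 / np.pi) * (x + 0.044715 * x**3)
        return 0.5 * x * (1.0 + np.tanh(inner))
    # x * Φ(x), with Φ(x) = 0.5 * (1 + erf(x / sqrt(2)))
    return 0.5 * x * (1.0 + erf(x / np.sqrt(2.0)))
```

A quick check is to call gelu(np.array([-2.0, 0.0, 1.0])) with and without approximate=True and confirm that the two results have the same shape and agree to within a small tolerance.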

Math

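$$\mathrm{GeLU}(x) = x \cdot \Phi(x) = \frac{x}{2}\left(1 + \operatorname{erf}\!\left(\frac{x}{\sqrt{2}}\right)\right)$$

$$\mathrm{GeLU}(x) \approx 0.5\,x\left(1 + \tanh\!\left(\sqrt{\frac{2}{\pi}}\,\left(x + 0.044715\,x^{3}\right)\right)\right)$$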
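The erf expression in the exact form follows from rewriting the standard normal CDF. A short derivation, assuming the usual definition of erf and the substitution u = t/√2:

```latex
\begin{aligned}
\Phi(x) &= \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{x} e^{-t^{2}/2}\,dt
         = \frac{1}{2} + \frac{1}{\sqrt{2\pi}} \int_{0}^{x} e^{-t^{2}/2}\,dt \\
        &= \frac{1}{2} + \frac{1}{\sqrt{\pi}} \int_{0}^{x/\sqrt{2}} e^{-u^{2}}\,du
         \qquad \left(u = \tfrac{t}{\sqrt{2}},\; dt = \sqrt{2}\,du\right) \\
        &= \frac{1}{2}\left(1 + \operatorname{erf}\!\left(\frac{x}{\sqrt{2}}\right)\right),
         \qquad \operatorname{erf}(z) = \frac{2}{\sqrt{\pi}} \int_{0}^{z} e^{-u^{2}}\,du .
\end{aligned}
```

Multiplying both sides by x gives the erf-based expression for GeLU(x).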

Related problems

  • GeLU (PyTorch) (Easy, PyTorch)

NumPy

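import numpy as np

def gelu(x: np.ndarray, approximate: bool = False) -> np.ndarray:
    # Starter stub: implement the exact and tanh-approximate forms here.
    pass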
