9. Adam Optimizer Step

Medium

Implement a single Adam optimizer parameter update.

Signature: def adam_step(theta: np.ndarray, grad: np.ndarray, m: np.ndarray, v: np.ndarray, t: int, lr: float = 0.01, beta1: float = 0.9, beta2: float = 0.999, eps: float = 1e-8) -> np.ndarray

Return theta_new, the parameters after one update step.

Update biased first moment: m = beta1*m + (1-beta1)*grad
Update biased second moment: v = beta2*v + (1-beta2)*grad^2 (elementwise square)
Bias correction (t is the 1-based step count): m_hat = m/(1-beta1^t), v_hat = v/(1-beta2^t)
Parameter update: theta_new = theta - lr * m_hat / (sqrt(v_hat) + eps)
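The four steps above can be sketched in NumPy as follows. This is a minimal sketch, not a reference solution. It assumes m and v are the moments from the previous step (zeros before the first step). Because the signature returns only theta_new, it computes the updated moments in local variables instead of modifying m and v in place.

```python
import numpy as np


def adam_step(theta: np.ndarray, grad: np.ndarray, m: np.ndarray, v: np.ndarray,
              t: int, lr: float = 0.01, beta1: float = 0.9, beta2: float = 0.999,
              eps: float = 1e-8) -> np.ndarray:
    """One Adam update; returns theta_new and leaves the input arrays unmodified."""
    # Biased first and second moment estimates (new arrays, not in-place updates).
    m_new = beta1 * m + (1 - beta1) * grad
    v_new = beta2 * v + (1 - beta2) * grad ** 2
    # Bias correction; t is 1-based, so 1 - beta**t is never zero.
    m_hat = m_new / (1 - beta1 ** t)
    v_hat = v_new / (1 - beta2 ** t)
    # Elementwise update, so any parameter shape works, e.g. (N,) or (N, D).
    return theta - lr * m_hat / (np.sqrt(v_hat) + eps)


beta1, beta2 = 0.9, 0.999
theta = np.array([1.0, -2.0])
grad = np.array([0.5, -0.1])
zeros = np.zeros_like(theta)

# Step 1 from zero-initialized moments.
theta1 = adam_step(theta, grad, zeros, zeros, t=1)

# Step 2 reuses the same gradient; the caller advances m and v itself,
# since adam_step returns only theta_new (an assumption about how state is passed).
m1 = beta1 * zeros + (1 - beta1) * grad
v1 = beta2 * zeros + (1 - beta2) * grad ** 2
theta2 = adam_step(theta1, grad, m1, v1, t=2, beta1=beta1, beta2=beta2)
print(theta1, theta2)
```

With a constant gradient and zero-initialized moments, the bias-corrected estimates equal grad and grad^2 at every step, so each coordinate keeps moving by about lr per step. The same function handles a 2D (N, D) matrix unchanged because every operation is elementwise.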

Math

m_t = \beta_1 m_{t-1} + (1 - \beta_1) g, \quad v_t = \beta_2 v_{t-1} + (1 - \beta_2) g^2
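As a worked check of the bias correction, assume the usual zero initialization m_0 = v_0 = 0 (the problem does not state this explicitly). The first step (t = 1) then reduces to:

```latex
\begin{aligned}
m_1 &= (1 - \beta_1)\, g, & \hat{m}_1 &= \frac{(1 - \beta_1)\, g}{1 - \beta_1} = g, \\
v_1 &= (1 - \beta_2)\, g^2, & \hat{v}_1 &= \frac{(1 - \beta_2)\, g^2}{1 - \beta_2} = g^2, \\
\theta_1 &= \theta_0 - \mathrm{lr}\,\frac{g}{|g| + \epsilon} \approx \theta_0 - \mathrm{lr}\,\operatorname{sign}(g) \quad (g \neq 0).
\end{aligned}
```

So on the first step every coordinate with a nonzero gradient moves by roughly lr against its gradient's sign, regardless of the gradient's magnitude.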

Test cases

- first step
- multi-param first step
- second step with momentum
- 2D (N, D) parameter matrix: first step