Implement a single Adam optimizer parameter update.
Signature: def adam_step(theta: np.ndarray, grad: np.ndarray, m: np.ndarray, v: np.ndarray, t: int, lr: float = 0.01, beta1: float = 0.9, beta2: float = 0.999, eps: float = 1e-8) -> np.ndarray
Return the updated theta_new after one step.
Math

m = beta1*m + (1-beta1)*grad
v = beta2*v + (1-beta2)*grad^2
m_hat = m/(1-beta1^t), v_hat = v/(1-beta2^t)
theta -= lr * m_hat / (sqrt(v_hat) + eps)
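A minimal sketch following the formulas above, assuming t is 1-indexed (t >= 1, so the bias-correction denominators are nonzero) and that the caller tracks the updated m and v itself, since the signature returns only theta_new:

    import numpy as np

    def adam_step(theta: np.ndarray, grad: np.ndarray, m: np.ndarray, v: np.ndarray,
                  t: int, lr: float = 0.01, beta1: float = 0.9, beta2: float = 0.999,
                  eps: float = 1e-8) -> np.ndarray:
        # First-moment (mean) and second-moment (uncentered variance) estimates.
        # These assignments create new arrays; the caller's m and v are not mutated.
        m = beta1 * m + (1 - beta1) * grad
        v = beta2 * v + (1 - beta2) * grad**2
        # Bias correction: m and v start at zero, so early estimates are scaled up.
        m_hat = m / (1 - beta1**t)
        v_hat = v / (1 - beta2**t)
        # Parameter update; eps guards against division by zero.
        return theta - lr * m_hat / (np.sqrt(v_hat) + eps)

As a sanity check, adam_step(np.zeros(3), np.ones(3), np.zeros(3), np.zeros(3), t=1) moves each parameter by roughly -lr, since on the first step the bias-corrected ratio m_hat / sqrt(v_hat) is close to 1.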