PPO Clip Objective (Hard)

PPO Clipped Surrogate Loss

Implement the PPO clipped objective (the inner loss minimized during each PPO epoch).

Signature: def ppo_clip_loss(log_probs_new: np.ndarray, log_probs_old: np.ndarray, advantages: np.ndarray, eps: float = 0.2) -> float

Let r = exp(log_probs_new - log_probs_old). Then the loss is

L = -mean( min( r * adv, clip(r, 1-eps, 1+eps) * adv ) )

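Here adv is shorthand for advantages, and exp, clip, and min are applied elementwise. Return a single Python float. The leading minus sign is there because PPO maximizes the clipped surrogate objective, so the loss to minimize is its negation.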
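A minimal sketch of the formula above, written against the stated signature. The float64 conversion and the intermediate names (ratio, unclipped, clipped) are illustrative choices, not something the problem statement prescribes.

```python
import numpy as np


def ppo_clip_loss(
    log_probs_new: np.ndarray,
    log_probs_old: np.ndarray,
    advantages: np.ndarray,
    eps: float = 0.2,
) -> float:
    # Assumption: inputs are array-likes of matching (or broadcastable) shape.
    log_probs_new = np.asarray(log_probs_new, dtype=np.float64)
    log_probs_old = np.asarray(log_probs_old, dtype=np.float64)
    advantages = np.asarray(advantages, dtype=np.float64)

    # r = exp(log_probs_new - log_probs_old), the per-sample probability ratio.
    ratio = np.exp(log_probs_new - log_probs_old)

    # The two candidate surrogate terms: unclipped and clipped.
    unclipped = ratio * advantages
    clipped = np.clip(ratio, 1.0 - eps, 1.0 + eps) * advantages

    # Elementwise minimum, then mean, then negate to obtain a loss.
    return float(-np.mean(np.minimum(unclipped, clipped)))
```

Taking the elementwise minimum before the mean matters: the clip is applied per sample, so averaging first and clipping afterwards would give a different quantity.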


Test Results

- ratio=1 reduces to -mean(adv)
- positive adv clipped
- negative adv clipped
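The three cases named above can be illustrated numerically. The sketch below uses a hypothetical helper, surrogate_terms, and made-up inputs. The actual test inputs are not shown on the page, and reading "positive adv clipped" as r > 1 + eps and "negative adv clipped" as r < 1 - eps is an assumption.

```python
import numpy as np


def surrogate_terms(ratio, adv, eps=0.2):
    """Per-sample min(r * adv, clip(r, 1-eps, 1+eps) * adv), before mean and sign flip.

    Hypothetical helper for illustration; it takes the ratio r directly
    rather than the log-probabilities.
    """
    ratio = np.asarray(ratio, dtype=np.float64)
    adv = np.asarray(adv, dtype=np.float64)
    return np.minimum(ratio * adv, np.clip(ratio, 1.0 - eps, 1.0 + eps) * adv)


# Case 1: r = 1 everywhere, so both terms equal adv and the loss is -mean(adv).
adv = np.array([0.5, -1.5])
loss_ratio_one = float(-np.mean(surrogate_terms([1.0, 1.0], adv)))

# Case 2: positive advantage with r = 1.5 > 1 + eps.
# The clipped term (1.2 * 2.0) is smaller than the unclipped one (1.5 * 2.0),
# so the minimum selects the clipped term.
term_pos = float(surrogate_terms([1.5], [2.0])[0])

# Case 3: negative advantage with r = 0.5 < 1 - eps.
# The clipped term (0.8 * -2.0) is more negative than the unclipped one
# (0.5 * -2.0), so the minimum again selects the clipped term.
term_neg = float(surrogate_terms([0.5], [-2.0])[0])
```

In the opposite combinations (positive advantage with r < 1 - eps, or negative advantage with r > 1 + eps) the minimum selects the unclipped term, so clipping only ever lowers the objective.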