PPO Clip Objective (PyTorch)

Implement the PPO clip objective in PyTorch.

Signature: def ppo_clip_loss(log_probs_new: torch.Tensor, log_probs_old: torch.Tensor, advantages: torch.Tensor, eps: float = 0.2) -> torch.Tensor

The rule: you may NOT call F.relu, F.clip, or any high-level objective wrapper. Build the clip and min steps yourself from the allowed primitives.

Allowed primitives: .exp(), .clamp(), torch.minimum(), .mean(), basic arithmetic.

The PPO clip objective starts from the per-sample importance ratio r_t = exp(log_probs_new - log_probs_old), i.e. the new policy's probability over the old one, recovered from the log-prob difference. It compares r_t * A_t against the same product computed with the ratio clipped to [1 - eps, 1 + eps], and takes the elementwise minimum of the two terms. The result is averaged over all elements (batch, time, and any extra axes) and negated, so that minimizing the loss maximizes the surrogate objective. See the math reference below.
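The steps described above can be sketched as follows. This is one possible implementation under the stated signature, not necessarily the intended reference solution. It assumes all three tensors have the same (or broadcastable) shape and that the caller has already detached log_probs_old and advantages from the graph if gradients should flow only through the new policy.

```python
import torch


def ppo_clip_loss(
    log_probs_new: torch.Tensor,
    log_probs_old: torch.Tensor,
    advantages: torch.Tensor,
    eps: float = 0.2,
) -> torch.Tensor:
    # Importance ratio r = pi_new / pi_old, recovered from the log-prob difference.
    ratio = (log_probs_new - log_probs_old).exp()
    # The same ratio restricted to [1 - eps, 1 + eps].
    clipped_ratio = ratio.clamp(1.0 - eps, 1.0 + eps)
    # Elementwise minimum of the unclipped and clipped surrogate terms.
    surrogate = torch.minimum(ratio * advantages, clipped_ratio * advantages)
    # Average over every element, then negate to obtain a loss to minimize.
    return -surrogate.mean()
```

Only the listed primitives are used: .exp(), .clamp(), torch.minimum(), .mean(), and arithmetic.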

PyTorch idioms vs the NumPy version:

np.clip(arr, lo, hi) becomes arr.clamp(lo, hi) (or arr.clamp(min=lo, max=hi)).
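np.minimum(a, b) becomes torch.minimum(a, b), the element-wise minimum of two tensors. Don't confuse it with a.min(dim=...), a reduction that returns a (values, indices) named tuple. a.min(b) with a tensor argument also computes the element-wise minimum, but torch.minimum states the intent more clearly.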
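A small illustration of these idioms may help. The tensors a and b below are made-up inputs chosen only for the demonstration.

```python
import torch

a = torch.tensor([0.5, 1.0, 1.5])
b = torch.tensor([1.0, 1.0, 1.0])

# Clipping to a closed interval: positional and keyword forms are equivalent.
clipped_pos = a.clamp(0.8, 1.2)
clipped_kw = a.clamp(min=0.8, max=1.2)

# Elementwise minimum of two tensors; the result keeps the (broadcast) shape.
elementwise = torch.minimum(a, b)

# Reduction along a dimension: returns a named tuple with .values and .indices,
# not a plain tensor.
stacked = torch.stack([a, b])  # shape (2, 3)
reduced = stacked.min(dim=0)
values, indices = reduced.values, reduced.indices
```

Here elementwise and values hold the same numbers, but only the reduction also carries an indices tensor, which is why it is the wrong tool inside the loss.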

Math

L^{CLIP} = -\mathbb{E}\left[\min\left(r_t A_t,\ \operatorname{clip}(r_t,\ 1 - \epsilon,\ 1 + \epsilon)\, A_t\right)\right]
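To see what the min does for a single sample, the formula can be evaluated by hand for a few (ratio, advantage) pairs. The helper clipped_term below is a hypothetical name used only for this scalar walkthrough; it is plain Python, not part of the required solution.

```python
def clipped_term(ratio: float, advantage: float, eps: float = 0.2) -> float:
    """Per-sample value of min(r * A, clip(r, 1 - eps, 1 + eps) * A)."""
    clipped_ratio = max(1.0 - eps, min(ratio, 1.0 + eps))
    return min(ratio * advantage, clipped_ratio * advantage)


# Ratio above the range, positive advantage: the clipped term (1.2 * A) is chosen.
high_pos = clipped_term(1.5, 1.0)
# Ratio above the range, negative advantage: the unclipped term (1.5 * A) is chosen.
high_neg = clipped_term(1.5, -1.0)
# Ratio below the range, positive advantage: the unclipped term (0.5 * A) is chosen.
low_pos = clipped_term(0.5, 1.0)
# Ratio below the range, negative advantage: the clipped term (0.8 * A) is chosen.
low_neg = clipped_term(0.5, -1.0)
```

In each case the minimum picks the smaller of the two candidate terms, so the surrogate is never larger than its unclipped value.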
