TorchedUp

246. PPO Clip Objective (PyTorch)

Hard

Implement the PPO clip objective in PyTorch.

Signature: def ppo_clip_loss(log_probs_new: torch.Tensor, log_probs_old: torch.Tensor, advantages: torch.Tensor, eps: float = 0.2) -> torch.Tensor

The rule: you may NOT call F.relu, F.clip, or any high-level objective wrapper. Implement clip and min yourself with primitives.

Allowed primitives: .exp(), .clamp, torch.minimum, .mean(), basic arithmetic.

The PPO clip objective is computed per sample. First, recover the importance ratio (new policy over old) by exponentiating the log-prob difference. Next, clip that ratio to [1 - eps, 1 + eps]. Multiply both the raw ratio and the clipped ratio by the advantage, take the elementwise minimum of the two products, and average over all elements (batch, time, and any extra axes). Finally, negate the result, so that minimizing the loss maximizes the surrogate objective. See the math reference below.
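The steps above can be sketched as follows. This is a minimal sketch rather than the site's reference solution: it uses only the allowed primitives, and the toy inputs at the bottom are hypothetical values chosen so the result can be checked by hand.

```python
import torch


def ppo_clip_loss(
    log_probs_new: torch.Tensor,
    log_probs_old: torch.Tensor,
    advantages: torch.Tensor,
    eps: float = 0.2,
) -> torch.Tensor:
    # Importance ratio pi_new / pi_old, recovered from the log-prob difference.
    ratio = (log_probs_new - log_probs_old).exp()
    # The same ratio restricted to [1 - eps, 1 + eps].
    clipped_ratio = ratio.clamp(1.0 - eps, 1.0 + eps)
    # Two candidate surrogate terms, compared elementwise.
    unclipped_term = ratio * advantages
    clipped_term = clipped_ratio * advantages
    surrogate = torch.minimum(unclipped_term, clipped_term)
    # Average over every element, then negate to get a loss to minimize.
    return -surrogate.mean()


# Hypothetical toy inputs (illustration only): ratios of 0.5, 1.0, and 2.0.
log_probs_old = torch.zeros(3)
log_probs_new = torch.log(torch.tensor([0.5, 1.0, 2.0]))
advantages = torch.ones(3)
loss = ppo_clip_loss(log_probs_new, log_probs_old, advantages)
```

With all advantages equal to 1, the three per-sample terms are min(0.5, 0.8), min(1.0, 1.0), and min(2.0, 1.2), so the surrogate averages to 0.9 and the loss is about -0.9.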

PyTorch idioms vs the NumPy version:

  • np.clip(arr, lo, hi) becomes arr.clamp(lo, hi) (or arr.clamp(min=lo, max=hi)).
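  • np.minimum(a, b) becomes torch.minimum(a, b), the element-wise min of two tensors. Don't confuse it with a.min(dim=...), which is a reduction that returns a named tuple of (values, indices). Note that a.min(b) with a tensor argument does exist and is also element-wise, but torch.minimum is the primitive on the allowed list.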
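A short illustration of the idioms in the list above. The tensors `a` and `b` are hypothetical values used only to show the shapes and return types involved.

```python
import torch

# Hypothetical example tensors.
a = torch.tensor([0.5, 1.0, 2.0])
b = torch.tensor([1.0, 0.2, 3.0])

# Equivalent of np.clip(a, 0.8, 1.2); a.clamp(min=0.8, max=1.2) is the same call.
clipped = a.clamp(0.8, 1.2)

# Equivalent of np.minimum(a, b): one output element per input pair.
elementwise = torch.minimum(a, b)

# Reduction along a dimension: returns a named tuple, not a plain tensor.
reduced = a.min(dim=0)
smallest_value = reduced.values
smallest_index = reduced.indices
```

The reduction form is the one to avoid in the loss: it collapses a dimension and hands back values and indices, whereas the clip objective needs a result with the same shape as its inputs.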

Math

$$L^{\mathrm{CLIP}} = -\mathbb{E}\left[\min\left(r_t A_t,\ \operatorname{clip}(r_t,\ 1-\epsilon,\ 1+\epsilon)\, A_t\right)\right]$$
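For reference, the ratio and the effect of the clip can be written out explicitly. The policy symbols \pi_{\mathrm{new}} and \pi_{\mathrm{old}} are notation assumed here for illustration; the problem statement only supplies the two log-probability tensors.

```latex
% Ratio recovered from the log-probability difference
r_t = \exp\bigl(\log \pi_{\mathrm{new}}(a_t \mid s_t) - \log \pi_{\mathrm{old}}(a_t \mid s_t)\bigr)

% Per-sample term, split by the sign of the advantage
\min\bigl(r_t A_t,\ \operatorname{clip}(r_t,\ 1-\epsilon,\ 1+\epsilon)\, A_t\bigr) =
\begin{cases}
  \min(r_t,\ 1+\epsilon)\, A_t, & A_t \ge 0 \\
  \max(r_t,\ 1-\epsilon)\, A_t, & A_t < 0
\end{cases}
```

Read this way, the clip only binds on one side per sample: a positive advantage stops being rewarded once the ratio exceeds 1 + \epsilon, and a negative advantage stops being rewarded once the ratio drops below 1 - \epsilon.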

Related problems

  • PPO Clip Objective (Hard, NumPy)

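PyTorch

import torch

def ppo_clip_loss(
    log_probs_new: torch.Tensor,
    log_probs_old: torch.Tensor,
    advantages: torch.Tensor,
    eps: float = 0.2,
) -> torch.Tensor:
    # Starter stub: compute the ratio, clip it, take the elementwise min,
    # average over all elements, and negate.
    raise NotImplementedError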
