TorchedUp
Problems
Generalized Advantage Estimation (Hard)

Generalized Advantage Estimation (GAE)

Implement GAE(lambda) for a single trajectory. Given per-step rewards and value estimates, walk backward through the trajectory and compute exponentially weighted advantages.

Signature: def gae(rewards: list, values: list, gamma: float = 0.99, lam: float = 0.95) -> list

  • rewards: list of length T
  • values: list of length T (value at each state). Treat V(s_T) as 0 (terminal bootstrap).

Recursion (walk t = T-1 down to 0):

v_next = values[t+1] if t+1 < T else 0
delta_t = rewards[t] + gamma * v_next - values[t]
A_t     = delta_t + gamma * lam * A_{t+1}    # with A_T = 0

Return a list of T floats.
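A direct translation of the backward recursion above into Python (a reference sketch, not the site's official solution) could look like this:

```python
def gae(rewards: list, values: list, gamma: float = 0.99, lam: float = 0.95) -> list:
    """GAE(lambda) advantages for one trajectory.

    Treats the terminal bootstrap V(s_T) as 0, per the problem statement.
    """
    T = len(rewards)
    advantages = [0.0] * T
    a_next = 0.0  # A_T = 0
    for t in range(T - 1, -1, -1):
        # Bootstrap with the next state's value, or 0 at the terminal step.
        v_next = values[t + 1] if t + 1 < T else 0.0
        # TD residual: delta_t = r_t + gamma * V(s_{t+1}) - V(s_t)
        delta = rewards[t] + gamma * v_next - values[t]
        # A_t = delta_t + gamma * lam * A_{t+1}
        a_next = delta + gamma * lam * a_next
        advantages[t] = a_next
    return advantages
```

With lam = 0 this reduces to the one-step TD residual; with lam = 1 (and gamma = 1) each advantage is the sum of future rewards minus the current value estimate.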

Math

Python (numpy)

Test Results

○zero rewards, zero values
○single step terminal reward
○standard ppo defaults (Premium)