TorchedUp

159. Generalized Advantage Estimation

Hard

Implement GAE-lambda for a single trajectory. Given per-step rewards and value estimates, walk backward and compute exponentially-weighted advantages.

Signature: def gae(rewards: list, values: list, gamma: float = 0.99, lam: float = 0.95) -> list

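  • rewards: list of length T, where rewards[t] is the reward received at step t
  • values: list of length T, where values[t] = V(s_t). Treat V(s_T) as 0: the trajectory is assumed to end in a terminal state, so there is no value to bootstrap from after the last step.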

Recursion (walk t = T-1 down to 0):

v_next = values[t+1] if t+1 < T else 0
delta_t = rewards[t] + gamma * v_next - values[t]
A_t     = delta_t + gamma * lam * A_{t+1}    # with A_T = 0

Return a list of T floats.
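
The recursion above can be sketched in plain Python as follows. This is one possible implementation under the stated conventions, not the site's reference solution; the accumulator name next_adv is an assumption chosen for illustration.

```python
def gae(rewards, values, gamma=0.99, lam=0.95):
    """Backward-pass GAE(lambda) for one trajectory (illustrative sketch)."""
    T = len(rewards)
    advantages = [0.0] * T
    next_adv = 0.0  # plays the role of A_{t+1}; A_T = 0 by convention
    for t in reversed(range(T)):
        # V(s_T) is treated as 0 past the end of the trajectory.
        v_next = values[t + 1] if t + 1 < T else 0.0
        delta = rewards[t] + gamma * v_next - values[t]
        next_adv = delta + gamma * lam * next_adv
        advantages[t] = next_adv
    return advantages
```

With gamma = lam = 1 the advantage reduces to the undiscounted return minus the value estimate, which gives an easy sanity check: gae([1.0, 1.0], [0.5, 0.5], gamma=1.0, lam=1.0) yields [1.5, 0.5]. With lam = 0 each advantage is just the one-step TD error delta_t.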

Math

$$\hat{A}_t = \sum_{l=0}^{T-t-1} (\gamma\lambda)^l \, \delta_{t+l}, \qquad \delta_t = r_t + \gamma V(s_{t+1}) - V(s_t)$$
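
The closed-form sum can also be evaluated directly, which is useful as a cross-check against a backward-recursion implementation. The sketch below is O(T^2) and meant only for testing on short trajectories; the function name gae_direct is hypothetical and not part of the problem.

```python
def gae_direct(rewards, values, gamma=0.99, lam=0.95):
    """Evaluate A_t = sum_{l=0}^{T-t-1} (gamma*lam)^l * delta_{t+l} term by term."""
    T = len(rewards)
    # One-step TD errors, with V(s_T) taken to be 0.
    deltas = [
        rewards[t] + gamma * (values[t + 1] if t + 1 < T else 0.0) - values[t]
        for t in range(T)
    ]
    # range(T - t) enumerates l = 0, ..., T - t - 1.
    return [
        sum((gamma * lam) ** l * deltas[t + l] for l in range(T - t))
        for t in range(T)
    ]
```

Because the recursion A_t = delta_t + gamma * lam * A_{t+1} is this sum unrolled one term at a time, both forms should agree up to floating-point rounding.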

NumPy

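import numpy as np


def gae(rewards: list, values: list, gamma: float = 0.99, lam: float = 0.95) -> list:
    # Your implementation goes here.
    pass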
