Implement GAE-lambda for a single trajectory. Given per-step rewards and value estimates, walk backward and compute exponentially-weighted advantages.
Signature: def gae(rewards: list, values: list, gamma: float = 0.99, lam: float = 0.95) -> list
rewards: list of length T
values: list of length T (value at each state). Treat V(s_T) as 0 (terminal bootstrap).
Recursion (walk t = T-1 down to 0):
v_next = values[t+1] if t+1 < T else 0
delta_t = rewards[t] + gamma * v_next - values[t]
A_t = delta_t + gamma * lam * A_{t+1} # with A_T = 0
Return a list of T floats.
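A direct sketch of the backward recursion above — accumulate A_{t+1} while walking from t = T-1 down to 0:

```python
def gae(rewards: list, values: list, gamma: float = 0.99, lam: float = 0.95) -> list:
    """GAE(lambda) advantages for a single trajectory with terminal bootstrap V(s_T) = 0."""
    T = len(rewards)
    advantages = [0.0] * T
    a_next = 0.0  # A_T = 0
    for t in range(T - 1, -1, -1):
        v_next = values[t + 1] if t + 1 < T else 0.0  # terminal bootstrap
        delta = rewards[t] + gamma * v_next - values[t]  # TD residual delta_t
        a_next = delta + gamma * lam * a_next            # A_t = delta_t + gamma*lam*A_{t+1}
        advantages[t] = a_next
    return advantages
```

Sanity check with gamma = lam = 1: gae([1, 1], [0.5, 0.5], 1.0, 1.0) gives deltas of 1.0 and 0.5, so the advantages are [1.5, 0.5].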