Questions tagged [policy-gradients]

For questions related to reinforcement learning algorithms often referred to as "policy gradients" (or "policy gradient algorithms"), which attempt to directly optimise a parameterised policy (without first attempting to estimate value functions) using gradients of an objective function with respect to the policy's parameters.

For more info, see this tutorial: https://spinningup.openai.com/en/latest/spinningup/rl_intro3.html

197 questions

vote

1 answer

Zero reward in policy gradient

Specifically, according to this post, "How is the policy gradient calculated in REINFORCE", the function I need to minimise is $-G_t \log \pi(A_t \mid S_t, \theta_t)$, where $G_t$ is the discounted reward and $\pi$ is the policy, which outputs a probability…
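For readers skimming this excerpt, here is a minimal sketch of the per-step quantity it quotes, $-G_t \log \pi(A_t \mid S_t, \theta_t)$, written in plain Python. The function names `discounted_returns` and `reinforce_losses`, the discount factor, and the example rewards and probabilities are all hypothetical and chosen only for illustration; they do not come from the question itself. It assumes $G_t$ is the discounted sum of rewards from step $t$ onward, and that the probability the policy assigned to each chosen action is already available.

```python
import math


def discounted_returns(rewards, gamma=0.99):
    """Compute G_t = r_t + gamma * r_{t+1} + gamma^2 * r_{t+2} + ... for each t."""
    returns = []
    g = 0.0
    # Walk the episode backwards so each return reuses the next one.
    for r in reversed(rewards):
        g = r + gamma * g
        returns.append(g)
    returns.reverse()
    return returns


def reinforce_losses(rewards, action_probs, gamma=0.99):
    """Per-step loss -G_t * log pi(A_t | S_t, theta).

    action_probs[t] is the probability the policy gave to the action
    actually taken at step t (assumed to be strictly positive).
    """
    returns = discounted_returns(rewards, gamma)
    return [-g * math.log(p) for g, p in zip(returns, action_probs)]


# Hypothetical three-step episode with a single reward at the end.
rewards = [0.0, 0.0, 1.0]
action_probs = [0.5, 0.25, 0.8]
losses = reinforce_losses(rewards, action_probs, gamma=0.5)

# An episode whose rewards are all zero has G_t = 0 at every step,
# so every per-step loss term is zero as well.
zero_losses = reinforce_losses([0.0, 0.0], [0.5, 0.5], gamma=0.5)
```

In an actual training loop the log-probability would come from a differentiable policy network, and the gradient of this loss with respect to $\theta$ would be taken by an autodiff library; the sketch only evaluates the scalar terms.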

policy-gradients

asked Jun 05 '23 at 10:14

Jason L