Commit
Merge pull request openai#207 from sagnik-chatterjee/dev
fixed typo in /docs/spinningup/extra_pg_proof2.rst
jachiam authored Feb 2, 2020
2 parents 0cba288 + 4be88c2 commit ed725b3
Showing 1 changed file with 1 addition and 1 deletion: docs/spinningup/extra_pg_proof2.rst
@@ -14,7 +14,7 @@ In this section, we will show that
for the finite-horizon undiscounted return setting. (An analogous result holds in the infinite-horizon discounted case using basically the same proof.)


- The proof of this claim depends on the `law of iterated expectations`_. First, let's rewrite the expression for the policy gradient, starting from the reward-to-go form (using the notation :math:`\hat{R}_t = \sum_{t'=t}^T R(s_t, a_t, s_{t+1})` to help shorten things):
+ The proof of this claim depends on the `law of iterated expectations`_. First, let's rewrite the expression for the policy gradient, starting from the reward-to-go form (using the notation :math:`\hat{R}_t = \sum_{t'=t}^T R(s_{t'}, a_{t'}, s_{t'+1})` to help shorten things):

.. math::
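The hunk ends at the opening of a math directive whose body is collapsed in this view. For reference, the reward-to-go policy gradient that the :math:`\hat{R}_t` notation feeds into has the standard form below (a reconstruction from the surrounding text, not the verbatim file content):

.. math::

    \nabla_{\theta} J(\pi_{\theta}) = \underset{\tau \sim \pi_{\theta}}{\mathrm{E}}\left[ \sum_{t=0}^{T} \nabla_{\theta} \log \pi_{\theta}(a_t | s_t) \, \hat{R}_t \right]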

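As an aside, here is a minimal NumPy sketch of the reward-to-go quantity :math:`\hat{R}_t` under the corrected indexing. It is not part of the commit, and ``reward_to_go`` is an illustrative helper name rather than Spinning Up's API:

.. code-block:: python

    import numpy as np

    def reward_to_go(rews):
        """Compute R_hat_t = sum over t' from t to T of r_{t'}.

        Each term is the reward earned at step t', i.e. R(s_{t'}, a_{t'}, s_{t'+1}),
        which is the point of the subscript fix: the sum ranges over rewards
        indexed by t', not a single fixed reward at step t.
        """
        n = len(rews)
        rtgs = np.zeros(n)
        running = 0.0
        # Accumulate rewards backward so each entry holds the suffix sum.
        for i in reversed(range(n)):
            running += rews[i]
            rtgs[i] = running
        return rtgs

    # Example: rewards [1, 0, 2] give reward-to-go [3, 2, 2].
    print(reward_to_go([1.0, 0.0, 2.0]))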