From 4be88c23fea6afbf10f10155eb4e310a8fbdc63a Mon Sep 17 00:00:00 2001
From: sagnik-chatterjee
Date: Sun, 2 Feb 2020 22:18:22 +0530
Subject: [PATCH] fixed typo in /docs/spinningup/extra_pg_proof2.rst

---
 docs/spinningup/extra_pg_proof2.rst | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/docs/spinningup/extra_pg_proof2.rst b/docs/spinningup/extra_pg_proof2.rst
index defd3076f..b2a42c8e7 100644
--- a/docs/spinningup/extra_pg_proof2.rst
+++ b/docs/spinningup/extra_pg_proof2.rst
@@ -14,7 +14,7 @@
 In this section, we will show that
 for the finite-horizon undiscounted return setting. (An analagous result holds in the
 infinite-horizon discounted case using basically the same proof.)
 
-The proof of this claim depends on the `law of iterated expectations`_. First, let's rewrite the expression for the policy gradient, starting from the reward-to-go form (using the notation :math:`\hat{R}_t = \sum_{t'=t}^T R(s_t, a_t, s_{t+1})` to help shorten things):
+The proof of this claim depends on the `law of iterated expectations`_. First, let's rewrite the expression for the policy gradient, starting from the reward-to-go form (using the notation :math:`\hat{R}_t = \sum_{t'=t}^T R(s_{t'}, a_{t'}, s_{t'+1})` to help shorten things):
 
 .. math::
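
For readers following the corrected passage, here is a minimal Python sketch of the reward-to-go quantity :math:`\hat{R}_t = \sum_{t'=t}^T R(s_{t'}, a_{t'}, s_{t'+1})` that the fixed notation refers to. The function name `rewards_to_go` and the input list `rews` are illustrative assumptions for this sketch, not identifiers taken from the Spinning Up codebase:

    # Illustrative sketch only: compute reward-to-go values for one trajectory.
    # `rews` is a hypothetical list holding R(s_t, a_t, s_{t+1}) for t = 0, ..., T.
    def rewards_to_go(rews):
        rtgs = [0.0] * len(rews)
        running = 0.0
        # Sweep backwards so each entry accumulates the rewards from its own
        # timestep through the end of the trajectory.
        for t in reversed(range(len(rews))):
            running += rews[t]
            rtgs[t] = running
        return rtgs

    # Example: rewards [1, 2, 3] give reward-to-go values [6, 5, 3].
    print(rewards_to_go([1.0, 2.0, 3.0]))

Accumulating from the end of the trajectory keeps the computation O(T) rather than re-summing the tail of the reward list separately for every timestep t.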