From 4cb53a2e3285731f054dd45599260e67d55eeda9 Mon Sep 17 00:00:00 2001
From: Chris Marciniak
Date: Wed, 28 Nov 2018 10:54:09 -0600
Subject: [PATCH] Changed use to using

---
 docs/algorithms/vpg.rst | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/docs/algorithms/vpg.rst b/docs/algorithms/vpg.rst
index 1b612bb9c..bc4e06a2e 100644
--- a/docs/algorithms/vpg.rst
+++ b/docs/algorithms/vpg.rst
@@ -40,7 +40,7 @@ The policy gradient algorithm works by updating policy parameters via stochastic
 
     \theta_{k+1} = \theta_k + \alpha \nabla_{\theta} J(\pi_{\theta_k})
 
-Policy gradient implementations typically compute advantage function estimates based on the infinite-horizon discounted return, despite otherwise use the finite-horizon undiscounted policy gradient formula.
+Policy gradient implementations typically compute advantage function estimates based on the infinite-horizon discounted return, despite otherwise using the finite-horizon undiscounted policy gradient formula.
 
 Exploration vs. Exploitation
 ----------------------------
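
For context on the sentence this patch touches, below is a minimal NumPy sketch (not part of the patch or of Spinning Up's code) of the detail being described: advantage estimates are formed from infinite-horizon discounted returns, while the parameter update itself is the plain gradient-ascent step theta_{k+1} = theta_k + alpha * grad J shown in the diff. The trajectory rewards and per-step grad-log-prob values are hypothetical placeholders.

    import numpy as np

    def discounted_returns(rewards, gamma=0.99):
        # Infinite-horizon discounted reward-to-go for one trajectory.
        returns = np.zeros(len(rewards))
        running = 0.0
        for t in reversed(range(len(rewards))):
            running = rewards[t] + gamma * running
            returns[t] = running
        return returns

    # Hypothetical trajectory data: rewards and per-step grad log pi(a_t|s_t).
    rewards = np.array([1.0, 0.0, 1.0, 1.0])
    grad_log_prob = np.array([[0.10, -0.20],
                              [0.30,  0.00],
                              [-0.10, 0.20],
                              [0.05,  0.05]])

    # Advantage estimates based on the discounted return (trivial zero baseline here),
    # even though the update rule below is the undiscounted gradient-ascent formula.
    advantages = discounted_returns(rewards)
    policy_grad = (grad_log_prob * advantages[:, None]).mean(axis=0)

    theta = np.zeros(2)
    alpha = 0.01
    theta = theta + alpha * policy_grad  # theta_{k+1} = theta_k + alpha * grad J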