Question about VPG implementation #141

Closed
djjh opened this issue Apr 16, 2019 · 4 comments

djjh commented Apr 16, 2019

Well, two questions really, about these lines:

```python
# Policy gradient step
sess.run(train_pi, feed_dict=inputs)
# Value function learning
for _ in range(train_v_iters):
    sess.run(train_v, feed_dict=inputs)
```

  1. Is the order between updating the value function estimator and the policy all that important?
  2. Why do we need an inner loop for training the value function estimator when the input data is not changing? (My guess would be that it avoids the error you would get from the alternative of simply increasing the learning rate.)
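
For context, here is a minimal sketch of how `train_pi` and `train_v` are typically constructed in a TF1-style VPG implementation. The network sizes, placeholder names, and learning rates below are illustrative assumptions, not copied from the repo.

```python
import tensorflow as tf  # TensorFlow 1.x graph-mode API, as in the snippet above

# Placeholders for one batch of on-policy data (dimensions are arbitrary here).
obs_ph = tf.placeholder(tf.float32, shape=(None, 4))   # observations
act_ph = tf.placeholder(tf.int32, shape=(None,))       # actions taken
adv_ph = tf.placeholder(tf.float32, shape=(None,))     # advantage estimates
ret_ph = tf.placeholder(tf.float32, shape=(None,))     # empirical rewards-to-go

# Tiny policy and value networks.
logits = tf.layers.dense(tf.layers.dense(obs_ph, 64, tf.tanh), 2)
logp = tf.reduce_sum(tf.one_hot(act_ph, 2) * tf.nn.log_softmax(logits), axis=1)
v = tf.squeeze(tf.layers.dense(tf.layers.dense(obs_ph, 64, tf.tanh), 1), axis=1)

# Losses: the policy-gradient surrogate and the mean-squared value error.
pi_loss = -tf.reduce_mean(logp * adv_ph)
v_loss = tf.reduce_mean((ret_ph - v) ** 2)

# The two training ops run by sess.run in the lines quoted above.
train_pi = tf.train.AdamOptimizer(learning_rate=3e-4).minimize(pi_loss)
train_v = tf.train.AdamOptimizer(learning_rate=1e-3).minimize(v_loss)
```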

jachiam commented Apr 17, 2019

  1. The order in this particular case doesn't matter at all, because the policy and value function share no parameters.

  2. The inner loop is there to make more progress on the value-learning optimization problem (fitting a map from states to the empirical reward-to-go collected at this iteration) than a single gradient step alone would. If you took a single step with a large learning rate instead, you would probably land at the wrong parameters, because the loss is not linear in the parameters, so multiple steps of gradient descent help. (A toy illustration is sketched below.)

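To make point 2 concrete, here is a toy, self-contained illustration (a made-up 1-D non-linear loss, nothing from the repo) of why many small gradient steps land much closer to a minimizer than one step with a proportionally larger learning rate:

```python
import numpy as np

# Toy non-quadratic loss in a single parameter, with minima at w = +/- 1.
def loss(w):
    return (w ** 2 - 1.0) ** 2

def grad(w):
    return 4.0 * w * (w ** 2 - 1.0)

w_multi = 3.0
for _ in range(80):                         # many small steps (the inner loop)
    w_multi -= 1e-2 * grad(w_multi)

w_single = 3.0 - 80 * 1e-2 * grad(3.0)      # one step with an 80x larger rate

print(loss(w_multi))   # ~0: the iterates settle near the minimum at w = 1
print(loss(w_single))  # enormous: the single big step overshoots badly
```
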
jachiam closed this as completed Apr 17, 2019

djjh commented Apr 18, 2019

Thanks!


djjh commented Apr 20, 2019

Quick follow-up question about #2: is the same logic not applied to solving the policy optimization problem because the policy loss isn't meant to converge? Could more iterations be useful for any other reason?


rojas70 commented Jun 14, 2020

Given GAE, you need the best possible value estimate V for the current policy pi in order to compute your advantage function, so all of that computation needs to be consistent with the GAE equations.
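
For reference, here is a minimal numpy sketch of the GAE-lambda advantage computation this comment refers to (the gamma and lambda defaults are illustrative), which makes explicit how the advantage estimates depend on the value estimates V:

```python
import numpy as np

def gae_advantages(rewards, values, gamma=0.99, lam=0.97):
    """Compute GAE-lambda advantages for one trajectory.

    rewards: r_0 .. r_{T-1}
    values:  V(s_0) .. V(s_T), including a bootstrap value for the final state
    """
    T = len(rewards)
    deltas = rewards + gamma * values[1:] - values[:-1]   # TD residuals
    adv = np.zeros(T)
    running = 0.0
    for t in reversed(range(T)):                           # discounted sum of future deltas
        running = deltas[t] + gamma * lam * running
        adv[t] = running
    return adv

# Example: a 3-step trajectory with a bootstrap value for the final state.
print(gae_advantages(np.array([1.0, 0.0, 1.0]),
                     np.array([0.5, 0.4, 0.6, 0.2])))
```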
