Question about the effect of discount factor and done mask when calculating the target value? #42

puyuan1996 · 2022-12-28T09:25:08Z

Thanks for your open-sourced code very much.

This is a common definition of a target value in classical RL:

I'm a little confused about how the target value is calculated here in reanalyze_worker.py:

Why don't we multiply the bootstrap value (here value_lst) by discount_factor^td_steps, and why don't we mask the bootstrap value when the target observation is a done (terminal) state?
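For reference, here is a minimal sketch of the classical n-step TD target the question contrasts against. In this target, the bootstrap value is discounted by gamma^n and zeroed out when the state at step t+n is terminal. The function name `n_step_target` and its parameters are hypothetical illustrations and are not taken from reanalyze_worker.py.

```python
from typing import Sequence


def n_step_target(
    rewards: Sequence[float],
    bootstrap_value: float,
    bootstrap_done: bool,
    discount_factor: float,
) -> float:
    """Classical n-step TD target (hypothetical helper, not the repo's code):

        z_t = sum_{i=0}^{n-1} gamma^i * r_{t+i}
              + gamma^n * (1 - done_{t+n}) * V(s_{t+n})

    `rewards` holds r_t .. r_{t+n-1}, so n = len(rewards) plays the role of
    td_steps. If the episode ended before n steps, pass only the rewards that
    exist and set bootstrap_done=True, so the bootstrap term is masked out.
    """
    n = len(rewards)
    # Discounted sum of the n observed rewards.
    target = sum((discount_factor ** i) * r for i, r in enumerate(rewards))
    # Done mask: do not bootstrap from a terminal state.
    mask = 0.0 if bootstrap_done else 1.0
    # Bootstrap value discounted by gamma^n (i.e. discount_factor^td_steps).
    target += (discount_factor ** n) * mask * bootstrap_value
    return target


if __name__ == "__main__":
    # Non-terminal bootstrap state: 1 + 0.5 + 0.25 + 0.125 * 10 = 3.0
    print(n_step_target([1.0, 1.0, 1.0], 10.0, False, 0.5))
    # Terminal bootstrap state: the value term is masked, leaving 1.75
    print(n_step_target([1.0, 1.0, 1.0], 10.0, True, 0.5))
```

Adding the undiscounted, unmasked bootstrap value to the reward sum gives the same result as this formula only when discount_factor is 1 and the bootstrap state is non-terminal. That is why the two factors raised in the question matter in general.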

Looking forward to your reply!
