This is a common definition of the target value in classical RL:
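For reference, the n-step bootstrapped target I have in mind is usually written as below. The symbols are my own notation and are not taken from the repository: $r_{t+i}$ is the reward observed at step $t+i$, $v_{t+n}$ is the bootstrap value, $\gamma$ is the discount factor, $n$ is the number of TD steps, and $d_{t+n}$ is 1 if the observation at step $t+n$ is terminal and 0 otherwise.

```latex
z_t = \sum_{i=0}^{n-1} \gamma^{i} \, r_{t+i} \;+\; \gamma^{n} \, (1 - d_{t+n}) \, v_{t+n}
```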
I'm a little confused about how the target value is calculated in reanalyze_worker.py:
Why is the bootstrap value (`value_lst` here) not multiplied by `discount_factor^td_steps`, and why is the bootstrap value not masked when the target observation is a done state?
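To make the question concrete, here is a minimal sketch of the computation I would have expected. It is not the code from reanalyze_worker.py; the function name `n_step_target` and its arguments are hypothetical, and I assume that `rewards[i]` is the reward received after acting at step `i`, `values[i]` is the value estimate for the observation at step `i`, and `dones[i]` is True if the episode terminated after acting at step `i`.

```python
from typing import Sequence


def n_step_target(
    rewards: Sequence[float],
    values: Sequence[float],
    dones: Sequence[bool],
    t: int,
    td_steps: int,
    discount_factor: float,
) -> float:
    """Discounted n-step return from step t with a masked bootstrap value.

    Hypothetical helper for illustration only, not the repository's code.
    """
    target = 0.0
    for k in range(td_steps):
        idx = t + k
        if idx >= len(rewards):
            # Trajectory ran out before td_steps: nothing left to add.
            return target
        # Each reward is discounted by how far it lies from step t.
        target += (discount_factor ** k) * rewards[idx]
        if dones[idx]:
            # Terminal transition: the bootstrap value is masked out.
            return target
    bootstrap_idx = t + td_steps
    if bootstrap_idx < len(values):
        # Bootstrap value is discounted by discount_factor ** td_steps.
        target += (discount_factor ** td_steps) * values[bootstrap_idx]
    return target
```

With `rewards = [1.0, 1.0, 1.0]`, `values = [0.0, 0.0, 10.0, 0.0]`, `td_steps = 2`, and `discount_factor = 0.5`, the bootstrap value at step 2 contributes `0.25 * 10.0` when no step is terminal, and contributes nothing when `dones[1]` is True.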
Looking forward to your reply!
Thank you very much for open-sourcing your code.