-
Notifications
You must be signed in to change notification settings - Fork 1.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add timeout handling for on-policy algorithms #658
Conversation
…elines3 into feat/timeout-on-policy
@zhihanyang2022 could you have a look at that one? |
@araffin Sorry, which part do you want me to take a look? It seems like the changes are passing tests correctly? |
could you review the code? (both logic and style/naming) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM with few comments :). I would like second opinion if @zhihanyang2022 has time to look over this (about ~20 lines of relevant code changes).
Co-authored-by: Anssi <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm still trying to understand how testing works here, but I've checked the main changes made to on_policy_algorithm.py
and test_gae.py
and I think they are correct.
Adding gamma * terminal_value
to the final reward is logically identical to what's done in OpenAI SpinUp:
to give a bit more context, I created an environment where I know in advance the true value of each state (because the reward does not depend on the policy) but that is different if you look at it as an infinite or finite horizon problem. On the other hand, if we treat the problem as infinite horizon, the true value is a geometric series |
Description
Motivation and Context
closes #633
Types of changes
Checklist:
make format
(required)make check-codestyle
andmake lint
(required)make pytest
andmake type
both pass. (required)make doc
(required)Note: You can run most of the checks using
make commit-checks
.Note: we are using a maximum length of 127 characters per line