[RLlib] Reinstate trajectory view API tests. #18809
Conversation
a lot of nice cleanups, awesome change.
I just have a couple of questions, mostly for my own education.
thanks.
results = trainer.train()
assert results["train_batch_size"] == config["train_batch_size"]
assert results["timesteps_total"] == config["train_batch_size"]
I have a random question that I have been curious about for a while: how closely do we honor the train_batch_size param here?
For example, in complete_episodes mode, or if there is sample replay, will we ever produce a training batch of a very different size?
thanks.
We try to do our best to honor it, but it's not guaranteed to always be exact.
The reason is that we do parallel rollouts with a fixed (or full-episode-length) step limit per vectorized(!) environment. Depending on the number of vectorized sub-envs per worker and the number of workers, the final train batch may be slightly off. For PPO, for example, we auto-correct the rollout_fragment_length (since a few releases ago) based on these factors to better match the train_batch_size, but of course if you have lots of odd numbers in these settings, you will not get the train batch size exactly right.
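To make the arithmetic concrete, here is a rough sketch; the numbers and the rounding rule are illustrative assumptions, not RLlib's actual auto-correction code:

```python
import math

# Illustrative rollout settings (assumed, not from this PR).
num_workers = 3
num_envs_per_worker = 4
rollout_fragment_length = 100
train_batch_size = 1000

# With batch_mode="truncate_episodes", one sampling round collects roughly:
collected = num_workers * num_envs_per_worker * rollout_fragment_length
print(collected)  # 1200 -> 200 steps more than train_batch_size

# A PPO-style auto-correction instead derives the fragment length from the
# desired train batch size and the degree of rollout parallelism:
corrected = math.ceil(train_batch_size / (num_workers * num_envs_per_worker))
print(corrected, num_workers * num_envs_per_worker * corrected)  # 84, 1008
```

With "odd" combinations of workers, sub-envs, and batch size, the product can only approximate train_batch_size, which is why the reported size may be slightly off.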
thanks for the explanation, that's my impression as well.
I actually have a feeling that sometimes we may be off by a lot.
I can probably do some testing when I get a chance.
thanks.
SampleBatch.ACTIONS,
shift=1,
space=action_space,
used_for_compute_actions=False)
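For context, the excerpt above is the tail of a ViewRequirement declaration. A minimal, self-contained sketch of such a declaration might look like the following; the "next_actions" key and the Discrete(2) action space are illustrative stand-ins, not taken from this PR:

```python
import gym
from ray.rllib.policy.sample_batch import SampleBatch
from ray.rllib.policy.view_requirement import ViewRequirement

action_space = gym.spaces.Discrete(2)  # stand-in for the env's action space

view_requirements = {
    # Expose the action taken one step *after* the current timestep (shift=1)
    # to the train batch / loss, but keep it out of the inputs used when
    # computing actions at sampling time.
    "next_actions": ViewRequirement(
        SampleBatch.ACTIONS,
        shift=1,
        space=action_space,
        used_for_compute_actions=False,
    ),
}
```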
maybe we should have validation for this field somewhere? It seems easy to miss, and not straightforward for regular users.
You mean to check whether it's even possible to have this in the action computation path at all, given that the shift is >0 on a "collected" field like actions? Great idea!
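A hypothetical sketch of what such a validation could look like; the function name and the exact rule are assumptions, not part of this PR:

```python
from ray.rllib.policy.sample_batch import SampleBatch

def validate_view_requirement(key, vr):
    """Hypothetical check: a future-shifted (shift > 0), collected column
    cannot exist yet at action-computation time, so flag that combination."""
    collected_cols = {SampleBatch.ACTIONS, SampleBatch.REWARDS, SampleBatch.DONES}
    # `shift` may also be a string range like "-50:-1"; only plain ints are checked here.
    if (vr.used_for_compute_actions and vr.data_col in collected_cols
            and isinstance(vr.shift, int) and vr.shift > 0):
        raise ValueError(
            f"ViewRequirement '{key}': shift={vr.shift} (>0) on collected column "
            f"'{vr.data_col}' cannot be used with used_for_compute_actions=True.")
```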
Why are these changes needed?
This PR reinstates the commented-out trajectory view API tests in rllib/evaluation/tests/test_trajectory_view_api.py, and fixes .is_recurrent() when used with attention nets.
Related issue number
Checks
I've run scripts/format.sh to lint the changes in this PR.