[RLlib] Fix train_batch_size_per_learner problems. #49715
Conversation
LGTM. It is not clear yet why we need all three of `_train_batch_size`, `total_train_batch_size`, and `train_batch_size_per_learner`. Also, `total_train_batch_size` depends on `train_batch_size_per_learner`, which could be `None`, as I understood it?
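A minimal sketch (my own naming and structure assumptions, not the actual RLlib code) of how the three values could relate, based on the derivation discussed in this PR:

```python
# Hedged sketch of an AlgorithmConfig-like object; illustrates the relationship
# between the old-stack `_train_batch_size`, the new-stack
# `train_batch_size_per_learner`, and the derived `total_train_batch_size`.
class ConfigSketch:
    def __init__(self):
        self._train_batch_size = 4000              # old API stack setting
        self._train_batch_size_per_learner = None  # may legitimately stay None
        self.num_learners = 2

    @property
    def train_batch_size_per_learner(self):
        # Fallback described in this PR: if the user never set the per-learner
        # value, derive it from the old setting divided by the number of learners.
        if self._train_batch_size_per_learner is None:
            return self._train_batch_size // (self.num_learners or 1)
        return self._train_batch_size_per_learner

    @property
    def total_train_batch_size(self):
        # Always derived from the (now never-None) per-learner value.
        return self.train_batch_size_per_learner * (self.num_learners or 1)
```

Under that reading, `total_train_batch_size` itself can no longer be `None`, because the per-learner property always resolves to a number.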
-            and not self.in_evaluation
-            and self.total_train_batch_size > 0
-        ):
+        if self.rollout_fragment_length != "auto" and not self.in_evaluation:
Again, is "auto"
now the only configuration possible?
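For context on the "auto" case: a simplified, hedged sketch of how an "auto" fragment length could be resolved from the total train batch size (the exact RLlib formula, e.g. remainder handling across env runners, may differ):

```python
def resolve_rollout_fragment_length(
    rollout_fragment_length,   # an int, or the string "auto"
    total_train_batch_size,    # per-learner batch size * num_learners
    num_env_runners,
    num_envs_per_env_runner,
):
    # Hedged sketch: with "auto", size each fragment so that one sampling round
    # across all vectorized envs roughly adds up to one train batch.
    if rollout_fragment_length != "auto":
        return rollout_fragment_length
    total_envs = max(num_env_runners, 1) * num_envs_per_env_runner
    return max(1, total_train_batch_size // total_envs)
```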
@OldAPIStack: User never touches `train_batch_size_per_learner` or
`num_learners`) -> `train_batch_size`.
"""
return self.train_batch_size_per_learner * (self.num_learners or 1)
Why do we not return the private attribute `self.train_batch_size` here?
B/c `self.train_batch_size` is old API stack, so we should no longer reference it anywhere in the new API stack logic. I agree it's a little confusing right now. We need to get rid of the old `train_batch_size` setting eventually.
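Continuing the `ConfigSketch` example from above, a hedged illustration of the separation argued for here: new-stack logic reads only the derived values and leaves the old-stack attribute untouched.

```python
config = ConfigSketch()
# What a call like .training(train_batch_size_per_learner=512) would store
# in the private holder (method name assumed, not part of the sketch class).
config._train_batch_size_per_learner = 512

# New API stack logic consults only the derived values ...
assert config.train_batch_size_per_learner == 512
assert config.total_train_batch_size == 512 * 2
# ... while the old-stack value stays untouched and is never read here.
assert config._train_batch_size == 4000
```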
Fix train_batch_size_per_learner problems:

- Turn `train_batch_size_per_learner` into a property (backed by a `self._train_batch_size_per_learner` private value holder).
- If the user doesn't set it, it falls back to `config.train_batch_size / num_learners`.

This fixes a couple of gotchas where users don't explicitly set this property in their configs (or they think setting `train_batch_size` is enough, even though the migration guide talks about how to translate this setting), then run into errors b/c the default value of this property used to be `None`.

Note: Users should not notice these changes b/c they are seamless. They can still configure `train_batch_size_per_learner` (or not) w/o any change in RLlib's behavior, other than that it doesn't crash anymore.

Why are these changes needed?
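A hedged usage sketch of the two user-facing paths the description refers to (config method names as in RLlib's `AlgorithmConfig`; the asserted values follow the description above and are not verified against the merged code):

```python
from ray.rllib.algorithms.ppo import PPOConfig

# Path 1: the user sets the new-API-stack value explicitly -> behavior unchanged.
config = (
    PPOConfig()
    .learners(num_learners=2)
    .training(train_batch_size_per_learner=2000)
)
assert config.total_train_batch_size == 4000  # 2000 * 2 learners

# Path 2 (the gotcha this PR fixes): the user only sets the old train_batch_size.
# Per the description, the per-learner value now falls back to
# train_batch_size / num_learners instead of defaulting to None and crashing.
config = (
    PPOConfig()
    .learners(num_learners=2)
    .training(train_batch_size=4000)
)
assert config.train_batch_size_per_learner == 2000
```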
Related issue number
Checks
- I've signed off every commit (git commit -s) in this PR.
- I've run scripts/format.sh to lint the changes in this PR.
- If I've added a method in Tune, I've added it in doc/source/tune/api/ under the corresponding .rst file.