[Data] Propagate driver `DataContext` to `RayTrainWorker`s #40116
Signed-off-by: Scott Lee <[email protected]>
CI run with ML / RL tests passing: https://buildkite.com/ray-project/oss-ci-build-pr/builds/37993
I'm now going to revert the manual enabling of the RL tests trigger.
Thanks @scottjlee, this solution is cleaner than the previous one.
Also, can you elaborate more on the RLlib `Learner` issue?
Yeah, the previous implementation, which added a new parameter into
# TODO(@justinvyu: fix test and/or deprecate relevant code path)
@pytest.mark.skip("Mocked execute_async doesn't work as intended")
Is this intentional as part of this PR?
Yeah, I paired with @justinvyu on this for some time, and we concluded that the mocking inside the test may need to be updated to be compatible with the fix in this PR, but we couldn't figure out how. I think @justinvyu said he can come back later to fix or remove the test; I'll also let him elaborate.
Why are these changes needed?
Second attempt on #39698, which was found to be incompatible with RLlib `Learner` classes. In this PR, we move the logic of passing the driver's `DataContext` into the `BackendExecutor`, rather than the `RayTrainWorker` as previously.
Related issue number
Closes #39237
Previous PR: #39698
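The propagation pattern described under "Why are these changes needed?" can be sketched without Ray as follows. This is only an illustration of the idea, not the code in this PR: `SketchDataContext`, `SketchWorker`, `SketchExecutor`, and `_set_worker_context` are hypothetical stand-ins, and the field names on the context are made up for the example. The point it shows is that the executor, not the worker class, is responsible for shipping a copy of the driver's context to every worker at startup.

```python
import copy
from dataclasses import dataclass


@dataclass
class SketchDataContext:
    """Hypothetical stand-in for a per-process data context."""

    target_max_block_size: int = 128
    verbose_progress: bool = False


class SketchWorker:
    """Hypothetical stand-in for a train worker; starts with default settings."""

    def __init__(self) -> None:
        self.ctx = SketchDataContext()

    def execute(self, fn, *args):
        # Run an arbitrary function "on" this worker, passing the worker in.
        return fn(self, *args)


def _set_worker_context(worker: SketchWorker, ctx: SketchDataContext) -> None:
    # Overwrite the worker's default context with the one sent by the driver.
    worker.ctx = ctx


class SketchExecutor:
    """Hypothetical stand-in for the component that starts the workers."""

    def __init__(self, num_workers: int) -> None:
        self.workers = [SketchWorker() for _ in range(num_workers)]

    def start(self, driver_ctx: SketchDataContext) -> None:
        # The executor pushes the driver's context to each worker, so the
        # worker class itself needs no extra constructor parameter.
        for worker in self.workers:
            worker.execute(_set_worker_context, copy.deepcopy(driver_ctx))


# Driver side: customize the context, then start the workers.
driver_ctx = SketchDataContext()
driver_ctx.target_max_block_size = 512

executor = SketchExecutor(num_workers=2)
executor.start(driver_ctx)
```

Each worker receives its own deep copy here, which stands in for the serialization that would happen when a context is sent to a remote process; later changes on the driver do not leak into workers that have already started.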
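Checks
- I've signed off every commit (by using the -s flag, i.e., `git commit -s`) in this PR.
- I've run `scripts/format.sh` to lint the changes in this PR.
- I've added any new APIs to the API Reference. For example, if I added a method in Tune, I've added it in `doc/source/tune/api/` under the corresponding `.rst` file.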