Warnings about existing CUDA contexts on dask-cuda cluster startup #721
This is due to #719. The warning happens only with UCX because the CUDA context is created in Distributed (Distributed initializes comms before the initializer plugins that Dask-CUDA uses). So I’ll need to rewire things a bit in Distributed.
Opened dask/distributed#5308 and #722; they should be enough to address the issue here.
Because communications in `Nanny` are initialized before Dask preload plugins, and UCX creates the context directly within its own initializer in Distributed, Dask-CUDA always concludes that the CUDA context was incorrectly initialized ahead of time when using UCX, which isn't true. With the globals added here, Dask-CUDA can verify that the CUDA contexts are indeed valid. Depends on dask/distributed#5308. Fixes #721. Authors: Peter Andreas Entschev (https://github.com/pentschev). Approvers: https://github.com/jakirkham. URL: #722
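For readers following along, the idea of recording where a context was created and checking it later might look roughly like the sketch below. This is only an illustration of the pattern: `_context_record`, `record_context_created`, and `check_context` are hypothetical names, not the globals or functions actually added in #722 or dask/distributed#5308, and no real CUDA call is made.

```python
import os

# Hypothetical module-level record (an assumption for illustration only;
# not the actual globals added to Distributed or Dask-CUDA).
_context_record = {"pid": None, "device": None}


def record_context_created(device):
    """Called by whichever component creates the context first (e.g. comms init)."""
    _context_record["pid"] = os.getpid()
    _context_record["device"] = device


def check_context(expected_device):
    """Return a warning message if the recorded context looks wrong, else None."""
    pid = _context_record["pid"]
    device = _context_record["device"]
    if pid is None:
        # Nothing created a context early, so there is nothing to verify.
        return None
    if pid != os.getpid():
        return f"context was created in another process (pid {pid})"
    if device != expected_device:
        return f"context exists on device {device}, expected {expected_device}"
    return None
```

With a record like this, a plugin that runs after comms initialization can tell an early but valid context (same process, expected device) apart from a genuinely misplaced one, instead of warning whenever any context already exists.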
This is in. Thanks for reporting, @randerzander, and please let me know if you still experience any problems.
On starting up a dask-cuda cluster on a DGX2, I get this warning for every worker:
Checking nvidia-smi, I only see one process per GPU, so nothing looks out of the ordinary despite the warning suggesting I may have multiple workers on a single GPU.
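The manual `nvidia-smi` check described above could be scripted along these lines. This is a rough sketch, assuming `nvidia-smi` is on the `PATH` and supports `--query-compute-apps=gpu_uuid,pid` with `--format=csv,noheader`; the helper names `count_processes_per_gpu` and `query_nvidia_smi` are made up for this example.

```python
import subprocess
from collections import Counter


def count_processes_per_gpu(csv_text):
    """Count compute processes per GPU from 'gpu_uuid, pid' CSV lines."""
    counts = Counter()
    for line in csv_text.splitlines():
        line = line.strip()
        if not line:
            continue
        # Each line is expected to look like "GPU-<uuid>, <pid>".
        gpu_uuid, _pid = [field.strip() for field in line.split(",", 1)]
        counts[gpu_uuid] += 1
    return dict(counts)


def query_nvidia_smi():
    """Run nvidia-smi and return its raw CSV output (requires an NVIDIA driver)."""
    result = subprocess.run(
        ["nvidia-smi", "--query-compute-apps=gpu_uuid,pid", "--format=csv,noheader"],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout
```

On a machine with GPUs, `count_processes_per_gpu(query_nvidia_smi())` returns a mapping from GPU UUID to process count; any value above 1 would point to more than one process sharing that GPU.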