
Warnings about existing CUDA contexts on dask-cuda cluster startup #721

Closed
randerzander opened this issue Sep 10, 2021 · 3 comments · Fixed by #722

Comments

@randerzander
Contributor

randerzander commented Sep 10, 2021

On starting up a dask-cuda cluster on a DGX2, I get this warning for every worker:

/home/rgelhausen/conda/envs/test/lib/python3.8/site-packages/dask_cuda/initialize.py:22: UserWarning: A CUDA context for device 14 already exists on process ID 3287969. This is often the result of a CUDA-enabled library calling a CUDA runtime function before Dask-CUDA can spawn worker processes. Please make sure any such function calls don't happen at import time or in the global scope of a program.
  warnings.warn(

Checking nvidia-smi, I see only one process per GPU, so nothing looks out of the ordinary, despite the warning's suggestion that I may have multiple workers on a single GPU.
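The warning asks that CUDA runtime calls not happen at import time or in a program's global scope. A GPU-free sketch of that deferred-initialization pattern is below; `create_context` is a hypothetical stand-in for any CUDA runtime call (e.g. allocating a GPU array), not a real Dask-CUDA or CUDA API:

```python
# GPU-free sketch of the pattern the warning asks for. "create_context"
# stands in for any CUDA runtime call; all names here are hypothetical.

_context = None  # simulated per-process CUDA context


def create_context():
    """Lazily create the (simulated) context on first use."""
    global _context
    if _context is None:
        _context = object()
    return _context


# Problematic: calling create_context() here, at import time / global
# scope, would create the context in the parent process before Dask-CUDA
# can spawn its worker processes.


def task():
    # Safe: the context is created inside the task, i.e. only after the
    # worker process running it has been spawned.
    return create_context() is not None
```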

@pentschev
Member

This is due to #719. The warning happens only with UCX, because in that case we do create the CUDA context in Distributed (Distributed's comms initialize before the preload plugins that Dask-CUDA uses). So I'll need to rewire things a bit in Distributed.

@pentschev
Member

Opened dask/distributed#5308 and #722; together they should be enough to address the issue here.

@rapids-bot rapids-bot bot closed this as completed in #722 Sep 14, 2021
rapids-bot bot pushed a commit that referenced this issue Sep 14, 2021
Because communications in `Nanny` are initialized before Dask preload plugins, and UCX creates the CUDA context directly within its own initializer in Distributed, Dask-CUDA would always conclude that the CUDA context had been incorrectly initialized when using UCX, which isn't true. With the globals added here, Dask-CUDA can verify that the CUDA contexts are indeed valid.

Depends on dask/distributed#5308 .

Fixes #721 .

Authors:
  - Peter Andreas Entschev (https://github.com/pentschev)

Approvers:
  - https://github.com/jakirkham

URL: #722
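The commit message above describes recording context creation in module globals so a later check can tell an expected context apart from a stray one. A minimal sketch of that idea, with hypothetical names that do not reflect Dask-CUDA's actual internals:

```python
# Hedged sketch of the mechanism the fix describes: the comms layer
# records which device it created a CUDA context on in a module global,
# and the initializer consults it later, warning only about contexts it
# did not create itself. All names here are hypothetical.

cuda_context_created_device = None  # set by the comms layer


def record_context_creation(device):
    """Called by the comms layer after it intentionally creates a context."""
    global cuda_context_created_device
    cuda_context_created_device = device


def should_warn(existing_context_device):
    """Warn only if the existing context is on a device we did not set up."""
    return existing_context_device != cuda_context_created_device


record_context_creation(14)
print(should_warn(14))  # context we created ourselves: no warning
print(should_warn(3))   # unexpected context: warn
```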
@pentschev
Member

This is in. Thanks for reporting, @randerzander, and please let me know if you still experience any problems.
