As far as I can tell, the problem is that #98 added the following logic to detect GPU availability: `HAS_GPU = len(cuda.gpus.lst) > 0`. This logic works fine within a local process, but it breaks Dask-CUDA device pinning when it runs as part of a top-level import (or anywhere else in the global scope of the program). In other words, code like this should not be executed by an import statement such as `from merlin.core.compat import HAS_GPU`.
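One way to keep `from merlin.core.compat import HAS_GPU` working without running CUDA code at import time is to bind `HAS_GPU` to an object whose truth value is computed lazily. The sketch below is hypothetical: `LazyFlag` and `_probe_gpus` are illustrative names, not existing Merlin APIs, and it assumes that the first truth test of the flag only happens after Dask-CUDA has pinned the worker to its device.

```python
import threading
from typing import Callable, Optional


class LazyFlag:
    """Hypothetical boolean-like flag whose value is computed on first truth test."""

    def __init__(self, probe: Callable[[], bool]) -> None:
        self._probe = probe
        self._value: Optional[bool] = None  # None means "not computed yet"
        self._lock = threading.Lock()

    def __bool__(self) -> bool:
        # Double-checked locking so concurrent first uses run the probe once.
        if self._value is None:
            with self._lock:
                if self._value is None:
                    self._value = bool(self._probe())
        return self._value

    def __repr__(self) -> str:
        # Deliberately does not trigger the probe.
        state = "unevaluated" if self._value is None else str(self._value)
        return f"LazyFlag({state})"


def _probe_gpus() -> bool:
    # Hypothetical probe: runs only when the flag is first evaluated.
    try:
        from numba import cuda
        from numba.cuda.cudadrv.error import CudaSupportError
    except ImportError:
        return False  # numba is not installed: treat as "no GPU"
    try:
        return len(cuda.gpus.lst) > 0
    except CudaSupportError:
        return False  # CUDA driver could not be initialized


# Importing this name does not query CUDA; `if HAS_GPU:` does.
HAS_GPU = LazyFlag(_probe_gpus)
```

The trade-off is that `HAS_GPU` is no longer a real `bool`: truth tests such as `if HAS_GPU:` behave as expected, but identity checks like `HAS_GPU is True` will not.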
The problem becomes apparent in a simple (Merlin-free) reproducer:
```python
# reproducer.py
from dask_cuda import LocalCUDACluster
from numba import cuda  # This is fine

HAS_GPU = len(cuda.gpus.lst) > 0  # This is not fine

if __name__ == "__main__":
    cluster = LocalCUDACluster()
```
If you execute `python ./reproducer.py`, you will see warnings like:
```
/.../distributed/distributed/comm/ucx.py:67: UserWarning: Worker with process ID 49507 should have a CUDA context assigned to device 1, but instead the CUDA context is on device 0. This is often the result of a CUDA-enabled library calling a CUDA runtime function before Dask-CUDA can spawn worker processes. Please make sure any such function calls don't happen at import time or in the global scope of a program.
```
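Following the advice in the warning, the detection can also be moved out of import time entirely by turning the constant into a cached function. This is a minimal sketch, assuming callers can change `if HAS_GPU:` to `if has_gpu():`. The name `has_gpu` is hypothetical, and catching `CudaSupportError` when no driver is present is an assumption about how the query fails on machines without CUDA.

```python
import functools


@functools.lru_cache(maxsize=None)
def has_gpu() -> bool:
    """Return True if numba can see at least one CUDA device.

    Hypothetical replacement for a module-level HAS_GPU constant: the
    device query runs on the first call (never at import time), and
    lru_cache reuses that result for later calls in the same process.
    """
    try:
        from numba import cuda
        from numba.cuda.cudadrv.error import CudaSupportError
    except ImportError:
        # numba is not installed: treat as "no GPU".
        return False
    try:
        return len(cuda.gpus.lst) > 0
    except CudaSupportError:
        # The CUDA driver could not be initialized (e.g. no driver present).
        return False
```

Because the result is cached per process, each Dask-CUDA worker would run the query at most once, on first use, rather than when the module is imported.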
I ran into this while attempting to benchmark NVIDIA-Merlin/NVTabular#1687, when I discovered that the dask-criteo benchmark does not work with the latest version of NVTabular/Merlin-core.