Fix NVML index usage in CUDAWorker/LocalCUDACluster (#671)
Fix an issue where device index `0` would always be used in NVML functions. For CUDA runtime calls, we expect the GPU being targeted to always be at index `0`, as it's relative to the `CUDA_VISIBLE_DEVICES` ordering. However, NVML relies on absolute indices, so we must always use the actual index of the GPU being targeted, rather than index `0`, which only refers to the first entry in `CUDA_VISIBLE_DEVICES`.

This is normally not an issue if `CUDA_VISIBLE_DEVICES` is unset, or is set to `",".join(str(i) for i in range(get_n_gpus()))`, but it may be an issue when targeting a different list of GPUs. For example, on a DGX-1, the CPU affinity for GPUs `0-3` is `0-19,40-59`, and for GPUs `4-7` it is `20-39,60-79`, but when the user set `CUDA_VISIBLE_DEVICES=4`, the CPU affinity computed for the targeted GPU would be that of device index `0`, i.e. `0-19,40-59`. This could result in lower performance, as well as incorrect computation of total GPU memory on non-homogeneous GPU systems.

Authors:
- Peter Andreas Entschev (https://github.com/pentschev)

Approvers:
- Benjamin Zaitlen (https://github.com/quasiben)

URL: #671
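The relative-versus-absolute index distinction described above can be sketched as follows. This is a minimal illustration, not the code from this commit: `absolute_gpu_index` is a hypothetical helper, and it assumes `CUDA_VISIBLE_DEVICES` contains comma-separated integer indices (GPU UUIDs are not handled). The value it returns is what one would pass to an NVML call that takes an absolute index, such as `pynvml.nvmlDeviceGetHandleByIndex`.

```python
import os


def absolute_gpu_index(relative_index, cuda_visible_devices=None):
    """Map an index relative to CUDA_VISIBLE_DEVICES to an absolute GPU index.

    Hypothetical helper for illustration only; assumes the variable holds
    comma-separated integer indices.
    """
    if cuda_visible_devices is None:
        cuda_visible_devices = os.environ.get("CUDA_VISIBLE_DEVICES", "")
    if not cuda_visible_devices:
        # Nothing set: relative and absolute indices coincide.
        return relative_index
    devices = [int(d) for d in cuda_visible_devices.split(",")]
    return devices[relative_index]


# DGX-1 example from the text: with CUDA_VISIBLE_DEVICES=4 the CUDA runtime
# sees the GPU as device 0, but NVML must be asked about device 4.
print(absolute_gpu_index(0, "4"))        # 4
print(absolute_gpu_index(1, "4,5,6,7"))  # 5
```

Passing the variable's value as an argument, rather than always reading the environment, keeps the mapping easy to check for a specific device list.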
Showing 4 changed files with 54 additions and 4 deletions.