With the advent of NVLink, getting the correct node topology is now more complicated than just ensuring correct NUMA placement of processes for their respective GPUs. For dense GPU systems with multiple NVLink connections, we need a method to easily map the local process ID to the GPU device ordinal. Failing to do so results in sub-optimal use of peer-to-peer bandwidth and in messages being unnecessarily routed through CPU memory. A naive mapping looks something like the sketch below.
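For illustration, here is a minimal sketch of the usual topology-unaware approach: each local MPI rank picks a device by round-robin on its local rank. It assumes Open MPI's `OMPI_COMM_WORLD_LOCAL_RANK` environment variable (other launchers expose equivalents such as `SLURM_LOCALID`); the point is that the resulting ordinal need not match the NUMA/NVLink topology.

```c
/* Sketch: bind each local MPI rank to a GPU by ordinal.
 * Assumes Open MPI's OMPI_COMM_WORLD_LOCAL_RANK; other launchers
 * provide equivalents (e.g. SLURM_LOCALID). This round-robin
 * mapping is topology-unaware, which is the problem raised here. */
#include <stdio.h>
#include <stdlib.h>
#include <cuda_runtime.h>

int main(void)
{
    const char *s = getenv("OMPI_COMM_WORLD_LOCAL_RANK");
    int local_rank = s ? atoi(s) : 0;

    int ndev = 0;
    cudaGetDeviceCount(&ndev);

    /* Local rank i takes device i % ndev, regardless of which
     * GPUs actually share a NUMA node or NVLink with this rank. */
    cudaSetDevice(local_rank % ndev);

    printf("local rank %d -> device %d of %d\n",
           local_rank, local_rank % ndev, ndev);
    return 0;
}
```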
Closing this issue, as the required functionality is trivially obtained from the `CUDA_VISIBLE_DEVICES` environment variable (the order in which devices are listed in this variable corresponds to the order in which the CUDA runtime enumerates them).
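To make the remapping concrete, a small sketch using only standard CUDA runtime calls (`cudaGetDeviceCount`, `cudaDeviceGetPCIBusId`) can print which physical GPU each visible ordinal refers to. Running it as, say, `CUDA_VISIBLE_DEVICES=2,0 ./a.out` should show visible device 0 reporting the PCI bus ID of physical GPU 2.

```c
/* Sketch: print each visible device's PCI bus ID to confirm that
 * device ordinals follow the order given in CUDA_VISIBLE_DEVICES. */
#include <stdio.h>
#include <cuda_runtime.h>

int main(void)
{
    int ndev = 0;
    cudaGetDeviceCount(&ndev);
    for (int i = 0; i < ndev; ++i) {
        char bus[32];
        cudaDeviceGetPCIBusId(bus, sizeof bus, i);
        printf("visible device %d -> PCI %s\n", i, bus);
    }
    return 0;
}
```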