
Commit

Disable reuse endpoints with UCX >= 1.11 (#620)
UCX-Py endpoint reuse is no longer necessary, so we also disable it for UCX 1.11+. It was introduced primarily to work around an issue with CUDA IPC that was resolved by openucx/ucx#6360. The endpoint reuse class has also proven to be very slow: it takes a long time to initialize on clusters with just a few dozen workers and is practically unusable on clusters on the order of 100 workers.

Authors:
  - Peter Andreas Entschev (https://github.com/pentschev)

Approvers:
  - Benjamin Zaitlen (https://github.com/quasiben)

URL: #620
pentschev authored May 21, 2021
1 parent c158d9c commit ab1d35c
Showing 1 changed file with 3 additions and 1 deletion.
4 changes: 3 additions & 1 deletion dask_cuda/utils.py
@@ -27,8 +27,10 @@ def nvtx_annotate(message=None, color="blue", domain=None):
     import ucp

     _ucx_110 = ucp.get_ucx_version() >= (1, 10, 0)
+    _ucx_111 = ucp.get_ucx_version() >= (1, 11, 0)
 except ImportError:
     _ucx_110 = False
+    _ucx_111 = False


 class CPUAffinity:
@@ -247,7 +249,7 @@ def get_ucx_config(
         "rdmacm": None,
         "net-devices": None,
         "cuda_copy": None,
-        "reuse-endpoints": True,
+        "reuse-endpoints": not _ucx_111,
     }
     if enable_tcp_over_ucx or enable_infiniband or enable_nvlink:
         ucx_config["cuda_copy"] = True
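
The version gate in this change can be sketched in isolation as follows. This is a minimal illustration, not code from dask_cuda: the helper name `reuse_endpoints_default` is hypothetical, and the version is passed in as a plain tuple (or `None` when UCX-Py cannot be imported) so the sketch runs without `ucp` installed. It relies only on Python's lexicographic tuple comparison, which is what the `>= (1, 11, 0)` check in the diff uses.

```python
def reuse_endpoints_default(ucx_version):
    """Return the default for "reuse-endpoints" given a UCX version.

    `ucx_version` is a (major, minor, patch) tuple, or None to model the
    ImportError fallback in the diff, where _ucx_111 is set to False.
    This helper is hypothetical and exists only for illustration.
    """
    # Tuples compare element by element, so (1, 11, 0) <= (1, 12, 0).
    ucx_111 = ucx_version is not None and ucx_version >= (1, 11, 0)
    # Endpoint reuse stays enabled only for UCX older than 1.11.
    return not ucx_111


if __name__ == "__main__":
    for version in [None, (1, 9, 0), (1, 10, 1), (1, 11, 0), (1, 12, 0)]:
        print(version, reuse_endpoints_default(version))
```

With this gate, endpoint reuse remains the default when UCX-Py is missing or older than 1.11, and is switched off from 1.11 onward.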
