From 45b592d7dc575d0c02d6f49b8fa071bd0c853189 Mon Sep 17 00:00:00 2001
From: Peter Andreas Entschev
Date: Thu, 9 Sep 2021 09:48:53 -0700
Subject: [PATCH 1/3] Update Distributed UCX variable names in docs

---
 docs/source/ucx.rst | 16 ++++++++--------
 1 file changed, 8 insertions(+), 8 deletions(-)

diff --git a/docs/source/ucx.rst b/docs/source/ucx.rst
index 1bc262b93..4246f541a 100644
--- a/docs/source/ucx.rst
+++ b/docs/source/ucx.rst
@@ -27,30 +27,30 @@ In addition to installations of UCX and UCX-Py on your system, several options m
 Typically, these will affect ``UCX_TLS`` and ``UCX_SOCKADDR_TLS_PRIORITY``, environment variables used by UCX to decide what transport methods to use and which to prioritize, respectively.
 However, some will affect related libraries, such as RMM:
 
--- ``ucx.cuda_copy: true`` -- **required.**
+- ``distributed.comm.ucx.cuda_copy: true`` -- **required.**
   Adds ``cuda_copy`` to ``UCX_TLS``, enabling CUDA transfers over UCX.
 
--- ``ucx.tcp: true`` -- **required.**
+- ``distributed.comm.ucx.tcp: true`` -- **required.**
   Adds ``tcp`` to ``UCX_TLS``, enabling TCP transfers over UCX; this is required for very small transfers which are inefficient for NVLink and InfiniBand.
 
--- ``ucx.nvlink: true`` -- **required for NVLink.**
+- ``distributed.comm.ucx.nvlink: true`` -- **required for NVLink.**
   Adds ``cuda_ipc`` to ``UCX_TLS``, enabling NVLink transfers over UCX; affects intra-node communication only.
 
--- ``ucx.infiniband: true`` -- **required for InfiniBand.**
+- ``distributed.comm.ucx.infiniband: true`` -- **required for InfiniBand.**
   Adds ``rc`` to ``UCX_TLS``, enabling InfiniBand transfers over UCX.
   For optimal performance with UCX 1.11 and above, it is recommended to also set the environment variables ``UCX_MAX_RNDV_RAILS=1`` and ``UCX_MEMTYPE_REG_WHOLE_ALLOC_TYPES=cuda``, see documentation `here `_ and `here `_ for more details on those variables.
 
--- ``ucx.rdmacm: true`` -- **recommended for InfiniBand.**
+- ``distributed.comm.ucx.rdmacm: true`` -- **recommended for InfiniBand.**
   Replaces ``sockcm`` with ``rdmacm`` in ``UCX_SOCKADDR_TLS_PRIORITY``, enabling remote direct memory access (RDMA) for InfiniBand transfers.
   This is recommended by UCX for use with InfiniBand, and will not work if InfiniBand is disabled.
 
--- ``ucx.net-devices: `` -- **recommended for UCX 1.9 and older.**
+- ``distributed.comm.ucx.net-devices: `` -- **recommended for UCX 1.9 and older.**
   Explicitly sets ``UCX_NET_DEVICES`` instead of defaulting to ``"all"``, which can result in suboptimal performance.
   If using InfiniBand, set to ``"auto"`` to automatically detect the InfiniBand interface closest to each GPU on UCX 1.9 and below.
@@ -65,14 +65,14 @@ However, some will affect related libraries, such as RMM:
 
--- ``rmm.pool-size: `` -- **recommended.**
+- ``distributed.rmm.pool-size: `` -- **recommended.**
   Allocates an RMM pool of the specified size for the process; size can be provided with an integer number of bytes or in human readable format, e.g. ``"4GB"``.
   It is recommended to set the pool size to at least the minimum amount of memory used by the process; if possible, one can map all GPU memory to a single pool, to be utilized for the lifetime of the process.
 
 .. note::
     These options can be used with mainline Dask.distributed.
-    However, some features are exclusive to Dask-CUDA, such as the automatic detection of InfiniBand interfaces.
+    However, some features are exclusive to Dask-CUDA, such as the automatic detection of InfiniBand interfaces. See `Dask-CUDA -- Motivation `_ for more details on the benefits of using Dask-CUDA.
 Usage

From 3b857615d453644bc964127835ef4630a629c4fd Mon Sep 17 00:00:00 2001
From: Peter Andreas Entschev
Date: Thu, 9 Sep 2021 10:11:26 -0700
Subject: [PATCH 2/3] Update examples for UCX 1.11

---
 docs/source/examples/ucx.rst | 16 +++++++++++-----
 1 file changed, 11 insertions(+), 5 deletions(-)

diff --git a/docs/source/examples/ucx.rst b/docs/source/examples/ucx.rst
index 44b4c5f73..a2b5cffb2 100644
--- a/docs/source/examples/ucx.rst
+++ b/docs/source/examples/ucx.rst
@@ -22,11 +22,13 @@ To connect a client to a cluster with all supported transports and an RMM pool:
         enable_nvlink=True,
         enable_infiniband=True,
         enable_rdmacm=True,
-        ucx_net_devices="auto",
         rmm_pool_size="1GB"
     )
     client = Client(cluster)
 
+.. note::
+    For UCX 1.9 (deprecated) and older, it's necessary to pass ``ucx_net_devices="auto"`` to ``LocalCUDACluster``. UCX 1.11 and above is capable of selecting InfiniBand devices automatically.
+
 dask-cuda-worker
 ----------------
 
@@ -46,13 +48,14 @@ To start a Dask scheduler using UCX with all supported transports and an gigabyt
     > DASK_DISTRIBUTED__COMM__UCX__NVLINK=True \
     > DASK_DISTRIBUTED__COMM__UCX__INFINIBAND=True \
     > DASK_DISTRIBUTED__COMM__UCX__RDMACM=True \
-    > DASK_DISTRIBUTED__COMM__UCX__NET_DEVICES=mlx5_0:1 \
     > DASK_DISTRIBUTED__RMM__POOL_SIZE=1GB \
     > dask-scheduler --protocol ucx --interface ib0
 
-Note the specification of ``"mlx5_0:1"`` as our UCX net device; because the scheduler does not rely upon Dask-CUDA, it cannot automatically detect InfiniBand interfaces, so we must specify one explicitly.
 We communicate to the scheduler that we will be using UCX with the ``--protocol`` option, and that we will be using InfiniBand with the ``--interface`` option.
 
+.. note::
+    For UCX 1.9 (deprecated) and older it's also necessary to set ``DASK_DISTRIBUTED__COMM__UCX__NET_DEVICES=mlx5_0:1``, where ``"mlx5_0:1"`` is our UCX net device; because the scheduler does not rely upon Dask-CUDA, it cannot automatically detect InfiniBand interfaces, so we must specify one explicitly. UCX 1.11 and above is capable of selecting InfiniBand devices automatically.
+
 Workers
 ^^^^^^^
 
@@ -69,6 +72,9 @@ To start a cluster with all supported transports and an RMM pool:
     > --net-devices="auto" \
     > --rmm-pool-size="1GB"
 
+.. note::
+    For UCX 1.9 (deprecated) and older it's also necessary to set ``--net-devices="auto"``. UCX 1.11 and above is capable of selecting InfiniBand devices automatically.
+
 Client
 ^^^^^^
 
@@ -85,8 +91,8 @@ To connect a client to the cluster we have made:
         enable_nvlink=True,
         enable_infiniband=True,
         enable_rdmacm=True,
-        net_devices="mlx5_0:1",
     )
     client = Client("ucx://:8786")
 
-Note again the specification of ``"mlx5_0:1"`` as our UCX net device, due to the fact that the client does not support automatic detection of InfiniBand interfaces.
+.. note::
+    For UCX 1.9 (deprecated) and older it's also necessary to set ``net_devices="mlx5_0:1"``, where ``"mlx5_0:1"`` is our UCX net device; because the client does not rely upon Dask-CUDA, it cannot automatically detect InfiniBand interfaces, so we must specify one explicitly. UCX 1.11 and above is capable of selecting InfiniBand devices automatically.
From e2b3e04cdf39f9405e221dbdb409a2dff41511bc Mon Sep 17 00:00:00 2001
From: Peter Andreas Entschev
Date: Thu, 9 Sep 2021 10:26:33 -0700
Subject: [PATCH 3/3] Remove remaining `--net-devices` argument from UCX example

---
 docs/source/examples/ucx.rst | 1 -
 1 file changed, 1 deletion(-)

diff --git a/docs/source/examples/ucx.rst b/docs/source/examples/ucx.rst
index a2b5cffb2..036b99291 100644
--- a/docs/source/examples/ucx.rst
+++ b/docs/source/examples/ucx.rst
@@ -69,7 +69,6 @@ To start a cluster with all supported transports and an RMM pool:
     > --enable-nvlink \
     > --enable-infiniband \
     > --enable-rdmacm \
-    > --net-devices="auto" \
     > --rmm-pool-size="1GB"
 
 .. note::
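A note on the naming in these patches: the scheduler example sets the renamed options through environment variables, and Dask derives those variable names mechanically from the dotted configuration keys (a ``DASK_`` prefix, ``.`` becomes ``__``, ``-`` becomes ``_``, uppercase). A minimal sketch of that mapping, with a helper name (``dask_key_to_env``) chosen here purely for illustration:

```python
def dask_key_to_env(key: str) -> str:
    """Convert a dotted Dask config key to its environment-variable form:
    prefix with DASK_, uppercase, '.' -> '__', '-' -> '_'."""
    return "DASK_" + key.upper().replace(".", "__").replace("-", "_")

# The keys renamed by this patch series map as follows:
print(dask_key_to_env("distributed.comm.ucx.nvlink"))
# -> DASK_DISTRIBUTED__COMM__UCX__NVLINK
print(dask_key_to_env("distributed.rmm.pool-size"))
# -> DASK_DISTRIBUTED__RMM__POOL_SIZE
```

This is why ``distributed.comm.ucx.net-devices`` in the first patch and ``DASK_DISTRIBUTED__COMM__UCX__NET_DEVICES`` in the second refer to the same setting.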