
Commit

Merge e2b3e04 into 33e5d3e
pentschev authored Sep 9, 2021
2 parents 33e5d3e + e2b3e04 commit b1920d5
Showing 2 changed files with 19 additions and 14 deletions.
17 changes: 11 additions & 6 deletions docs/source/examples/ucx.rst
@@ -22,11 +22,13 @@ To connect a client to a cluster with all supported transports and an RMM pool:
enable_nvlink=True,
enable_infiniband=True,
enable_rdmacm=True,
- ucx_net_devices="auto",
rmm_pool_size="1GB"
)
client = Client(cluster)
+ .. note::
+     For UCX 1.9 (deprecated) and older, it's necessary to pass ``ucx_net_devices="auto"`` to ``LocalCUDACluster``. UCX 1.11 and above can select InfiniBand devices automatically.
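Because the diff truncates the top of this snippet, here is what the complete example presumably looks like on UCX 1.11 and above; the ``protocol`` and ``enable_tcp_over_ucx`` arguments are assumptions based on the surrounding documentation:

    from dask.distributed import Client
    from dask_cuda import LocalCUDACluster

    # All supported UCX transports plus a 1 GB RMM pool; on UCX 1.11+ no
    # ucx_net_devices argument is needed.
    cluster = LocalCUDACluster(
        protocol="ucx",            # assumed; hidden by the truncated hunk
        enable_tcp_over_ucx=True,  # assumed; hidden by the truncated hunk
        enable_nvlink=True,
        enable_infiniband=True,
        enable_rdmacm=True,
        rmm_pool_size="1GB",
    )
    client = Client(cluster)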

dask-cuda-worker
----------------

@@ -46,13 +48,14 @@ To start a Dask scheduler using UCX with all supported transports and a gigabyte
> DASK_DISTRIBUTED__COMM__UCX__NVLINK=True \
> DASK_DISTRIBUTED__COMM__UCX__INFINIBAND=True \
> DASK_DISTRIBUTED__COMM__UCX__RDMACM=True \
- > DASK_DISTRIBUTED__COMM__UCX__NET_DEVICES=mlx5_0:1 \
> DASK_DISTRIBUTED__RMM__POOL_SIZE=1GB \
> dask-scheduler --protocol ucx --interface ib0
- Note the specification of ``"mlx5_0:1"`` as our UCX net device; because the scheduler does not rely upon Dask-CUDA, it cannot automatically detect InfiniBand interfaces, so we must specify one explicitly.
We communicate to the scheduler that we will be using UCX with the ``--protocol`` option, and that we will be using InfiniBand with the ``--interface`` option.

+ .. note::
+     For UCX 1.9 (deprecated) and older, it's also necessary to set ``DASK_DISTRIBUTED__COMM__UCX__NET_DEVICES=mlx5_0:1``, where ``"mlx5_0:1"`` is our UCX net device. Because the scheduler does not rely upon Dask-CUDA, it cannot detect InfiniBand interfaces automatically, so we must specify one explicitly. UCX 1.11 and above can select InfiniBand devices automatically.
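For reference, the complete scheduler invocation presumably reads as follows on UCX 1.11 and above; the first two environment variables are hidden by the truncated hunk and are assumed from the option list in docs/source/ucx.rst:

    $ DASK_DISTRIBUTED__COMM__UCX__CUDA_COPY=True \
    > DASK_DISTRIBUTED__COMM__UCX__TCP=True \
    > DASK_DISTRIBUTED__COMM__UCX__NVLINK=True \
    > DASK_DISTRIBUTED__COMM__UCX__INFINIBAND=True \
    > DASK_DISTRIBUTED__COMM__UCX__RDMACM=True \
    > DASK_DISTRIBUTED__RMM__POOL_SIZE=1GB \
    > dask-scheduler --protocol ucx --interface ib0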

Workers
^^^^^^^

@@ -66,9 +69,11 @@ To start a cluster with all supported transports and an RMM pool:
> --enable-nvlink \
> --enable-infiniband \
> --enable-rdmacm \
- > --net-devices="auto" \
> --rmm-pool-size="1GB"
+ .. note::
+     For UCX 1.9 (deprecated) and older, it's also necessary to pass ``--net-devices="auto"``. UCX 1.11 and above can select InfiniBand devices automatically.
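A complete worker invocation for UCX 1.11 and above might then look like the following; the scheduler address placeholder and the ``--enable-tcp-over-ucx`` flag are assumptions, since the top of the hunk is truncated:

    $ dask-cuda-worker ucx://<scheduler_address>:8786 \
    > --enable-tcp-over-ucx \
    > --enable-nvlink \
    > --enable-infiniband \
    > --enable-rdmacm \
    > --rmm-pool-size="1GB"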

Client
^^^^^^

@@ -85,8 +90,8 @@ To connect a client to the cluster we have made:
enable_nvlink=True,
enable_infiniband=True,
enable_rdmacm=True,
- net_devices="mlx5_0:1",
)
client = Client("ucx://<scheduler_address>:8786")
- Note again the specification of ``"mlx5_0:1"`` as our UCX net device, due to the fact that the client does not support automatic detection of InfiniBand interfaces.
+ .. note::
+     For UCX 1.9 (deprecated) and older, it's also necessary to set ``net_devices="mlx5_0:1"``, where ``"mlx5_0:1"`` is our UCX net device. Because the client does not rely upon Dask-CUDA, it cannot detect InfiniBand interfaces automatically, so we must specify one explicitly. UCX 1.11 and above can select InfiniBand devices automatically.
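The client-side call above presumably goes through ``dask_cuda.initialize``, whose opening lines the diff hides; a sketch for UCX 1.11 and above, where ``net_devices`` is no longer needed:

    from dask.distributed import Client
    from dask_cuda.initialize import initialize  # assumed entry point

    # Configure UCX transports for the client process before connecting.
    initialize(
        enable_tcp_over_ucx=True,  # assumed; hidden by the truncated hunk
        enable_nvlink=True,
        enable_infiniband=True,
        enable_rdmacm=True,
    )
    client = Client("ucx://<scheduler_address>:8786")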
16 changes: 8 additions & 8 deletions docs/source/ucx.rst
@@ -27,30 +27,30 @@ In addition to installations of UCX and UCX-Py on your system, several options m
Typically, these will affect ``UCX_TLS`` and ``UCX_SOCKADDR_TLS_PRIORITY``, environment variables used by UCX to decide what transport methods to use and which to prioritize, respectively.
However, some will affect related libraries, such as RMM:

- - ``ucx.cuda_copy: true`` -- **required.**
+ - ``distributed.comm.ucx.cuda_copy: true`` -- **required.**

Adds ``cuda_copy`` to ``UCX_TLS``, enabling CUDA transfers over UCX.

- - ``ucx.tcp: true`` -- **required.**
+ - ``distributed.comm.ucx.tcp: true`` -- **required.**

Adds ``tcp`` to ``UCX_TLS``, enabling TCP transfers over UCX; this is required for very small transfers which are inefficient for NVLink and InfiniBand.

- - ``ucx.nvlink: true`` -- **required for NVLink.**
+ - ``distributed.comm.ucx.nvlink: true`` -- **required for NVLink.**

Adds ``cuda_ipc`` to ``UCX_TLS``, enabling NVLink transfers over UCX; affects intra-node communication only.

- - ``ucx.infiniband: true`` -- **required for InfiniBand.**
+ - ``distributed.comm.ucx.infiniband: true`` -- **required for InfiniBand.**

Adds ``rc`` to ``UCX_TLS``, enabling InfiniBand transfers over UCX.

For optimal performance with UCX 1.11 and above, it is recommended to also set the environment variables ``UCX_MAX_RNDV_RAILS=1`` and ``UCX_MEMTYPE_REG_WHOLE_ALLOC_TYPES=cuda``; see the documentation `here <https://ucx-py.readthedocs.io/en/latest/configuration.html#ucx-max-rndv-rails>`_ and `here <https://ucx-py.readthedocs.io/en/latest/configuration.html#ucx-memtype-reg-whole-alloc-types>`_ for more details on those variables, and the configuration sketch after this list for how they fit together.

- - ``ucx.rdmacm: true`` -- **recommended for InfiniBand.**
+ - ``distributed.comm.ucx.rdmacm: true`` -- **recommended for InfiniBand.**

Replaces ``sockcm`` with ``rdmacm`` in ``UCX_SOCKADDR_TLS_PRIORITY``, enabling remote direct memory access (RDMA) for InfiniBand transfers.
This is recommended by UCX for use with InfiniBand, and will not work if InfiniBand is disabled.

- - ``ucx.net-devices: <str>`` -- **recommended for UCX 1.9 and older.**
+ - ``distributed.comm.ucx.net-devices: <str>`` -- **recommended for UCX 1.9 and older.**

Explicitly sets ``UCX_NET_DEVICES`` instead of defaulting to ``"all"``, which can result in suboptimal performance.
If using InfiniBand, set to ``"auto"`` to automatically detect the InfiniBand interface closest to each GPU on UCX 1.9 and below.
@@ -65,14 +65,14 @@ However, some will affect related libraries, such as RMM:



- - ``rmm.pool-size: <str|int>`` -- **recommended.**
+ - ``distributed.rmm.pool-size: <str|int>`` -- **recommended.**

Allocates an RMM pool of the specified size for the process; size can be provided with an integer number of bytes or in human readable format, e.g. ``"4GB"``.
It is recommended to set the pool size to at least the minimum amount of memory used by the process; if possible, one can map all GPU memory to a single pool, to be utilized for the lifetime of the process.

.. note::
These options can be used with mainline Dask.distributed.
- However, some features are exclusive to Dask-CUDA, such as the automatic detection of InfiniBand interfaces.
+ However, some features are exclusive to Dask-CUDA, such as the automatic detection of InfiniBand interfaces.
See `Dask-CUDA -- Motivation <index.html#motivation>`_ for more details on the benefits of using Dask-CUDA.
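As a sketch of how the options above fit together on the client side, assuming the dotted key names listed above map directly onto ``dask.config.set`` and that the UCX variables must be exported before UCX is initialized:

    import os

    # UCX tuning recommended above for UCX 1.11+; these are plain environment
    # variables read by UCX itself, not Dask configuration.
    os.environ.setdefault("UCX_MAX_RNDV_RAILS", "1")
    os.environ.setdefault("UCX_MEMTYPE_REG_WHOLE_ALLOC_TYPES", "cuda")

    import dask

    dask.config.set({
        "distributed.comm.ucx.cuda_copy": True,   # required
        "distributed.comm.ucx.tcp": True,         # required
        "distributed.comm.ucx.nvlink": True,      # required for NVLink
        "distributed.comm.ucx.infiniband": True,  # required for InfiniBand
        "distributed.comm.ucx.rdmacm": True,      # recommended for InfiniBand
        "distributed.rmm.pool-size": "1GB",       # recommended
    })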

Usage
