Update more docs for UCX 1.11+ #720

Merged · 3 commits · Sep 9, 2021
17 changes: 11 additions & 6 deletions docs/source/examples/ucx.rst
@@ -22,11 +22,13 @@ To connect a client to a cluster with all supported transports and an RMM pool:
enable_nvlink=True,
enable_infiniband=True,
enable_rdmacm=True,
ucx_net_devices="auto",
rmm_pool_size="1GB"
)
client = Client(cluster)

.. note::
For UCX 1.9 (deprecated) and older, it's necessary to pass ``ucx_net_devices="auto"`` to ``LocalCUDACluster``. UCX 1.11 and above is capable of selecting InfiniBand devices automatically.
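
For reference, a minimal sketch of the equivalent cluster on UCX 1.9 (deprecated) and older, where ``ucx_net_devices`` must still be passed explicitly; the ``protocol`` and ``enable_tcp_over_ucx`` arguments are assumed from the collapsed portion of this example:

    # Sketch: LocalCUDACluster for UCX 1.9 and older, where InfiniBand devices
    # are not selected automatically and ucx_net_devices="auto" is required.
    from dask.distributed import Client
    from dask_cuda import LocalCUDACluster

    cluster = LocalCUDACluster(
        protocol="ucx",
        enable_tcp_over_ucx=True,
        enable_nvlink=True,
        enable_infiniband=True,
        enable_rdmacm=True,
        ucx_net_devices="auto",  # only needed on UCX 1.9 and older
        rmm_pool_size="1GB",
    )
    client = Client(cluster)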

dask-cuda-worker
----------------

@@ -46,13 +48,14 @@ To start a Dask scheduler using UCX with all supported transports and a gigabyte RMM pool:
> DASK_DISTRIBUTED__COMM__UCX__NVLINK=True \
> DASK_DISTRIBUTED__COMM__UCX__INFINIBAND=True \
> DASK_DISTRIBUTED__COMM__UCX__RDMACM=True \
> DASK_DISTRIBUTED__COMM__UCX__NET_DEVICES=mlx5_0:1 \
> DASK_DISTRIBUTED__RMM__POOL_SIZE=1GB \
> dask-scheduler --protocol ucx --interface ib0

Note the specification of ``"mlx5_0:1"`` as our UCX net device; because the scheduler does not rely upon Dask-CUDA, it cannot automatically detect InfiniBand interfaces, so we must specify one explicitly.
We communicate to the scheduler that we will be using UCX with the ``--protocol`` option, and that we will be using InfiniBand with the ``--interface`` option.

.. note::
For UCX 1.9 (deprecated) and older it's also necessary to set ``DASK_DISTRIBUTED__COMM__UCX__NET_DEVICES=mlx5_0:1``, where ``"mlx5_0:1"`` is our UCX net device; because the scheduler does not rely upon Dask-CUDA, it cannot automatically detect InfiniBand interfaces, so we must specify one explicitly. UCX 1.11 and above is capable of selecting InfiniBand devices automatically.
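
As a quick check that these ``DASK_*`` variables were picked up, one can inspect the resulting Dask configuration from Python in the same environment; a minimal sketch, where the key names are the ``distributed.comm.ucx`` options documented in ``docs/source/ucx.rst`` below:

    # Sketch: Dask translates DASK_DISTRIBUTED__COMM__UCX__* environment
    # variables into the corresponding configuration keys at import time.
    import dask
    import distributed  # noqa: F401  (registers the distributed.comm.ucx defaults)

    print(dask.config.get("distributed.comm.ucx.nvlink"))      # expected: True
    print(dask.config.get("distributed.comm.ucx.infiniband"))  # expected: True
    print(dask.config.get("distributed.comm.ucx.rdmacm"))      # expected: True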

Workers
^^^^^^^

@@ -66,9 +69,11 @@ To start a cluster with all supported transports and an RMM pool:
> --enable-nvlink \
> --enable-infiniband \
> --enable-rdmacm \
> --net-devices="auto" \
> --rmm-pool-size="1GB"

.. note::
For UCX 1.9 (deprecated) and older it's also necessary to set ``--net-devices="auto"``. UCX 1.11 and above is capable of selecting InfiniBand devices automatically.

Client
^^^^^^

@@ -85,8 +90,8 @@ To connect a client to the cluster we have made:
enable_nvlink=True,
enable_infiniband=True,
enable_rdmacm=True,
net_devices="mlx5_0:1",
)
client = Client("ucx://<scheduler_address>:8786")

Note again the specification of ``"mlx5_0:1"`` as our UCX net device, due to the fact that the client does not support automatic detection of InfiniBand interfaces.
.. note::
For UCX 1.9 (deprecated) and older it's also necessary to set ``net_devices="mlx5_0:1"``, where ``"mlx5_0:1"`` is our UCX net device; because the client does not rely upon Dask-CUDA, it cannot automatically detect InfiniBand interfaces, so we must specify one explicitly. UCX 1.11 and above is capable of selecting InfiniBand devices automatically.
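
For completeness, a sketch of the full client-side call for UCX 1.9 (deprecated) and older, assuming the ``dask_cuda.initialize`` helper used in the collapsed portion of this example; on UCX 1.11 and above the ``net_devices`` argument is simply omitted:

    # Sketch (UCX 1.9 and older): the client cannot detect InfiniBand
    # interfaces automatically, so the UCX net device is given explicitly.
    from dask.distributed import Client
    from dask_cuda.initialize import initialize

    initialize(
        enable_tcp_over_ucx=True,
        enable_nvlink=True,
        enable_infiniband=True,
        enable_rdmacm=True,
        net_devices="mlx5_0:1",  # drop this argument on UCX 1.11 and above
    )
    client = Client("ucx://<scheduler_address>:8786")
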
16 changes: 8 additions & 8 deletions docs/source/ucx.rst
@@ -27,30 +27,30 @@ In addition to installations of UCX and UCX-Py on your system, several options must be specified within your Dask configuration.
Typically, these will affect ``UCX_TLS`` and ``UCX_SOCKADDR_TLS_PRIORITY``, environment variables used by UCX to decide what transport methods to use and which to prioritize, respectively.
However, some will affect related libraries, such as RMM:

- ``ucx.cuda_copy: true`` -- **required.**
- ``distributed.comm.ucx.cuda_copy: true`` -- **required.**

Adds ``cuda_copy`` to ``UCX_TLS``, enabling CUDA transfers over UCX.

- ``ucx.tcp: true`` -- **required.**
- ``distributed.comm.ucx.tcp: true`` -- **required.**

Adds ``tcp`` to ``UCX_TLS``, enabling TCP transfers over UCX; this is required for very small transfers which are inefficient for NVLink and InfiniBand.

- ``ucx.nvlink: true`` -- **required for NVLink.**
- ``distributed.comm.ucx.nvlink: true`` -- **required for NVLink.**

Adds ``cuda_ipc`` to ``UCX_TLS``, enabling NVLink transfers over UCX; affects intra-node communication only.

- ``ucx.infiniband: true`` -- **required for InfiniBand.**
- ``distributed.comm.ucx.infiniband: true`` -- **required for InfiniBand.**

Adds ``rc`` to ``UCX_TLS``, enabling InfiniBand transfers over UCX.

For optimal performance with UCX 1.11 and above, it is recommended to also set the environment variables ``UCX_MAX_RNDV_RAILS=1`` and ``UCX_MEMTYPE_REG_WHOLE_ALLOC_TYPES=cuda``; see the documentation `here <https://ucx-py.readthedocs.io/en/latest/configuration.html#ucx-max-rndv-rails>`_ and `here <https://ucx-py.readthedocs.io/en/latest/configuration.html#ucx-memtype-reg-whole-alloc-types>`_ for more details on those variables.

- ``ucx.rdmacm: true`` -- **recommended for InfiniBand.**
- ``distributed.comm.ucx.rdmacm: true`` -- **recommended for InfiniBand.**

Replaces ``sockcm`` with ``rdmacm`` in ``UCX_SOCKADDR_TLS_PRIORITY``, enabling remote direct memory access (RDMA) for InfiniBand transfers.
This is recommended by UCX for use with InfiniBand, and will not work if InfiniBand is disabled.

- ``ucx.net-devices: <str>`` -- **recommended for UCX 1.9 and older.**
- ``distributed.comm.ucx.net-devices: <str>`` -- **recommended for UCX 1.9 and older.**

Explicitly sets ``UCX_NET_DEVICES`` instead of defaulting to ``"all"``, which can result in suboptimal performance.
If using InfiniBand, set to ``"auto"`` to automatically detect the InfiniBand interface closest to each GPU on UCX 1.9 and below.
@@ -65,14 +65,14 @@ However, some will affect related libraries, such as RMM:



- ``rmm.pool-size: <str|int>`` -- **recommended.**
- ``distributed.rmm.pool-size: <str|int>`` -- **recommended.**

Allocates an RMM pool of the specified size for the process; size can be provided with an integer number of bytes or in human readable format, e.g. ``"4GB"``.
It is recommended to set the pool size to at least the minimum amount of memory used by the process; if possible, one can map all GPU memory to a single pool, to be utilized for the lifetime of the process.

.. note::
These options can be used with mainline Dask.distributed.
However, some features are exclusive to Dask-CUDA, such as the automatic detection of InfiniBand interfaces.
See `Dask-CUDA -- Motivation <index.html#motivation>`_ for more details on the benefits of using Dask-CUDA.
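
As a rough illustration of the options above, a minimal sketch that sets the renamed ``distributed.comm.ucx`` keys programmatically before creating a cluster; the key names are taken from this file, the values are illustrative, and setting them via ``dask.config.set``, a YAML file, or ``DASK_*`` environment variables is interchangeable:

    # Sketch: setting the Dask configuration options described above from Python.
    # UCX_MAX_RNDV_RAILS and UCX_MEMTYPE_REG_WHOLE_ALLOC_TYPES are read by UCX
    # itself rather than Dask, so they are exported as environment variables
    # (before UCX is initialized) instead.
    import os

    import dask

    os.environ["UCX_MAX_RNDV_RAILS"] = "1"                    # recommended for UCX 1.11+
    os.environ["UCX_MEMTYPE_REG_WHOLE_ALLOC_TYPES"] = "cuda"  # recommended for UCX 1.11+

    dask.config.set({
        "distributed.comm.ucx.cuda_copy": True,   # required
        "distributed.comm.ucx.tcp": True,         # required
        "distributed.comm.ucx.nvlink": True,      # required for NVLink
        "distributed.comm.ucx.infiniband": True,  # required for InfiniBand
        "distributed.comm.ucx.rdmacm": True,      # recommended for InfiniBand
        "distributed.rmm.pool-size": "1GB",       # recommended; size is illustrative
    })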

Usage