Add descriptions for UCX config options #4683

Merged: 6 commits, Apr 20, 2021
63 changes: 35 additions & 28 deletions distributed/distributed-schema.yaml
@@ -803,44 +803,51 @@ properties:
   rmm:
     type: object
     description: |
-      Configuration options for the RAPIDS Memory Manager
+      Configuration options for the RAPIDS Memory Manager.
     properties:
       pool-size:
-        type:
-          - integer
-          - "null"
-        description:
-          The size of the memory pool in bytes
+        type: [integer, 'null']
+        description: |
+          The size of the memory pool in bytes.
   ucx:
     type: object
     description: |
-      UCX provides access to other network interconnects like Infiniband and NVLINK
+      UCX provides access to other transport methods including NVLink and InfiniBand.
     properties:
+      cuda_copy:
+        type: boolean
+        description: |
+          Set environment variables to enable CUDA support over UCX. This may be used even if
+          InfiniBand and NVLink are not supported or disabled, then transferring data over TCP.
       tcp:
-        type:
-          - boolean
-          - "null"
+        type: boolean
+        description: |
+          Set environment variables to enable TCP over UCX, even if InfiniBand and NVLink
+          are not supported or disabled.
       nvlink:
-        type:
-          - boolean
-          - "null"
+        type: boolean
+        description: |
+          Set environment variables to enable UCX over NVLink, implies ``ucx.tcp=True``.
       infiniband:
-        type:
-          - boolean
-          - "null"
+        type: boolean
+        description: |
+          Set environment variables to enable UCX over InfiniBand, implies ``ucx.tcp=True``.
       rdmacm:
-        type:
-          - boolean
-          - "null"
-      cuda_copy:
-        type:
-          - boolean
-          - "null"
+        type: boolean
+        description: |
+          Set environment variables to enable UCX RDMA connection manager support,
+          requires ``ucx.infiniband=True``.
       net-devices:
-        type:
-          - string
-          - "null"
-        description: Define which Infiniband device to use
+        type: [string, 'null']
+        description: |
+          Interface(s) used by workers for UCX communication. Can be a string (like
+          ``"eth0"`` for NVLink or ``"mlx5_0:1"``/``"ib0"`` for InfiniBand), ``"auto"``
+          (requires ``ucx.infiniband=True``) to pick the optimal interface per-worker based on
+          the system's topology, or ``None`` to stay with the default value of ``"all"`` (use
+          all available interfaces). Setting to ``"auto"`` requires UCX-Py to be installed
+          and compiled with hwloc support. Unexpected errors can occur when using
+          ``"auto"`` if any interfaces are disconnected or improperly configured.
       reuse-endpoints:
         type: boolean
-        description: Whether to reuse endpoints or not, default True
+        description: |
+          Whether to reuse endpoints or not.
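
The new descriptions state a few dependencies between the flags: `nvlink` and `infiniband` each imply `ucx.tcp=True`, while `rdmacm` and `net-devices: "auto"` each require `ucx.infiniband=True`. The sketch below applies those documented rules to a plain dictionary. It is only an illustration: the function name `resolve_ucx_flags` is hypothetical, raising `ValueError` for an unmet requirement is an assumption, and the diff does not show how Distributed itself enforces any of this.

```python
def resolve_ucx_flags(ucx):
    """Return a copy of `ucx` with the documented implications applied.

    Hypothetical helper for illustration; not part of Distributed.
    """
    flags = dict(ucx)
    nvlink = bool(flags.get("nvlink", False))
    infiniband = bool(flags.get("infiniband", False))

    # nvlink and infiniband are each documented as implying ucx.tcp=True.
    if nvlink or infiniband:
        flags["tcp"] = True

    # rdmacm is documented as requiring ucx.infiniband=True.
    if flags.get("rdmacm", False) and not infiniband:
        raise ValueError("ucx.rdmacm requires ucx.infiniband=True")

    # net-devices="auto" is documented as requiring ucx.infiniband=True.
    if flags.get("net-devices") == "auto" and not infiniband:
        raise ValueError('ucx.net-devices="auto" requires ucx.infiniband=True')

    return flags


resolved = resolve_ucx_flags({"nvlink": True, "tcp": False})
print(resolved["tcp"])  # True, because nvlink implies tcp
```

The input dictionary is copied rather than modified, so the caller can still compare what was requested against what was resolved.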
12 changes: 6 additions & 6 deletions distributed/distributed.yaml
@@ -188,10 +188,10 @@ distributed:
 rmm:
   pool-size: null
 ucx:
-  tcp: null # enable tcp
-  nvlink: null # enable cuda_ipc
-  infiniband: null # enable Infiniband
-  rdmacm: null # enable RDMACM
-  cuda_copy: null # enable cuda-copy
-  net-devices: null # define which Infiniband device to use
+  cuda_copy: False # enable cuda-copy
+  tcp: False # enable tcp
+  nvlink: False # enable cuda_ipc
+  infiniband: False # enable Infiniband
+  rdmacm: False # enable RDMACM
+  net-devices: null # define what interface to use for UCX comm
   reuse-endpoints: True # enable endpoint reuse
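
To make the effect of the new defaults concrete, here is a small sketch that mirrors the `ucx` block above as a plain dictionary and overlays user overrides on it. The names `UCX_DEFAULTS` and `effective_ucx_config` are hypothetical, and mapping a `null` `net-devices` to `"all"` follows the schema description rather than any code shown in this PR.

```python
# Defaults copied from the `ucx` section of distributed.yaml after this change.
UCX_DEFAULTS = {
    "cuda_copy": False,
    "tcp": False,
    "nvlink": False,
    "infiniband": False,
    "rdmacm": False,
    "net-devices": None,
    "reuse-endpoints": True,
}


def effective_ucx_config(overrides=None):
    """Overlay `overrides` on the defaults (hypothetical helper)."""
    overrides = overrides or {}
    unknown = set(overrides) - set(UCX_DEFAULTS)
    if unknown:
        raise KeyError(f"unknown ucx options: {sorted(unknown)}")
    config = {**UCX_DEFAULTS, **overrides}
    # Per the schema description, None keeps the default of "all" interfaces.
    if config["net-devices"] is None:
        config["net-devices"] = "all"
    return config


config = effective_ucx_config({"infiniband": True, "net-devices": "ib0"})
print(config["net-devices"])       # ib0
print(config["reuse-endpoints"])   # True
```

Because the boolean options now default to `False` rather than `null`, a caller reading them always sees a boolean and does not have to handle a third "unset" state.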