Add distributed-ucxx subproject (#60)
Add a new subproject, `distributed-ucxx`, which provides a plugin for Distributed. Users can specify the new `protocol="ucxx"` to enable UCXX as the communication backend. For now this is completely independent of UCX-Py, which can still be selected with `protocol="ucx"`.

Most of the changes here reimplement the [UCX-Py backend from Distributed](https://github.com/dask/distributed/blob/main/distributed/comm/ucx.py), with minor changes such as `ucp`->`ucxx` and adaptations to API changes in UCXX. The tests in this PR are likewise those that currently test [UCX-Py in Distributed](https://github.com/dask/distributed/tree/main/distributed/comm/tests), with the same `ucp`->`ucxx` rename and API adaptations.

Packaging and distribution may still require further work that will be addressed in follow-up PRs.
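
For illustration, a minimal sketch of how the new protocol might be selected from user code. It assumes `distributed`, `distributed-ucxx`, and a working UCX installation are available, and that addresses follow Distributed's usual `<protocol>://<host>:<port>` form. The helper names `make_address` and `start_local_cluster` are hypothetical and not part of this PR.

```python
def make_address(host: str, port: int, protocol: str = "ucxx") -> str:
    """Build a Distributed-style address string, e.g. 'ucxx://10.0.0.1:8786'."""
    return f"{protocol}://{host}:{port}"


def start_local_cluster(protocol: str = "ucxx"):
    """Start a single-worker LocalCluster over the requested protocol.

    Assumption: installing distributed-ucxx makes the "ucxx" protocol
    available to Distributed. Imports are done lazily so that
    make_address() stays usable without these packages installed.
    """
    from distributed import Client, LocalCluster

    cluster = LocalCluster(protocol=protocol, n_workers=1)
    client = Client(cluster)
    return cluster, client


if __name__ == "__main__":
    # Only prints an address; starting a cluster requires UCX on the machine.
    print(make_address("127.0.0.1", 8786))
```

Passing `protocol="ucx"` to the same helper would select the existing UCX-Py backend instead.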

Authors:
  - Peter Andreas Entschev (https://github.com/pentschev)

Approvers:
  - Lawrence Mitchell (https://github.com/wence-)
  - Ray Douglass (https://github.com/raydouglass)

URL: #60
pentschev authored Oct 30, 2023
1 parent 08905c9 commit 7af9f19
Showing 20 changed files with 2,723 additions and 4 deletions.
17 changes: 14 additions & 3 deletions build.sh
@@ -18,13 +18,14 @@ ARGS=$*
# script, and that this script resides in the repo dir!
REPODIR=$(cd $(dirname $0); pwd)

-VALIDARGS="clean libucxx libucxx_python ucxx benchmarks tests examples -v -g -n -c --show_depr_warn -h"
-HELP="$0 [clean] [libucxx] [libucxx_python] [ucxx] [benchmarks] [tests] [examples] [-vcgnh] [--cmake-args=\\\"<args>\\\"]
+VALIDARGS="clean libucxx libucxx_python ucxx distributed_ucxx benchmarks tests examples -v -g -n -c --show_depr_warn -h"
+HELP="$0 [clean] [libucxx] [libucxx_python] [ucxx] [distributed_ucxx] [benchmarks] [tests] [examples] [-vcgnh] [--cmake-args=\\\"<args>\\\"]
clean - remove all existing build artifacts and configuration (start
over)
libucxx - build the UCXX C++ module
libucxx_python - build the UCXX C++ Python support module
ucxx - build the ucxx Python package
distributed_ucxx - build the distributed_ucxx (Dask Distributed module) Python package
benchmarks - build benchmarks
tests - build tests
examples - build examples
@@ -36,7 +37,7 @@ HELP="$0 [clean] [libucxx] [libucxx_python] [ucxx] [benchmarks] [tests] [example
--cmake-args=\\\"<args>\\\" - pass arbitrary list of CMake configuration options (escape all quotes in argument)
-h | --h[elp] - print this text
-default action (no args) is to build and install 'libucxx' and 'libucxx_python', and then 'ucxx' targets
+default action (no args) is to build and install 'libucxx' and 'libucxx_python', then 'ucxx' targets, and finally 'distributed_ucxx'
"
LIB_BUILD_DIR=${LIB_BUILD_DIR:=${REPODIR}/cpp/build}
UCXX_BUILD_DIR=${REPODIR}/python/build
@@ -223,3 +224,13 @@ if buildAll || hasArg ucxx; then
        python setup.py install --single-version-externally-managed --record=record.txt -- -DCMAKE_PREFIX_PATH=${INSTALL_PREFIX} -DCMAKE_BUILD_TYPE=${BUILD_TYPE} -DCMAKE_LIBRARY_PATH=${LIBUCXX_BUILD_DIR} ${EXTRA_CMAKE_ARGS} -- -j${PARALLEL_LEVEL:-1}
    fi
fi

# Build and install the distributed_ucxx Python package
if buildAll || hasArg distributed_ucxx; then

    cd ${REPODIR}/python/distributed-ucxx/
    python setup.py build_ext --inplace
    if [[ ${INSTALL_TARGET} != "" ]]; then
        python setup.py install --single-version-externally-managed --record=record.txt
    fi
fi
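
The target selection described by the updated help text can be sketched in Python as follows. This is a simplified model, not the script's logic: the bodies of `buildAll` and `hasArg` are not shown in this diff, so the handling of flags here is an assumption, and `select_targets` is a hypothetical name.

```python
# Names taken from VALIDARGS and the help text in build.sh.
VALID_TARGETS = (
    "clean", "libucxx", "libucxx_python", "ucxx",
    "distributed_ucxx", "benchmarks", "tests", "examples",
)
# Default targets as described by the updated help text.
DEFAULT_TARGETS = ("libucxx", "libucxx_python", "ucxx", "distributed_ucxx")


def select_targets(args):
    """Return the targets to build for a list of command-line arguments.

    Assumption (not shown in the diff): arguments starting with "-" are
    flags and do not count as targets, so a flags-only invocation still
    builds the default targets.
    """
    requested = [a for a in args if not a.startswith("-")]
    unknown = [a for a in requested if a not in VALID_TARGETS]
    if unknown:
        raise ValueError(f"Invalid option(s): {' '.join(unknown)}")
    if not requested:
        return list(DEFAULT_TARGETS)
    # Keep the order used in the help text, regardless of argument order.
    return [t for t in VALID_TARGETS if t in requested]


if __name__ == "__main__":
    print(select_targets([]))
    print(select_targets(["distributed_ucxx", "-g"]))
```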
30 changes: 29 additions & 1 deletion ci/test_python.sh
@@ -72,12 +72,32 @@ run_py_benchmark() {
UCX_KEEPALIVE_INTERVAL=1ms UCXPY_ENABLE_DELAYED_SUBMISSION=${ENABLE_DELAYED_SUBMISSION} UCXPY_ENABLE_PYTHON_FUTURE=${ENABLE_PYTHON_FUTURE} timeout 2m python -m ucxx.benchmarks.send_recv --backend ${BACKEND} -o cupy --reuse-alloc -n 8MiB --n-buffers $N_BUFFERS --progress-mode ${PROGRESS_MODE} ${ASYNCIO_WAIT}
}

run_distributed_ucxx_tests() {
  PROGRESS_MODE=$1
  ENABLE_DELAYED_SUBMISSION=$2
  ENABLE_PYTHON_FUTURE=$3

  CMD_LINE="UCXPY_PROGRESS_MODE=${PROGRESS_MODE} UCXPY_ENABLE_DELAYED_SUBMISSION=${ENABLE_DELAYED_SUBMISSION} UCXPY_ENABLE_PYTHON_FUTURE=${ENABLE_PYTHON_FUTURE} timeout 10m pytest -vs python/distributed-ucxx/distributed_ucxx/tests/"

  # Workaround for https://github.com/rapidsai/ucxx/issues/15
  # CMD_LINE="UCX_KEEPALIVE_INTERVAL=1ms ${CMD_LINE}"

  log_command "${CMD_LINE}"
  UCXPY_PROGRESS_MODE=${PROGRESS_MODE} UCXPY_ENABLE_DELAYED_SUBMISSION=${ENABLE_DELAYED_SUBMISSION} UCXPY_ENABLE_PYTHON_FUTURE=${ENABLE_PYTHON_FUTURE} timeout 10m pytest -vs python/distributed-ucxx/distributed_ucxx/tests/
}

rapids-logger "Downloading artifacts from previous jobs"
CPP_CHANNEL=$(rapids-download-conda-from-s3 cpp)

rapids-mamba-retry install \
  --channel "${CPP_CHANNEL}" \
-  libucxx ucxx
+  libucxx ucxx distributed-ucxx

# TODO: Perhaps install from conda? We need distributed installed in developer
# mode to provide test utils, but that's probably not doable from conda packages.
rapids-logger "Install Distributed in developer mode"
git clone https://github.com/dask/distributed /tmp/distributed
pip install -e /tmp/distributed

print_ucx_config

@@ -104,3 +124,11 @@ for nbuf in 1 8; do
run_py_benchmark ucxx-async thread 0 1 1 ${nbuf} 0
fi
done

rapids-logger "Distributed Tests"
# run_distributed_ucxx_tests PROGRESS_MODE ENABLE_DELAYED_SUBMISSION ENABLE_PYTHON_FUTURE
run_distributed_ucxx_tests polling 0 0
run_distributed_ucxx_tests thread 0 0
run_distributed_ucxx_tests thread 0 1
run_distributed_ucxx_tests thread 1 0
run_distributed_ucxx_tests thread 1 1
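
As a rough illustration of the matrix above, the following Python sketch rebuilds the command line that `run_distributed_ucxx_tests` logs for each combination. The environment variable names, timeout, and test path come from the script; the names `TEST_MATRIX`, `build_env`, and `command_line` are hypothetical, and the sketch only formats strings, it does not run pytest.

```python
# (PROGRESS_MODE, ENABLE_DELAYED_SUBMISSION, ENABLE_PYTHON_FUTURE) combinations
# exercised by ci/test_python.sh.
TEST_MATRIX = [
    ("polling", 0, 0),
    ("thread", 0, 0),
    ("thread", 0, 1),
    ("thread", 1, 0),
    ("thread", 1, 1),
]

TEST_PATH = "python/distributed-ucxx/distributed_ucxx/tests/"


def build_env(progress_mode, delayed_submission, python_future):
    """Map one matrix entry to the environment variables the CI script sets."""
    return {
        "UCXPY_PROGRESS_MODE": progress_mode,
        "UCXPY_ENABLE_DELAYED_SUBMISSION": str(delayed_submission),
        "UCXPY_ENABLE_PYTHON_FUTURE": str(python_future),
    }


def command_line(progress_mode, delayed_submission, python_future):
    """Format the command in the same shape as CMD_LINE in the shell function."""
    env = build_env(progress_mode, delayed_submission, python_future)
    prefix = " ".join(f"{key}={value}" for key, value in env.items())
    return f"{prefix} timeout 10m pytest -vs {TEST_PATH}"


if __name__ == "__main__":
    for entry in TEST_MATRIX:
        print(command_line(*entry))
```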
6 changes: 6 additions & 0 deletions conda/recipes/ucxx/build_and_install_distributed_ucxx.sh
@@ -0,0 +1,6 @@
#!/bin/bash

# SPDX-FileCopyrightText: Copyright (c) 2023, NVIDIA CORPORATION & AFFILIATES.
# SPDX-License-Identifier: BSD-3-Clause

./build.sh distributed_ucxx
29 changes: 29 additions & 0 deletions conda/recipes/ucxx/meta.yaml
@@ -244,9 +244,38 @@ outputs:
        - test -f $PREFIX/include/ucxx/python/notifier.h
        - test -f $PREFIX/include/ucxx/python/python_future.h
        - test -f $PREFIX/include/ucxx/python/worker.h
      imports:
        - ucxx
    about:
      home: https://rapids.ai/
      license: BSD-3-Clause
      license_family: BSD
      license_file: ../../../LICENSE
      summary: UCX Python interface built on top of the libucxx C++ implementation


  - name: distributed-ucxx
    version: {{ version }}
    script: build_and_install_distributed_ucxx.sh
    build:
      number: {{ GIT_DESCRIBE_NUMBER }}
      string: py{{ python }}_{{ date_string }}_{{ GIT_DESCRIBE_HASH }}_{{ GIT_DESCRIBE_NUMBER }}
    requirements:
      host:
        - python
        - pip
        - tomli
      run:
        - python * *_cpython
        - dask
        - distributed
        - {{ pin_subpackage('ucxx', exact=True) }}
    test:
      imports:
        - distributed_ucxx
    about:
      home: https://rapids.ai/
      license: BSD-3-Clause
      license_family: BSD
      license_file: ../../../LICENSE
      summary: UCX communication module for Dask Distributed
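
The `test: imports:` entries in the recipe ask conda-build to verify that the listed modules can be imported in the test environment. A minimal Python sketch of an equivalent local check is shown below; `check_imports` is a hypothetical helper, not something conda-build or this PR provides.

```python
import importlib


def check_imports(modules):
    """Return a dict mapping each module name to True if it imports cleanly."""
    results = {}
    for name in modules:
        try:
            importlib.import_module(name)
            results[name] = True
        except ImportError:
            # ModuleNotFoundError is a subclass of ImportError.
            results[name] = False
    return results


if __name__ == "__main__":
    # The two modules named in the recipe's import tests.
    print(check_imports(["ucxx", "distributed_ucxx"]))
```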
28 changes: 28 additions & 0 deletions python/distributed-ucxx/LICENSE
@@ -0,0 +1,28 @@
BSD 3-Clause License

Copyright (c) 2019-2023, NVIDIA CORPORATION & AFFILIATES. All rights reserved.

Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions are met:

* Redistributions of source code must retain the above copyright notice, this
list of conditions and the following disclaimer.

* Redistributions in binary form must reproduce the above copyright notice,
this list of conditions and the following disclaimer in the documentation
and/or other materials provided with the distribution.

* Neither the name of the copyright holder nor the names of its
contributors may be used to endorse or promote products derived from
this software without specific prior written permission.

THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE
FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR
SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,
OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
3 changes: 3 additions & 0 deletions python/distributed-ucxx/README.md
@@ -0,0 +1,3 @@
# UCX Communication Module for Distributed

This is the UCX communication module for the Distributed framework. It is required to enable UCX communications.
2 changes: 2 additions & 0 deletions python/distributed-ucxx/distributed_ucxx/__init__.py
@@ -0,0 +1,2 @@
from .ucxx import UCXXBackend, UCXXConnector, UCXXListener # noqa: F401
from . import distributed_patches # noqa: F401