Skip to content

Commit

Permalink
Adds API option to uniform_neighbor_sample() and UCX-Py infrastruct…
Browse files Browse the repository at this point in the history
…ure to allow for a client-side device to directly receive results (#2715)

closes rapidsai/GaaS#23

### Summary
* Adds an API option to `uniform_neighbor_sample()` to allow clients to specify a client-side GPU (using device number) which will receive the results, eliminating the expensive server-side device-host copy in order to serialize and send via RPC.
* Adds test(s) and benchmarks to cover the new feature
* Also adds a `--pydevelop` option to `build.sh` for building python components that allow for making edits without re-installing (runs `setup.py` with the `develop` option)

### Details
* This PR depends on #2684 

### Benchmarks
New benchmarks have been added by this PR, with the results shown below. The results to take note of are in the red rectangles. `device=None` is the previous version using serialization and transfer from the server to the client using host memory (using Apache Thrift), and `device=0` is a GPU-GPU transfer using UCX-Py from the server on GPU1 to the client on GPU0.  A typical data transfer for many GNN use cases involves ~1M values, which is what the highlighted benchmark simulates.
For smaller transfers, the Thrift-based RPC is faster, but increases significantly as the payload increases (so much so that the largest size that could be run in a reasonable time for Thrift was 1e6, vs UCX-Py sizes of 2e9 still being much faster). UCX-Py seems to have some amount of fixed overhead, but the benefits are dramatic at larger sizes.

![image](https://user-images.githubusercontent.com/3039903/197084418-7a1b638f-c963-4aca-a091-43138bc33f80.png)

Authors:
  - Rick Ratzel (https://github.com/rlratzel)

Approvers:
  - Joseph Nke (https://github.com/jnke2016)
  - Alex Barghi (https://github.com/alexbarghi-nv)
  - Brad Rees (https://github.com/BradReesWork)

URL: #2715
  • Loading branch information
rlratzel authored Oct 21, 2022
1 parent 50ba399 commit 30a59ea
Show file tree
Hide file tree
Showing 19 changed files with 1,017 additions and 381 deletions.
11 changes: 8 additions & 3 deletions build.sh
Original file line number Diff line number Diff line change
Expand Up @@ -20,7 +20,7 @@ REPODIR=$(cd $(dirname $0); pwd)
LIBCUGRAPH_BUILD_DIR=${LIBCUGRAPH_BUILD_DIR:=${REPODIR}/cpp/build}
LIBCUGRAPH_ETL_BUILD_DIR=${LIBCUGRAPH_ETL_BUILD_DIR:=${REPODIR}/cpp/libcugraph_etl/build}

VALIDARGS="clean uninstall uninstall_cmake_deps libcugraph libcugraph_etl cugraph pylibcugraph cpp-mgtests docs -v -g -n --allgpuarch --skip_cpp_tests --cmake_default_generator -h --help"
VALIDARGS="clean uninstall uninstall_cmake_deps libcugraph libcugraph_etl cugraph pylibcugraph cpp-mgtests docs -v -g -n --pydevelop --allgpuarch --skip_cpp_tests --cmake_default_generator -h --help"
HELP="$0 [<target> ...] [<flag> ...]
where <target> is:
clean - remove all existing build artifacts and configuration (start over)
Expand All @@ -36,6 +36,7 @@ HELP="$0 [<target> ...] [<flag> ...]
-v - verbose build mode
-g - build for debug
-n - do not install after a successful build
--pydevelop - use setup.py develop instead of install
--allgpuarch - build for all supported GPU architectures
--skip_cpp_tests - do not build the SG test binaries as part of the libcugraph and libcugraph_etl targets
--cmake_default_generator - use the default cmake generator instead of ninja
Expand All @@ -59,6 +60,7 @@ BUILD_CPP_TESTS=ON
BUILD_CPP_MG_TESTS=OFF
BUILD_ALL_GPU_ARCH=0
CMAKE_GENERATOR_OPTION="-G Ninja"
PYTHON_INSTALL="install"

# Set defaults for vars that may not have been defined externally
# FIXME: if PREFIX is not set, check CONDA_PREFIX, but there is no fallback
Expand Down Expand Up @@ -113,6 +115,9 @@ fi
if hasArg --cmake_default_generator; then
CMAKE_GENERATOR_OPTION=""
fi
if hasArg --pydevelop; then
PYTHON_INSTALL="develop"
fi

# If clean or uninstall targets given, run them prior to any other steps
if hasArg uninstall; then
Expand Down Expand Up @@ -242,7 +247,7 @@ if buildAll || hasArg pylibcugraph; then
python setup.py build_ext --inplace -- -DFIND_CUGRAPH_CPP=ON \
-Dcugraph_ROOT=${LIBCUGRAPH_BUILD_DIR} -- -j${PARALLEL_LEVEL:-1}
if [[ ${INSTALL_TARGET} != "" ]]; then
env CUGRAPH_BUILD_PATH=${CUGRAPH_BUILD_PATH} python setup.py install
env CUGRAPH_BUILD_PATH=${CUGRAPH_BUILD_PATH} python setup.py ${PYTHON_INSTALL}
fi
fi

Expand All @@ -256,7 +261,7 @@ if buildAll || hasArg cugraph; then
python setup.py build_ext --inplace -- -DFIND_CUGRAPH_CPP=ON \
-Dcugraph_ROOT=${LIBCUGRAPH_BUILD_DIR} -- -j${PARALLEL_LEVEL:-1}
if [[ ${INSTALL_TARGET} != "" ]]; then
env CUGRAPH_BUILD_PATH=${CUGRAPH_BUILD_PATH} python setup.py install
env CUGRAPH_BUILD_PATH=${CUGRAPH_BUILD_PATH} python setup.py ${PYTHON_INSTALL}
fi
fi

Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -11,14 +11,16 @@
# See the License for the specific language governing permissions and
# limitations under the License.
import gc
import random

import pytest
import cudf
import dask_cudf

import cugraph.dask as dcg
import cugraph
import dask_cudf
import cudf
from cugraph.testing import utils
from cugraph.dask import uniform_neighbor_sample
import random


# =============================================================================
Expand Down Expand Up @@ -106,6 +108,9 @@ def input_combo(request):
return parameters


# =============================================================================
# Tests
# =============================================================================
def test_mg_uniform_neighbor_sample_simple(dask_client, input_combo):

dg = input_combo["MGGraph"]
Expand Down
58 changes: 40 additions & 18 deletions python/cugraph/cugraph/tests/test_uniform_neighbor_sample.py
Original file line number Diff line number Diff line change
Expand Up @@ -11,13 +11,15 @@
# See the License for the specific language governing permissions and
# limitations under the License.
import gc
import random

import pytest
import cugraph
import cudf

import cugraph
from cugraph.testing import utils
from cugraph import uniform_neighbor_sample
from cugraph.experimental.datasets import DATASETS_UNDIRECTED, email_Eu_core, small_tree
import random


# =============================================================================
Expand Down Expand Up @@ -97,6 +99,34 @@ def input_combo(request):
return parameters


@pytest.fixture(scope="module")
def simple_unweighted_input_expected_output(request):
"""
Fixture for providing the input for a uniform_neighbor_sample test using a
small/simple unweighted graph and the corresponding expected output.
"""
test_data = {}

df = cudf.DataFrame(
{"src": [0, 1, 2, 2, 0, 1, 4, 4], "dst": [3, 2, 1, 4, 1, 3, 1, 2]}
)

G = cugraph.Graph()
G.from_cudf_edgelist(df, source="src", destination="dst")
test_data["Graph"] = G
test_data["start_list"] = cudf.Series([0], dtype="int32")
test_data["fanout_vals"] = [-1]
test_data["with_replacement"] = True

test_data["expected_src"] = [0, 0]
test_data["expected_dst"] = [3, 1]

return test_data


# =============================================================================
# Tests
# =============================================================================
def test_uniform_neighbor_sample_simple(input_combo):

G = input_combo["Graph"]
Expand Down Expand Up @@ -242,28 +272,20 @@ def test_uniform_neighbor_sample_tree(directed):
assert set(start_list.to_pandas()).issubset(set(result_nbr_vertices.to_pandas()))


def test_uniform_neighbor_sample_unweighted():
df = cudf.DataFrame(
{"src": [0, 1, 2, 2, 0, 1, 4, 4], "dst": [3, 2, 1, 4, 1, 3, 1, 2]}
)

G = cugraph.Graph()
G.from_cudf_edgelist(df, source="src", destination="dst")

start_list = cudf.Series([0], dtype="int32")
fanout_vals = [-1]
with_replacement = True
def test_uniform_neighbor_sample_unweighted(simple_unweighted_input_expected_output):
test_data = simple_unweighted_input_expected_output

sampling_results = uniform_neighbor_sample(
G, start_list, fanout_vals, with_replacement
test_data["Graph"],
test_data["start_list"],
test_data["fanout_vals"],
test_data["with_replacement"],
)

expected_src = [0, 0]
actual_src = sampling_results.sources
actual_src = actual_src.to_arrow().to_pylist()
assert sorted(actual_src) == sorted(expected_src)
assert sorted(actual_src) == sorted(test_data["expected_src"])

expected_dst = [3, 1]
actual_dst = sampling_results.destinations
actual_dst = actual_dst.to_arrow().to_pylist()
assert sorted(actual_dst) == sorted(expected_dst)
assert sorted(actual_dst) == sorted(test_data["expected_dst"])
58 changes: 52 additions & 6 deletions python/cugraph_service/README.md
Original file line number Diff line number Diff line change
Expand Up @@ -13,18 +13,26 @@

TBD

### Example
Starting the server
```
$> PYTHONPATH=./python/cugraph_service python -m cugraph_service_server.server
```

## Client
(description)
### Installing the `cugraph_service_client` conda package

TBD

### Example
Starting a server for single-GPU-only cuGraph, using server extensions in `/my/cugraph_service/extensions`:
```
$> export PYTHONPATH=/rapids/cugraph/python/cugraph_service
$> python -m cugraph_service_server.server --graph-creation-extension-dir=/my/cugraph_service/extensions
```

Starting a server for multi-GPU cuGraph, same extensions:
```
$> export SCHEDULER_FILE=/tmp/scheduler.json
$> /rapids/cugraph/python/cugraph_service/scripts/run-dask-process.sh scheduler workers &
$> python -m cugraph_service_server.server --graph-creation-extension-dir=/my/cugraph_service/extensions --dask-scheduler-file=$SCHEDULER_FILE
```

### Example
Creating a client
```
Expand All @@ -33,6 +41,44 @@ Creating a client
>>> client.load_csv_as_vertex_data(...)
```

### Debugging
#### UCX-Py related variables:
`UCX_TLS` - set the transports to use, in priority order. Example:
```
UCX_TLS=tcp,cuda_copy,cuda_ipc
```
`UCX_TCP_CM_REUSEADDR` - reuse addresses. This can be used to avoid "resource in use" errors during starting/restarting the service repeatedly.
```
UCX_TCP_CM_REUSEADDR=y
```
`UCX_LOG_LEVEL` - set the level for which UCX will output messages to the console. The example below will only output "ERROR" or higher. Set to "DEBUG" to see debug and higher messages.
```
UCX_LOG_LEVEL=ERROR
```

#### UCX performance checks:
Because cugraph-service uses UCX-Py for direct-to-client GPU data transfers when specified, it can be helpful to understand the various UCX performance chacks available to ensure cugraph-service is transfering results as efficiently as the system is capable of.
```
ucx_perftest -m cuda -t tag_bw -n 100 -s 16000 &
ucx_perftest -m cuda -t tag_bw -n 100 -s 16000 localhost
```
```
ucx_perftest -m cuda -t tag_bw -n 100 -s 1000000000 &
ucx_perftest -m cuda -t tag_bw -n 100 -s 1000000000 localhost
```
```
CUDA_VISIBLE_DEVICES=0,1 ucx_perftest -m cuda -t tag_bw -n 100 -s 16000 &
CUDA_VISIBLE_DEVICES=0,1 ucx_perftest -m cuda -t tag_bw -n 100 -s 16000 localhost
```
```
CUDA_VISIBLE_DEVICES=0,1 ucx_perftest -m cuda -t tag_bw -n 100 -s 1000000000 &
CUDA_VISIBLE_DEVICES=0,1 ucx_perftest -m cuda -t tag_bw -n 100 -s 1000000000 localhost
```
```
CUDA_VISIBLE_DEVICES=0,1 ucx_perftest -m cuda -t tag_bw -n 1000000 -s 1000000000 &
CUDA_VISIBLE_DEVICES=0,1 ucx_perftest -m cuda -t tag_bw -n 1000000 -s 1000000000 localhost
```

------

## <div align="left"><img src="img/rapids_logo.png" width="265px"/></div> Open GPU Data Science
Expand Down
Loading

0 comments on commit 30a59ea

Please sign in to comment.