
Updates pytest benchmarks to use synthetic data and multi-GPUs #3540

Merged

Conversation

Contributor

@rlratzel rlratzel commented May 4, 2023

closes #2810
closes #3282

  • Adds the ability to use datasets read from files on disk and/or RMAT-generated synthetic datasets.
  • Adds "file_data" and "rmat_data" markers for use by benchmark scripts, based on cluster size.
  • Adds CLI options for specifying the RMAT scale and edgefactor in order to generate datasets large enough for MNMG runs.
  • Adds fixtures for use by bench_algos.py benchmarks which instantiate graph objects based on dataset type and SG or MG markers.
  • Updates the Dataset class to allow instances to be used as test params and to provide human-readable, deterministic test IDs.
  • Adds the ability for the Dataset ctor to take a .csv file as input, useful when a metadata.yaml file for a dataset has not been created yet.
  • Adds options to get_test_data.sh in the CI scripts to download a subset of datasets for C++ (to save time/space since most datasets aren't needed), and to only download the benchmark data for Python (for use when running benchmarks as tests).
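As a rough illustration of why the scale/edgefactor CLI options matter for MNMG sizing (this helper is not code from the PR, just the standard RMAT size relationship): an RMAT generator produces 2**scale vertices and edgefactor * 2**scale edges, so the dataset grows exponentially with scale.

```python
def rmat_num_edges(scale: int, edgefactor: int) -> int:
    """Number of edges an RMAT generator produces for the given
    scale (num_vertices = 2**scale) and edgefactor
    (num_edges = edgefactor * num_vertices)."""
    return edgefactor * (2 ** scale)

# scale=24, edgefactor=16 is a plausible MNMG-sized config:
print(rmat_num_edges(24, 16))  # 268435456 (~268M edges)
```

Bumping scale by one doubles both the vertex and edge counts, which is why these knobs are exposed on the pytest command line rather than hard-coded.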

rlratzel added 4 commits May 3, 2023 20:06
…in order to make readable test IDs when Dataset instances are used as test params.
… This will be used for tests requiring .csv datasets that have not been added to the datasets package yet.
…csv files for testing, added new markers for dataset types.
@rlratzel rlratzel added feature request New feature or request non-breaking Non-breaking change labels May 4, 2023
@rlratzel rlratzel self-assigned this May 4, 2023
rlratzel added 11 commits May 4, 2023 22:39
…set instances are present but only one is used at a time, WIP for updating fixtures.
…ready imported (may change that later), added setup/teardown to cleanup memory, changed unload test to simply look at internal vars for now and added FIXME.
…e fixtures for setup and teardown, have Dataset download method also update self._path to latest value, changes to bench_algos.py to support MG runs.
…legal for use in a test run as an option to pytest -k
@rlratzel rlratzel marked this pull request as ready for review May 16, 2023 14:52
@rlratzel rlratzel requested a review from a team as a code owner May 16, 2023 14:52
rlratzel added 3 commits May 17, 2023 19:56
…ctions in cugraph.testing, adds a new env var for setting the dask local directory, updates test fixtures to use the updated dask utils.
Member

@alexbarghi-nv alexbarghi-nv left a comment


👍

dataset_fixture_params = gen_fixture_params_product(
(directed_datasets +
undirected_datasets +
[rmat_sg_dataset, rmat_mg_dataset], "ds"))

# Record the current RMM settings so reinitialize() will be called only when a
# change is needed (RMM defaults both values to False). The --allow-rmm-reinit
Member


Is this where we should be setting the cupy and torch allocators as well?

Member


cupy is set to use rmm pool when we import cuDF.

We can decide to add pytorch based on whether we decide to run dgl/pyg benchmarks here.

Member


Two questions:

  1. Is this config for SG, or for MG too?
  2. If only SG, it might be worth exploring setting pool_alloc=True to speed up CI (if that is a concern).

Contributor Author


Thanks for looking into this, here's some background:

bench_algos.py was originally SG-only, and this PR adds MG coverage. Parameterizing managed memory and pool allocation adds (potentially a lot of) benchmark run time, so we've been leaving the defaults in place for automated runs. Changing the RMM config multiple times in the same process also didn't seem to match a typical user's workflow, and it sometimes led to problems if we didn't clean up properly (as you pointed out elsewhere). For those reasons, having pytest run through all combinations is only done using a special CLI option, --allow-rmm-reinit (see here); otherwise, a user who wants to change a default just picks the marker they want to use. The current defaults are pool_alloc=True, managed_mem=False (see here).

To answer @VibhuJawa 's questions more directly:

  1. It's for SG. I should add a FIXME to figure out how to apply these settings to the dask config when using MG. This is actually a big gap IMO which I'll address either in this PR or a follow-up.
  2. pool_alloc defaults to True in conftest.py, but I should add a comment here mentioning that.
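To make the gating described above concrete, here is a minimal sketch (names like rmm_param_combos are illustrative, not the PR's actual code) of how --allow-rmm-reinit could select which RMM configurations a benchmark run exercises:

```python
from itertools import product

# Defaults described in the discussion above; assumed, not copied from conftest.py.
DEFAULTS = {"managed_mem": False, "pool_alloc": True}

def rmm_param_combos(allow_reinit: bool):
    """Return the (managed_mem, pool_alloc) combinations to benchmark.

    Without --allow-rmm-reinit, only the defaults run, so RMM is never
    reinitialized mid-run; with it, all four combinations are exercised.
    """
    if not allow_reinit:
        return [(DEFAULTS["managed_mem"], DEFAULTS["pool_alloc"])]
    return list(product([False, True], repeat=2))
```

A conftest.py would feed the returned combinations to a parameterized fixture; the point of the sketch is simply that the full 4-way product only appears when the user opts in.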

To answer @alexbarghi-nv 's question:

Is this where we should be setting the cupy and torch allocators as well?

I like @VibhuJawa 's suggestion ("We can decide to add pytorch based on whether we decide to run dgl/pyg benchmarks here.")

Member


Thanks for the detailed answer. It makes sense to me.

Member

@VibhuJawa VibhuJawa left a comment


Did an initial first-pass review.


reinitRMM(request.param[1], request.param[2])
return utils.read_csv_file(csvFileName)
setFixtureParamNames(request, ["managed_mem", "pool_allocator"])
reinitRMM(request.param[0], request.param[1])
Member


I think reinitializing RMM pool will fail if you have anything lying around in RMM memory.

Contributor Author


Agreed. In fact, based on the observations noted here and the suggestion here, I'm wondering if I should change this from being marker-based (which implies it could run multiple combinations of RMM configurations within a single test suite run) to something set only once using a custom CLI option in conftest.py.

I'm going to add a FIXME to say that unless you object.
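The "reinitialize only when a change is needed" comment in the diff above can be sketched as a small guard (a hypothetical illustration, with the reinitialize callable injected so the idea is clear without depending on RMM itself):

```python
# Track the settings currently in effect; RMM defaults both to False.
_current = {"managed_mem": False, "pool_alloc": False}

def maybe_reinit(managed_mem, pool_alloc, reinitialize):
    """Call reinitialize() only if the requested RMM settings differ
    from the ones currently in effect, since reinitializing the pool
    can fail when allocations are still live.

    Returns True if reinitialize() was called.
    """
    if (managed_mem, pool_alloc) != (_current["managed_mem"], _current["pool_alloc"]):
        reinitialize(managed_memory=managed_mem, pool_allocator=pool_alloc)
        _current["managed_mem"] = managed_mem
        _current["pool_alloc"] = pool_alloc
        return True
    return False
```

With rmm.reinitialize passed in as the callable, repeated tests using the same marker combination would skip the reinit entirely, which is the failure-avoidance behavior discussed in this thread.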

python/cugraph/cugraph/testing/mg_utils.py Outdated Show resolved Hide resolved
rlratzel added 2 commits May 18, 2023 16:42
…python benchmark updates, subset of datasets only needed for C++), changes DASK_NUM_WORKERS to DASK_WORKER_DEVICES, adds docstrings.
@rlratzel rlratzel requested a review from a team as a code owner May 18, 2023 21:45
Member

@VibhuJawa VibhuJawa left a comment


LGTM

@rlratzel
Contributor Author

/merge

@rapids-bot rapids-bot bot merged commit 272e316 into rapidsai:branch-23.06 May 20, 2023
rapids-bot bot pushed a commit that referenced this pull request May 27, 2023

closes #3413 

This PR enables MG python tests using a single-GPU `LocalCUDACluster` in CI by setting `DASK_WORKER_DEVICES` to `0`. This does not affect SG tests, which continue to run in CI as they previously did.

PR #3540 added support for `DASK_WORKER_DEVICES` to the pytest fixture used in python MG tests, allowing CI scripts (and developers) to restrict workers to specific devices, which should now allow single-GPU CI runs to cover the MG/dask code paths in python.

_Note: despite the changes here to now run python MG tests, it should be noted that this isn't actual multi-GPU test coverage in `libcugraph` since multiple GPUs are not communicating and the code to set up comms, distribute graph data, etc. is not being exercised using >1 GPU._

Authors:
  - Rick Ratzel (https://github.com/rlratzel)

Approvers:
  - AJ Schmidt (https://github.com/ajschmidt8)
  - Vibhu Jawa (https://github.com/VibhuJawa)
  - Brad Rees (https://github.com/BradReesWork)

URL: #3596
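A minimal sketch of how a test fixture might consume the `DASK_WORKER_DEVICES` env var described above (the helper name and default are assumptions, not cugraph's actual implementation):

```python
import os

def worker_devices(default="0"):
    """Parse DASK_WORKER_DEVICES (e.g. "0" or "0,1,2,3") into a list of
    GPU device IDs.

    Restricting the list to a single device is what lets the MG/dask
    code paths run in single-GPU CI, as this commit does.
    """
    raw = os.environ.get("DASK_WORKER_DEVICES", default)
    return [d.strip() for d in raw.split(",") if d.strip()]
```

The resulting list would typically be joined back into a string for LocalCUDACluster's device-selection argument; the key point is that CI exports `DASK_WORKER_DEVICES=0` while multi-GPU developers can list several devices.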
@rlratzel rlratzel deleted the branch-23.06-mg_pytest_benchmarks branch September 28, 2023 20:43