
Updates pytest benchmarks to use synthetic data and multi-GPUs #3540

Merged

Conversation

Contributor

@rlratzel rlratzel commented May 4, 2023

closes #2810
closes #3282

  • Adds the ability to use datasets read from files on disk and/or RMAT-generated synthetic datasets.
  • Adds "file_data" and "rmat_data" markers for use by benchmark scripts, based on cluster size.
  • Adds CLI options for specifying the RMAT scale and edgefactor in order to generate datasets large enough for MNMG runs.
  • Adds fixtures for use by bench_algos.py benchmarks which instantiate graph objects based on dataset type and SG or MG markers.
  • Updates the Dataset class to allow instances to be used as test params and to provide human-readable, deterministic test IDs.
  • Adds the ability for the Dataset ctor to take a .csv file as input, useful when a metadata.yaml file for a dataset has not been created yet.
  • Adds options to get_test_data.sh in the CI scripts to download a subset of datasets for C++ (to save time/space since most datasets aren't needed), and to only download the benchmark data for Python (for use when running benchmarks as tests).
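As a rough illustration of why the scale/edgefactor CLI options matter for MNMG sizing (this helper is not code from the PR, just the standard RMAT size relationship): an RMAT generator produces 2**scale vertices and edgefactor * 2**scale edges, so the dataset grows exponentially with scale.

```python
def rmat_num_edges(scale: int, edgefactor: int) -> int:
    """Number of edges an RMAT generator produces for the given
    scale (num_vertices = 2**scale) and edgefactor
    (num_edges = edgefactor * num_vertices)."""
    return edgefactor * (2 ** scale)

# scale=24, edgefactor=16 is a plausible MNMG-sized config:
print(rmat_num_edges(24, 16))  # 268435456 (~268M edges)
```

Bumping scale by one doubles both the vertex and edge counts, which is why these knobs are exposed on the pytest command line rather than hard-coded.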

rlratzel added 4 commits May 3, 2023 20:06
…in order to make readable test IDs when Dataset instances are used as test params.
… This will be used for tests requiring .csv datasets that have not been added to the datasets package yet.
…csv files for testing, added new markers for dataset types.
@rlratzel rlratzel added feature request New feature or request non-breaking Non-breaking change labels May 4, 2023
@rlratzel rlratzel self-assigned this May 4, 2023
rlratzel added 11 commits May 4, 2023 22:39
…set instances are present but only one is used at a time, WIP for updating fixtures.
…ready imported (may change that later), added setup/teardown to cleanup memory, changed unload test to simply look at internal vars for now and added FIXME.
…e fixtures for setup and teardown, have Dataset download method also update self._path to latest value, changes to bench_algos.py to support MG runs.
…legal for use in a test run as an option to pytest -k
@rlratzel rlratzel marked this pull request as ready for review May 16, 2023 14:52
@rlratzel rlratzel requested a review from a team as a code owner May 16, 2023 14:52
rlratzel added 3 commits May 17, 2023 19:56
…ctions in cugraph.testing, adds a new env var for setting the dask local directory, updates test fixtures to use the updated dask utils.
Member

@alexbarghi-nv alexbarghi-nv left a comment


👍

dataset_fixture_params = gen_fixture_params_product(
(directed_datasets +
undirected_datasets +
[rmat_sg_dataset, rmat_mg_dataset], "ds"))

# Record the current RMM settings so reinitialize() will be called only when a
# change is needed (RMM defaults both values to False). The --allow-rmm-reinit
Member


Is this where we should be setting the cupy and torch allocators as well?

Member


cupy is set to use rmm pool when we import cuDF.

We can decide to add pytorch based on whether we decide to run dgl/pyg benchmarks here.

Member


Two questions:

  1. Is this config for SG, or for MG too?
  2. If only SG, it might be worth exploring setting pool_alloc=True to speed up CI (if that is a concern).

Contributor Author


Thanks for looking into this, here's some background:

bench_algos.py was originally SG-only, and this PR adds MG coverage. Parameterizing managed memory and pool allocation adds (potentially a lot of) benchmark run time, so we've been leaving the defaults in place for automated runs. Changing the RMM config multiple times in the same process also didn't seem to match a typical user's workflow, and it sometimes led to problems if we didn't clean up properly (as you pointed out elsewhere). For those reasons, having pytest run through all combinations is only done using a special CLI option, --allow-rmm-reinit (see here); otherwise, a user who wants to change a default just picks the marker they want to use. The current defaults are pool_alloc=True, managed_mem=False (see here).

To answer @VibhuJawa 's questions more directly:

  1. It's for SG. I should add a FIXME to figure out how to apply these settings to the dask config when using MG. This is actually a big gap IMO which I'll address either in this PR or a follow-up.
  2. pool_alloc defaults to True in conftest.py, but I should add a comment here mentioning that.
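To make the gating described above concrete, here is a minimal sketch (names like rmm_param_combos are illustrative, not the PR's actual code) of how --allow-rmm-reinit could select which RMM configurations a benchmark run exercises:

```python
from itertools import product

# Defaults described in the discussion above; assumed, not copied from conftest.py.
DEFAULTS = {"managed_mem": False, "pool_alloc": True}

def rmm_param_combos(allow_reinit: bool):
    """Return the (managed_mem, pool_alloc) combinations to benchmark.

    Without --allow-rmm-reinit, only the defaults run, so RMM is never
    reinitialized mid-run; with it, all four combinations are exercised.
    """
    if not allow_reinit:
        return [(DEFAULTS["managed_mem"], DEFAULTS["pool_alloc"])]
    return list(product([False, True], repeat=2))
```

A conftest.py would feed the returned combinations to a parameterized fixture; the point of the sketch is simply that the full 4-way product only appears when the user opts in.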

To answer @alexbarghi-nv 's question:

Is this where we should be setting the cupy and torch allocators as well?

I like @VibhuJawa 's suggestion ("We can decide to add pytorch based on whether we decide to run dgl/pyg benchmarks here.")

Member


Thanks for the detailed answer. It makes sense to me.

Member

@VibhuJawa VibhuJawa left a comment


Did an initial first-pass review.


reinitRMM(request.param[1], request.param[2])
return utils.read_csv_file(csvFileName)
setFixtureParamNames(request, ["managed_mem", "pool_allocator"])
reinitRMM(request.param[0], request.param[1])
Member


I think reinitializing RMM pool will fail if you have anything lying around in RMM memory.

Contributor Author


Agreed. In fact, based on the observations noted here and the suggestion here, I'm wondering if I should change this from being marker-based (which implies it could run multiple combinations of RMM configurations within a single test suite run) to something set only once using a custom CLI option in conftest.py.

I'm going to add a FIXME to say that unless you object.
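The "reinitialize only when a change is needed" comment in the diff above can be sketched as a small guard (a hypothetical illustration, with the reinitialize callable injected so the idea is clear without depending on RMM itself):

```python
# Track the settings currently in effect; RMM defaults both to False.
_current = {"managed_mem": False, "pool_alloc": False}

def maybe_reinit(managed_mem, pool_alloc, reinitialize):
    """Call reinitialize() only if the requested RMM settings differ
    from the ones currently in effect, since reinitializing the pool
    can fail when allocations are still live.

    Returns True if reinitialize() was called.
    """
    if (managed_mem, pool_alloc) != (_current["managed_mem"], _current["pool_alloc"]):
        reinitialize(managed_memory=managed_mem, pool_allocator=pool_alloc)
        _current["managed_mem"] = managed_mem
        _current["pool_alloc"] = pool_alloc
        return True
    return False
```

With rmm.reinitialize passed in as the callable, repeated tests using the same marker combination would skip the reinit entirely, which is the failure-avoidance behavior discussed in this thread.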

python/cugraph/cugraph/testing/mg_utils.py Outdated Show resolved Hide resolved
rlratzel added 2 commits May 18, 2023 16:42
…python benchmark updates, subset of datasets only needed for C++), changes DASK_NUM_WORKERS to DASK_WORKER_DEVICES, adds docstrings.
@rlratzel rlratzel requested a review from a team as a code owner May 18, 2023 21:45
Member

@VibhuJawa VibhuJawa left a comment


LGTM

@rlratzel
Contributor Author

/merge

@rapids-bot rapids-bot bot merged commit 272e316 into rapidsai:branch-23.06 May 20, 2023
rapids-bot bot pushed a commit that referenced this pull request May 27, 2023

closes #3413 

This PR enables MG python tests using a single-GPU `LocalCUDACluster` in CI by setting `DASK_WORKER_DEVICES` to `0`. This does not affect SG tests, which continue to run in CI as they previously did.

PR #3540 added support for `DASK_WORKER_DEVICES` to the pytest fixture used in python MG tests, allowing CI scripts (and developers) to restrict workers to specific devices, which should now allow single-GPU CI runs to cover the MG/dask code paths in python.

_Note: despite the changes here to now run python MG tests, it should be noted that this isn't actual multi-GPU test coverage in `libcugraph` since multiple GPUs are not communicating and the code to set up comms, distribute graph data, etc. is not being exercised using >1 GPU._

Authors:
  - Rick Ratzel (https://github.com/rlratzel)

Approvers:
  - AJ Schmidt (https://github.com/ajschmidt8)
  - Vibhu Jawa (https://github.com/VibhuJawa)
  - Brad Rees (https://github.com/BradReesWork)

URL: #3596
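A minimal sketch of how a test fixture might consume the `DASK_WORKER_DEVICES` env var described above (the helper name and default are assumptions, not cugraph's actual implementation):

```python
import os

def worker_devices(default="0"):
    """Parse DASK_WORKER_DEVICES (e.g. "0" or "0,1,2,3") into a list of
    GPU device IDs.

    Restricting the list to a single device is what lets the MG/dask
    code paths run in single-GPU CI, as this commit does.
    """
    raw = os.environ.get("DASK_WORKER_DEVICES", default)
    return [d.strip() for d in raw.split(",") if d.strip()]
```

The resulting list would typically be joined back into a string for LocalCUDACluster's device-selection argument; the key point is that CI exports `DASK_WORKER_DEVICES=0` while multi-GPU developers can list several devices.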
@rlratzel rlratzel deleted the branch-23.06-mg_pytest_benchmarks branch September 28, 2023 20:43