Updates pytest benchmarks to use synthetic data and multi-GPUs #3540
Conversation
…in order to make readable test IDs when Dataset instances are used as test params.
… This will be used for tests requiring .csv datasets that have not been added to the datasets package yet.
…csv files for testing, added new markers for dataset types.
…6-mg_pytest_benchmarks
…set instances are present but only one is used at a time, WIP for updating fixtures.
…ready imported (may change that later), added setup/teardown to cleanup memory, changed unload test to simply look at internal vars for now and added FIXME.
…6-mg_pytest_benchmarks
… reuse the cluster.
…e fixtures for setup and teardown, have Dataset download method also update self._path to latest value, changes to bench_algos.py to support MG runs.
…legal for use in a test run as an option to pytest -k
…6-mg_pytest_benchmarks
…6-mg_pytest_benchmarks
…ctions in cugraph.testing, adds a new env var for setting the dask local directory, updates test fixtures to use the updated dask utils.
👍
```python
dataset_fixture_params = gen_fixture_params_product(
    (directed_datasets +
     undirected_datasets +
     [rmat_sg_dataset, rmat_mg_dataset], "ds"))

# Record the current RMM settings so reinitialize() will be called only when a
# change is needed (RMM defaults both values to False). The --allow-rmm-reinit
```
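For context, the "readable test IDs when `Dataset` instances are used as test params" behavior can be sketched in plain pytest without cugraph's `gen_fixture_params_product` helper. The `Dataset` stand-in, `make_fixture_params`, and the dataset names below are hypothetical illustrations, not the actual cugraph classes:

```python
import pytest

# Hypothetical stand-in for cugraph's Dataset class; only the name matters
# here, since it is what we surface as the pytest test ID.
class Dataset:
    def __init__(self, name):
        self.name = name

def make_fixture_params(datasets, prefix="ds"):
    # Give each param an explicit, deterministic id like "ds=karate"
    # instead of pytest's default "dataset0", "dataset1", ...
    return [pytest.param(ds, id=f"{prefix}={ds.name}") for ds in datasets]

directed_datasets = [Dataset("karate"), Dataset("email-Eu-core")]
undirected_datasets = [Dataset("dolphins")]

params = make_fixture_params(directed_datasets + undirected_datasets)
```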
Is this where we should be setting the `cupy` and `torch` allocators as well?
`cupy` is set to use the RMM pool when we import `cuDF`. We can decide to add `pytorch` based on whether we decide to run dgl/pyg benchmarks here.
Two questions:
- Is this config for SG or for MG too?
- If only SG, it might be worth exploring setting `pool_alloc=True` to speed up CI (if that is a concern).
Thanks for looking into this, here's some background: `bench_algos.py` was originally SG-only, and this PR adds MG coverage. Parameterizing managed mem and pool alloc adds (potentially a lot of) benchmark run time, so we've been leaving the defaults in place for automated runs. The use case of actually changing the RMM config multiple times in the same process also did not seem to match a typical user's workflow, and it sometimes led to problems if we didn't clean up properly (as you pointed out elsewhere), so having pytest run through all combinations is only done using a special CLI option, `--allow-rmm-reinit` (see here). Otherwise, if a user wants to change the default they just pick the marker they want to use. The current defaults are `pool_alloc` True, `managed_mem` False (see here).
To answer @VibhuJawa's questions more directly:
- It's for SG; I should add a FIXME to figure out how to apply these settings to the dask config when using MG. This is actually a big gap IMO which I'll address either in this PR or a follow-up.
- `pool_alloc` defaults to True in conftest.py, but I should add a comment here mentioning that.

To answer @alexbarghi-nv's question:

> Is this where we should be setting the cupy and torch allocators as well?

I like @VibhuJawa's suggestion ("We can decide to add pytorch based on whether we decide to run dgl/pyg benchmarks here.")
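The "record the current RMM settings so reinitialize() is only called when a change is needed" logic discussed above can be sketched GPU-free. The names (`reinit_if_needed`, the settings dict) are illustrative, not the actual conftest.py code; the defaults match the ones stated in this thread:

```python
# Defaults per the discussion above: pool_alloc True, managed_mem False.
_current_settings = {"pool_allocator": True, "managed_mem": False}

def reinit_if_needed(pool_allocator, managed_mem, reinitialize):
    """Call reinitialize() only when the requested RMM settings differ
    from the ones currently in effect; return True if a reinit happened."""
    requested = {"pool_allocator": pool_allocator, "managed_mem": managed_mem}
    if requested == _current_settings:
        return False          # nothing to do; avoid a costly/risky reinit
    reinitialize(**requested)
    _current_settings.update(requested)
    return True
```

In the real fixtures, `reinitialize` would be `rmm.reinitialize`; here it is passed in so the sketch runs anywhere.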
Thanks for the detailed answer. It makes sense to me.
Did an initial first-pass review.
```python
reinitRMM(request.param[1], request.param[2])
return utils.read_csv_file(csvFileName)
setFixtureParamNames(request, ["managed_mem", "pool_allocator"])
reinitRMM(request.param[0], request.param[1])
```
I think reinitializing the RMM pool will fail if you have anything lying around in RMM memory.
Agreed. In fact, based on the observations noted here and the suggestion here, I'm wondering if I should change this from being marker-based (which implies it could run multiple combinations of RMM configurations within a single test suite run) to something set only once using a custom CLI option in conftest.py.

I'm going to add a FIXME to say that unless you object.
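A set-once CLI option along those lines could look like the following sketch. argparse stands in for pytest's `pytest_addoption` hook so the example is runnable standalone, and all option names are assumptions, not the real conftest.py:

```python
import argparse

def parse_rmm_options(argv):
    # In a real conftest.py these would be parser.addoption() calls inside
    # the pytest_addoption hook. Option names here are hypothetical.
    parser = argparse.ArgumentParser()
    parser.add_argument("--rmm-pool-alloc", default=True,
                        action=argparse.BooleanOptionalAction)
    parser.add_argument("--rmm-managed-mem", default=False,
                        action=argparse.BooleanOptionalAction)
    return parser.parse_args(argv)
```

With this shape, the RMM configuration is fixed once for the whole session, sidestepping the reinit-with-live-allocations failure mode noted above.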
…6-mg_pytest_benchmarks
…python benchmark updates, subset of datasets only needed for C++), changes DASK_NUM_WORKERS to DASK_WORKER_DEVICES, adds docstrings.
LGTM
…6-mg_pytest_benchmarks
…m/rlratzel/cugraph into branch-23.06-mg_pytest_benchmarks
/merge
closes #3413

This PR enables MG python tests using a single-GPU `LocalCUDACluster` in CI by setting `DASK_WORKER_DEVICES` to `0`. This does not affect SG tests, which continue to run in CI as they previously did. PR #3540 added support for `DASK_WORKER_DEVICES` to the pytest fixture used in python MG tests, allowing CI scripts (and developers) to restrict workers to specific devices, which should now allow single-GPU CI runs to cover the MG/dask code paths in python.

_Note: despite the changes here to now run python MG tests, this isn't actual multi-GPU test coverage in `libcugraph`, since multiple GPUs are not communicating and the code to set up comms, distribute graph data, etc. is not being exercised using >1 GPU._

Authors:
- Rick Ratzel (https://github.com/rlratzel)

Approvers:
- AJ Schmidt (https://github.com/ajschmidt8)
- Vibhu Jawa (https://github.com/VibhuJawa)
- Brad Rees (https://github.com/BradReesWork)

URL: #3596
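The `DASK_WORKER_DEVICES` handling described above could be sketched as follows; the helper name is hypothetical, and only the env var name comes from the PR description:

```python
import os

def get_worker_devices(default="0"):
    """Read DASK_WORKER_DEVICES (e.g. "0" or "0,1,2") and return the list
    of device ids that dask workers should be restricted to."""
    return [d.strip() for d in
            os.environ.get("DASK_WORKER_DEVICES", default).split(",")]

# A fixture could then pass this to the cluster, e.g. (not executed here):
#   LocalCUDACluster(CUDA_VISIBLE_DEVICES=",".join(get_worker_devices()))
```

Setting `DASK_WORKER_DEVICES=0` in CI then yields a one-worker, single-GPU cluster that still exercises the MG/dask code paths.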
closes #2810
closes #3282

- Updates `bench_algos.py` benchmarks to instantiate graph objs based on dataset type and SG or MG markers.
- Updates the `Dataset` class to allow instances to be used as test params and properly provide human-readable/deterministic test IDs.
- Updates the `Dataset` ctor to take a .csv file as input, useful when a metadata.yaml file for a dataset has not been created yet.
- Updates `get_test_data.sh` in the CI scripts to download a subset of datasets for C++ (to save time/space since most datasets aren't needed), and to only download the benchmark data for python (for use when running benchmarks as tests).
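The `Dataset`-ctor change (accepting a .csv file when no metadata.yaml exists yet) could be sketched like this; the parameter and attribute names are guesses for illustration, not the actual cugraph `datasets` API:

```python
from pathlib import Path

class Dataset:
    """Sketch: accept either a metadata.yaml path (the existing behavior)
    or a raw .csv file for datasets with no metadata file yet."""
    def __init__(self, metadata_yaml_file=None, csv_file=None):
        if (metadata_yaml_file is None) == (csv_file is None):
            raise ValueError("specify exactly one of metadata_yaml_file "
                             "or csv_file")
        if csv_file is not None:
            p = Path(csv_file)
            self.name = p.stem        # test-ID-friendly name from the filename
            self._path = p            # already on disk; no download needed
        else:
            self.name = Path(metadata_yaml_file).stem
            self._path = None         # resolved later, e.g. by a download step
```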