
[FEA]: MG Testing should use datasets that require more than 1 GPU #3282

Closed
2 tasks done
BradReesWork opened this issue Feb 14, 2023 · 1 comment · Fixed by #3540
Assignees
Labels
feature request New feature or request

Comments

@BradReesWork
Member

Is this a new feature, an improvement, or a change to existing functionality?

Improvement

How would you describe the priority of this feature request

Critical (currently preventing usage)

Please provide a clear description of problem this feature solves

The current set of MG tests uses a dataset that is too small; it is small even for SG testing.
The datasets need to be over 200 million edges; 1 billion edges or larger is preferred.

Nothing smaller than cyber.csv should be used; the Twitter dataset should be mandatory.

The issue is that we are not uncovering Dask and other issues when using trivial datasets. Testing MG on a dataset with 2K edges is pointless.

Describe your ideal solution

Create an MG dataset that every MG test uses.
Update all tests to use the new dataset.

Describe any alternatives you have considered

No response

Additional context

No response

Code of Conduct

  • I agree to follow cuGraph's Code of Conduct
  • I have searched the open feature requests and have found no duplicates for this feature request
@BradReesWork added the "feature request" label and removed the "? - Needs Triage" label on Feb 14, 2023
@rlratzel
Contributor

rlratzel commented May 9, 2023

Related: #3540 adds MG benchmarks that use RMAT-generated datasets that scale with the number of GPUs.
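For context on why RMAT scale and edgefactor let datasets grow with the cluster: in the Graph500 convention, a graph at scale S has 2**S vertices and edgefactor * 2**S edges, so scale 25 with edgefactor 16 already clears the 200M-edge floor. A minimal stdlib sketch of Graph500-style RMAT generation (the function name, defaults, and quadrant probabilities here are illustrative, not cuGraph's API):

```python
import random

def rmat_edges(scale, edgefactor=16, a=0.57, b=0.19, c=0.19, seed=42):
    """Generate edgefactor * 2**scale edges by recursively picking
    quadrants of the adjacency matrix with probabilities a, b, c, d."""
    rng = random.Random(seed)
    num_edges = edgefactor * (1 << scale)
    edges = []
    for _ in range(num_edges):
        src = dst = 0
        for level in range(scale):
            r = rng.random()
            bit = 1 << (scale - level - 1)
            if r < a:            # top-left quadrant: neither bit set
                pass
            elif r < a + b:      # top-right: set this bit of dst
                dst |= bit
            elif r < a + b + c:  # bottom-left: set this bit of src
                src |= bit
            else:                # bottom-right: set both bits
                src |= bit
                dst |= bit
        edges.append((src, dst))
    return edges

# Scale 25 at edgefactor 16 yields ~537M edges, past the 200M minimum:
print(16 * (1 << 25))  # 536870912
```

Because edge count doubles with each +1 of scale, a benchmark script can bump scale per GPU count to keep per-GPU work roughly constant.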

rapids-bot bot pushed a commit that referenced this issue May 20, 2023
closes #2810 
closes #3282

* Adds the ability to use datasets read from files on disk and/or RMAT-generated synthetic datasets.
* Adds markers for "file_data" and "rmat_data" for use by benchmark scripts, based on cluster size.
* Adds CLI options for specifying the RMAT scale and edgefactor in order to generate datasets large enough for MNMG runs.
* Adds fixtures for use by `bench_algos.py` benchmarks which will instantiate graph objs based on dataset type and SG or MG markers.
* Updates the `Dataset` class to allow instances to be used as test params and properly provide human-readable/deterministic test IDs.
* Adds the ability for the `Dataset` ctor to take a .csv file as input, useful when a metadata.yaml file for a dataset has not been created yet.
* Adds options to `get_test_data.sh` in the CI scripts to download a subset of datasets for C++ (to save time/space since most datasets aren't needed), and to only download the benchmark data for Python (for use when running benchmarks as tests).

Authors:
  - Rick Ratzel (https://github.com/rlratzel)
  - Alex Barghi (https://github.com/alexbarghi-nv)

Approvers:
  - Alex Barghi (https://github.com/alexbarghi-nv)
  - Vibhu Jawa (https://github.com/VibhuJawa)
  - Ray Douglass (https://github.com/raydouglass)

URL: #3540