
[FEA]: MG Testing should use datasets that require more than 1 GPU #3282

Closed
2 tasks done
BradReesWork opened this issue Feb 14, 2023 · 1 comment · Fixed by #3540
Assignees
Labels
feature request New feature or request

Comments

@BradReesWork
Member

Is this a new feature, an improvement, or a change to existing functionality?

Improvement

How would you describe the priority of this feature request

Critical (currently preventing usage)

Please provide a clear description of problem this feature solves

The current set of MG tests uses a dataset that is too small; it is small even for SG testing.
The datasets need to be over 200 million edges; 1 billion edges or larger is preferred.

Nothing smaller than cyber.csv should be used; the Twitter dataset should be mandatory.

The issue is that we are not uncovering Dask and other issues when using trivial datasets. Testing MG on a dataset with 2K edges is pointless.

Describe your ideal solution

Create an MG dataset that every MG test uses.
Update all tests to use the new dataset.

Describe any alternatives you have considered

No response

Additional context

No response

Code of Conduct

  • I agree to follow cuGraph's Code of Conduct
  • I have searched the open feature requests and have found no duplicates for this feature request
@BradReesWork added the "feature request" label and removed the "? - Needs Triage" label on Feb 14, 2023
@rlratzel
Contributor

rlratzel commented May 9, 2023

Related: #3540 adds MG benchmarks that use RMAT-generated datasets that scale with the number of GPUs.
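For context on why RMAT scale and edgefactor let datasets grow with the cluster: in the Graph500 convention, a graph at scale S has 2**S vertices and edgefactor * 2**S edges, so scale 25 with edgefactor 16 already clears the 200M-edge floor. A minimal stdlib sketch of Graph500-style RMAT generation (the function name, defaults, and quadrant probabilities here are illustrative, not cuGraph's API):

```python
import random

def rmat_edges(scale, edgefactor=16, a=0.57, b=0.19, c=0.19, seed=42):
    """Generate edgefactor * 2**scale edges by recursively picking
    quadrants of the adjacency matrix with probabilities a, b, c, d."""
    rng = random.Random(seed)
    num_edges = edgefactor * (1 << scale)
    edges = []
    for _ in range(num_edges):
        src = dst = 0
        for level in range(scale):
            r = rng.random()
            bit = 1 << (scale - level - 1)
            if r < a:            # top-left quadrant: neither bit set
                pass
            elif r < a + b:      # top-right: set this bit of dst
                dst |= bit
            elif r < a + b + c:  # bottom-left: set this bit of src
                src |= bit
            else:                # bottom-right: set both bits
                src |= bit
                dst |= bit
        edges.append((src, dst))
    return edges

# Scale 25 at edgefactor 16 yields ~537M edges, past the 200M minimum:
print(16 * (1 << 25))  # 536870912
```

Because edge count doubles with each +1 of scale, a benchmark script can bump scale per GPU count to keep per-GPU work roughly constant.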

rapids-bot bot pushed a commit that referenced this issue May 20, 2023
closes #2810 
closes #3282

* Adds the ability to use datasets read from files on disk and/or RMAT-generated synthetic datasets.
* Adds markers for "file_data" and "rmat_data" for use by benchmark scripts, based on cluster size.
* Adds CLI options for specifying the RMAT scale and edgefactor in order to generate datasets large enough for MNMG runs.
* Adds fixtures for use by `bench_algos.py` benchmarks which will instantiate graph objs based on dataset type and SG or MG markers.
* Updates the `Dataset` class to allow instances to be used as test params and properly provide human-readable/deterministic test IDs.
* Adds the ability for the `Dataset` ctor to take a .csv file as input, useful when a metadata.yaml file for a dataset has not been created yet.
* Adds options to `get_test_data.sh` in the CI scripts to download a subset of datasets for C++ (to save time/space since most datasets aren't needed), and to only download the benchmark data for Python (for use when running benchmarks as tests).

Authors:
  - Rick Ratzel (https://github.com/rlratzel)
  - Alex Barghi (https://github.com/alexbarghi-nv)

Approvers:
  - Alex Barghi (https://github.com/alexbarghi-nv)
  - Vibhu Jawa (https://github.com/VibhuJawa)
  - Ray Douglass (https://github.com/raydouglass)

URL: #3540