Is this a new feature, an improvement, or a change to existing functionality?
Improvement
How would you describe the priority of this feature request
Critical (currently preventing usage)
Please provide a clear description of problem this feature solves
The current set of MG tests uses a dataset that is too small; it is too small even for SG testing.
The datasets need to be over 200 million edges, with 1 billion edges or larger preferred.
Nothing smaller than cyber.csv should be used, and the Twitter dataset should be mandatory.
The issue is that we are not uncovering Dask and other problems when using trivial datasets. Testing MG on a dataset with 2K edges is pointless.
Describe your ideal solution
Create an MG dataset that every MG test uses.
Update all tests to use the new dataset.
Describe any alternatives you have considered
No response
Additional context
No response
Code of Conduct
I agree to follow cuGraph's Code of Conduct
I have searched the open feature requests and have found no duplicates for this feature request
Closes #2810, closes #3282
* Adds the ability to use datasets read from files on disk and/or RMAT-generated synthetic datasets.
* Adds markers for "file_data" and "rmat_data" for use by benchmark scripts, based on cluster size.
* Adds CLI options for specifying the RMAT scale and edgefactor in order to generate datasets large enough for MNMG runs.
* Adds fixtures for use by `bench_algos.py` benchmarks which instantiate graph objects based on dataset type and SG or MG markers (see the first sketch after this list).
* Updates the `Dataset` class to allow instances to be used as test params and to properly provide human-readable, deterministic test IDs.
* Adds the ability for the `Dataset` constructor to take a .csv file as input, useful when a metadata.yaml file for a dataset has not been created yet (see the second sketch after this list).
* Added options to `get_test_data.sh` in the CI scripts to download a subset of datasets for C++ (to save time/space since most datasets aren't needed), and to only download the benchmark data for python (for use when running benchmarks as tests).
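For illustration, here is a minimal sketch of how the dataset-type markers and graph fixtures described above might be combined in a `bench_algos.py`-style benchmark. The fixture name `graph_obj` and the use of the `gpubenchmark` fixture (from `rapids-pytest-benchmark`) are assumptions based on this summary, not the PR's literal code. Note that an RMAT graph has roughly `edgefactor * 2**scale` edges, so `scale=25` with `edgefactor=16` yields about 537 million edges, comfortably above the 200M-edge floor requested in the issue.

```python
import pytest
import cugraph


# "rmat_data" selects RMAT-generated synthetic input as described above;
# a "file_data" marker would select a dataset read from files on disk.
@pytest.mark.rmat_data
def bench_bfs(gpubenchmark, graph_obj):
    # graph_obj is assumed to be a fixture that instantiates a Graph
    # (SG or MG, depending on markers/cluster size) from the selected
    # dataset type.
    gpubenchmark(cugraph.bfs, graph_obj, 0)
```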
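And a hedged sketch of the updated `Dataset` usage: instances used directly as test params, plus construction from a raw .csv file when no metadata.yaml exists yet. The exact constructor keywords (`csv_file`, `csv_col_names`, `csv_col_dtypes`), the import path, and the file paths are assumptions based on this summary.

```python
import pytest
import cugraph
from cugraph.datasets import Dataset

DATASETS = [
    # Built from a dataset's metadata.yaml file (path is illustrative).
    Dataset("metadata/karate.yaml"),
    # Built directly from a .csv file when no metadata.yaml exists yet;
    # the keyword names here are assumed, not confirmed from the PR.
    Dataset(csv_file="my_edges.csv",
            csv_col_names=["src", "dst"],
            csv_col_dtypes=["int32", "int32"]),
]


# Dataset instances can be used directly as test params; the updated class
# is described as providing human-readable, deterministic test IDs.
@pytest.mark.parametrize("dataset", DATASETS)
def test_pagerank(dataset):
    G = dataset.get_graph()  # downloads (if needed) and returns a Graph
    cugraph.pagerank(G)
```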
Authors:
- Rick Ratzel (https://github.com/rlratzel)
- Alex Barghi (https://github.com/alexbarghi-nv)
Approvers:
- Alex Barghi (https://github.com/alexbarghi-nv)
- Vibhu Jawa (https://github.com/VibhuJawa)
- Ray Douglass (https://github.com/raydouglass)
URL: #3540