[FEA] Improve scaling of data generation in NDS-H-cpp benchmarks #16987
Comments
We could create a managed memory resource for data generation, use it, and destroy it after writing the parquet data to host. The queries would then use this result. But remember that the host-to-device transfer is included in the benchmark time as part of the scan (parquet read).
Thank you @karthikeyann for your comments.
In the end I would like to be able to run SF100 with CUDA async MR on A100. If the data gen uses managed MR and the timed queries use async MR, that would work great.
Fixes #16987

Use managed memory to generate the parquet data, and write parquet data to host buffer.
Replace use of parquet_device_buffer with cuio_source_sink_pair

Authors:
- Karthikeyan (https://github.com/karthikeyann)

Approvers:
- David Wendt (https://github.com/davidwendt)
- Tianyu Liu (https://github.com/kingcrimsontianyu)
- Muhammad Haseeb (https://github.com/mhaseeb123)

URL: #17039
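For readers following the thread, here is a minimal sketch of the pattern discussed above: generate the data under one memory resource, write the result to a host buffer, then run the timed queries under a different resource. It uses only the C++ standard library. `memory_resource`, `scoped_resource`, `generate_to_host`, `run_query`, and `run_benchmark_sketch` are hypothetical stand-ins invented for illustration, not RMM or libcudf APIs, and the 4:1 table-to-file size ratio is arbitrary. The actual change is in #17039.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Stand-in for a device memory resource (NOT the RMM API).
// It only tracks how many bytes are in use and the peak seen so far.
struct memory_resource {
  std::string name;
  std::size_t bytes_in_use = 0;
  std::size_t peak_bytes   = 0;

  explicit memory_resource(std::string n) : name(std::move(n)) {}

  void allocate(std::size_t n) {
    bytes_in_use += n;
    if (bytes_in_use > peak_bytes) { peak_bytes = bytes_in_use; }
  }
  void deallocate(std::size_t n) { bytes_in_use -= n; }
};

// Process-wide "current resource" pointer, analogous in spirit to a
// per-device default resource. Starts out as nullptr.
inline memory_resource*& current_resource() {
  static memory_resource* mr = nullptr;
  return mr;
}

// RAII guard: install a resource for a scope, restore the previous one on exit.
class scoped_resource {
 public:
  explicit scoped_resource(memory_resource* mr) : previous_(current_resource()) {
    current_resource() = mr;
  }
  ~scoped_resource() { current_resource() = previous_; }
  scoped_resource(const scoped_resource&)            = delete;
  scoped_resource& operator=(const scoped_resource&) = delete;

 private:
  memory_resource* previous_;
};

// Hypothetical generator: allocates the table from the current resource,
// "encodes" it into a host buffer, then frees the table.
std::vector<char> generate_to_host(std::size_t table_bytes) {
  memory_resource* mr = current_resource();
  mr->allocate(table_bytes);
  std::vector<char> host_buffer(table_bytes / 4, 'p');  // arbitrary 4:1 ratio
  mr->deallocate(table_bytes);
  return host_buffer;
}

// Hypothetical timed query: the read from host counts as part of the query,
// and its allocations come from whatever resource is current.
std::size_t run_query(const std::vector<char>& host_buffer) {
  memory_resource* mr = current_resource();
  mr->allocate(host_buffer.size());
  std::size_t bytes_scanned = host_buffer.size();
  mr->deallocate(host_buffer.size());
  return bytes_scanned;
}

struct run_stats {
  std::size_t gen_peak;
  std::size_t query_peak;
  std::size_t bytes_scanned;
};

run_stats run_benchmark_sketch(std::size_t table_bytes) {
  memory_resource managed("managed");   // used only for data generation
  memory_resource async_pool("async");  // used for the timed queries
  scoped_resource timed(&async_pool);

  std::vector<char> parquet;
  {
    scoped_resource gen(&managed);  // generation phase only
    parquet = generate_to_host(table_bytes);
  }  // async_pool is the current resource again from here on

  std::size_t scanned = run_query(parquet);
  return {managed.peak_bytes, async_pool.peak_bytes, scanned};
}
```

In this model the generator's peak footprint lands entirely on the generation-phase resource, so the resource used for the timed queries never sees it. That is the property the thread is after for running larger scale factors.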
Is your feature request related to a problem? Please describe.
In the NDS-H-cpp benchmarks, the memory footprint of data generation is larger than the memory footprint of query execution. This ends up limiting us to <=SF10 on H100 GPUs, perhaps as much as 10x smaller than we could run with pre-generated files.
Describe the solution you'd like
There are a few solutions we could use:
write_to_parquet_device_buffer
Additional context
On A100, we can run query sizes up to SF100 or so, but the generator only goes to ~SF10.