[FEA] Update IO benchmarks for consistency between formats #12739
Labels
- `2 - In Progress`: Currently a work in progress
- `cuIO`: cuIO issue
- `feature request`: New feature or request
- `good first issue`: Good for newcomers
- `libcudf`: Affects libcudf (C++/CUDA) code.
rapids-bot (bot) pushed a commit that referenced this issue on Mar 1, 2023

- Add JSON writer benchmark. This benchmark is modeled after CSV writer.
- Add JSON reader benchmark with file data source ([NESTED_JSON](https://github.com/rapidsai/cudf/blob/branch-23.04/cpp/benchmarks/io/json/nested_json.cpp?rgh-link-date=2023-02-08T22%3A43%3A38Z) only does parsing and only on device buffers). This benchmark is modeled after `BM_csv_read_io`.

Fixes part of #12739

Authors:
- Karthikeyan (https://github.com/karthikeyann)

Approvers:
- Vukasin Milovanovic (https://github.com/vuule)
- David Wendt (https://github.com/davidwendt)

URL: #12753
rapids-bot (bot) pushed a commit that referenced this issue on Dec 11, 2023

Addresses issue: [#12739](#12739)

This PR transforms compression and io into string axis types to enable the selection of different values via the CLI, eliminating the need to execute all values in an automation when required. Additionally, this PR introduces two new functions, `retrieve_io_type_enum` and `retrieve_compression_type_enum`, which facilitate the conversion of string input into the corresponding enum type that can be used in benchmarking functions.

IO Benchmarks:
- [x] PARQUET READER

For example:
`./PARQUET_READER_NVBENCH -b parquet_read_io_compression --axis io_type=[HOST_BUFFER] --axis compression_type=[NONE]`

Authors:
- Suraj Aralihalli (https://github.com/SurajAralihalli)

Approvers:
- Vukasin Milovanovic (https://github.com/vuule)
- Nghia Truong (https://github.com/ttnghia)

URL: #14347
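To illustrate the string-to-enum conversion that the commit above describes, here is a minimal, self-contained sketch. It is not the code from the PR: the enums `io_type` and `compression_type` are local stand-ins with an assumed set of values, and `parse_io_type` / `parse_compression_type` are hypothetical names chosen for illustration (the PR's own helpers are `retrieve_io_type_enum` and `retrieve_compression_type_enum`, whose exact signatures are not shown in this issue).

```cpp
#include <algorithm>
#include <cctype>
#include <stdexcept>
#include <string>

// Hypothetical stand-ins for the benchmark enums; the value sets are assumptions.
enum class io_type { FILEPATH, HOST_BUFFER, DEVICE_BUFFER, VOID };
enum class compression_type { NONE, SNAPPY, ZSTD };

// Upper-case a copy of the input so that "none" and "NONE" are treated alike.
inline std::string to_upper(std::string s)
{
  std::transform(s.begin(), s.end(), s.begin(), [](unsigned char c) {
    return static_cast<char>(std::toupper(c));
  });
  return s;
}

// Map a string axis value (e.g. from `--axis io_type=[HOST_BUFFER]`) to an enum.
inline io_type parse_io_type(std::string const& name)
{
  auto const key = to_upper(name);
  if (key == "FILEPATH") { return io_type::FILEPATH; }
  if (key == "HOST_BUFFER") { return io_type::HOST_BUFFER; }
  if (key == "DEVICE_BUFFER") { return io_type::DEVICE_BUFFER; }
  if (key == "VOID") { return io_type::VOID; }
  // Fail loudly on unknown values instead of silently picking a default.
  throw std::invalid_argument("Unsupported io_type: " + name);
}

// Same idea for the compression axis.
inline compression_type parse_compression_type(std::string const& name)
{
  auto const key = to_upper(name);
  if (key == "NONE") { return compression_type::NONE; }
  if (key == "SNAPPY") { return compression_type::SNAPPY; }
  if (key == "ZSTD") { return compression_type::ZSTD; }
  throw std::invalid_argument("Unsupported compression_type: " + name);
}
```

In a benchmark body, the string read from the axis would be passed through one of these helpers before calling the reader or writer. Throwing on an unrecognized value keeps a mistyped CLI argument from producing a misleading measurement.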
Is your feature request related to a problem? Please describe.

- Update `orc_read_decode`, `parquet_read_decode` and `csv_read_input` to use DEVICE_BUFFER data source #12678
- `multibyte_split` #12675
- `BM_csv_read_io`
- [FEA] Add a Parquet reader benchmark that uses multiple CUDA streams #12700
- [FEA] Expand range of file sizes for CSV reader benchmarks #12674 -> add compression to `BM_csv_read_io` (?)
- Convert `compression` and `io` to string axis type. See this discussion in nvbench. The goal is to choose other values from the CLI without having to run all values in automation.
- Rename `pmu`, `efs`, `trc` in ORC writer chunks to `peak_memory_usage`, `encoded_file_size`, `total_rows`, to conform with the other ORC, Parquet, CSV, and text benchmarks

Additional context

The initial set of topics came from a comparison of file read throughput across the supported formats in cuIO.
We are also preparing for a comparison of memory footprint across cuIO, especially with Zstd compression/decompression.