[FEA] Update IO benchmarks for consistency between formats #12739

GregoryKimball · 2023-02-08T22:43:38Z

- Add JSON writer benchmark. This benchmark is modeled after CSV writer.
- Add JSON reader benchmark with file data source ([NESTED_JSON](https://github.com/rapidsai/cudf/blob/branch-23.04/cpp/benchmarks/io/json/nested_json.cpp?rgh-link-date=2023-02-08T22%3A43%3A38Z) only does parsing and only on device buffers). This benchmark is modeled after `BM_csv_read_io`.

Fixes part of #12739

Authors:
- Karthikeyan (https://github.com/karthikeyann)

Approvers:
- Vukasin Milovanovic (https://github.com/vuule)
- David Wendt (https://github.com/davidwendt)

URL: #12753

Addresses issue: [#12739](#12739)

This PR converts the compression and IO axes to string axis types so that individual values can be selected via the CLI, which means automated runs no longer have to execute every value. Additionally, this PR introduces two new functions, `retrieve_io_type_enum` and `retrieve_compression_type_enum`, which convert the string input into the corresponding enum type used by the benchmarking functions.

IO Benchmarks:
- [x] PARQUET READER

For example: `./PARQUET_READER_NVBENCH -b parquet_read_io_compression --axis io_type=[HOST_BUFFER] --axis compression_type=[NONE]`

Authors:
- Suraj Aralihalli (https://github.com/SurajAralihalli)

Approvers:
- Vukasin Milovanovic (https://github.com/vuule)
- Nghia Truong (https://github.com/ttnghia)

URL: #14347
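The string-to-enum conversion described in this PR can be sketched roughly as follows. This is a minimal illustration, not the code merged in #14347: only the function names `retrieve_io_type_enum` and `retrieve_compression_type_enum` and the axis values `HOST_BUFFER` and `NONE` come from the PR description. The enum definitions, the remaining enumerators, the signatures, and the error handling are assumptions made for the example.

```cpp
#include <stdexcept>
#include <string>
#include <unordered_map>

// Hypothetical stand-ins for the benchmark enums. Only HOST_BUFFER and NONE
// appear in the PR description; the other enumerators are assumed here.
enum class io_type { FILEPATH, HOST_BUFFER, DEVICE_BUFFER };
enum class compression_type { NONE, SNAPPY };

// Convert a string axis value (e.g. "HOST_BUFFER") into the matching enum.
// Signature and error handling are assumptions for this sketch.
io_type retrieve_io_type_enum(std::string const& name)
{
  static std::unordered_map<std::string, io_type> const lookup = {
    {"FILEPATH", io_type::FILEPATH},
    {"HOST_BUFFER", io_type::HOST_BUFFER},
    {"DEVICE_BUFFER", io_type::DEVICE_BUFFER}};
  auto const it = lookup.find(name);
  if (it == lookup.end()) { throw std::invalid_argument("Unsupported io_type: " + name); }
  return it->second;
}

// Same idea for the compression axis.
compression_type retrieve_compression_type_enum(std::string const& name)
{
  static std::unordered_map<std::string, compression_type> const lookup = {
    {"NONE", compression_type::NONE},
    {"SNAPPY", compression_type::SNAPPY}};
  auto const it = lookup.find(name);
  if (it == lookup.end()) {
    throw std::invalid_argument("Unsupported compression_type: " + name);
  }
  return it->second;
}
```

In a benchmark body, the string read from the axis would be passed through one of these helpers before being handed to the reader or writer options. Rejecting unknown names with an exception is one possible design choice; it surfaces a mistyped `--axis` value immediately instead of silently falling back to a default.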


GregoryKimball added the `feature request` and `Needs Triage` labels Feb 8, 2023

GregoryKimball added this to libcudf Feb 8, 2023

GregoryKimball added the `2 - In Progress`, `libcudf`, and `cuIO` labels and removed the `Needs Triage` label Feb 8, 2023

karthikeyann mentioned this issue Feb 13, 2023

Adds JSON reader, writer io benchmark #12753

Merged

3 tasks

vuule added the `good first issue` label Apr 5, 2023

GregoryKimball mentioned this issue Jun 27, 2023

[FEA] Add Parquet and ORC unit tests based on Apache sample files #13627

Open

GregoryKimball added this to the Benchmarking milestone Jul 23, 2023

GregoryKimball removed this from libcudf Oct 26, 2023

SurajAralihalli mentioned this issue Oct 30, 2023

Convert compression and io to string axis type in IO benchmarks #14347

Merged

4 tasks

SurajAralihalli mentioned this issue Nov 15, 2023

Convert compression and io to string axis type in Parquet Reader benchmarks #14418

Closed

4 tasks
