Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Update default data source in cuio reader benchmarks #12740

Merged

Conversation

PointKernel
Copy link
Member

@PointKernel PointKernel commented Feb 8, 2023

Closes #12678

This PR sets the default data source in orc, parquet and csv reader benchmarks to DEVICE_BUFFER.

Description

Checklist

  • I am familiar with the Contributing Guidelines.
  • New or existing tests cover these changes.
  • The documentation is up to date with these changes.

@PointKernel PointKernel added feature request New feature or request 3 - Ready for Review Ready for review by team libcudf Affects libcudf (C++/CUDA) code. cuIO cuIO issue non-breaking Non-breaking change labels Feb 8, 2023
@PointKernel PointKernel self-assigned this Feb 8, 2023
@PointKernel PointKernel requested a review from a team as a code owner February 8, 2023 23:06
@PointKernel PointKernel requested review from elstehle and vuule February 8, 2023 23:06
@@ -72,7 +72,7 @@ void BM_orc_read_data(nvbench::state& state, nvbench::type_list<nvbench::enum_ty
data_profile_builder().cardinality(cardinality).avg_run_length(run_length));
auto const view = tbl->view();

cuio_source_sink_pair source_sink(io_type::HOST_BUFFER);
cuio_source_sink_pair source_sink(io_type::FILEPATH);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ooh, I didn't fully comprehend the proposed change until now.
I would argue that HOST_BUFFER is a better choice here (device buffer is the best option IMO) because this set of benchmarks is meant to benchmark only the decode part of the reader (the name is "orc_read_decode"). I understand that this leaves us with a gap in benchmark coverage (different types + IO). maybe that should either be a separate benchmark or an option.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ttnghia
Copy link
Contributor

ttnghia commented Feb 9, 2023

But the issue#12678 says:

We considered choosing FILEPATH or HOST_BUFFER for all of these benchmarks, but DEVICE_BUFFER is best for focusing on decoding.

@PointKernel PointKernel changed the title Change default data source type in orc reader benchmark Update default data source in cuio reader benchmarks Feb 9, 2023
@PointKernel PointKernel requested a review from vuule February 9, 2023 22:50
@PointKernel PointKernel requested a review from ttnghia February 10, 2023 14:38
@PointKernel
Copy link
Member Author

/merge

@rapids-bot rapids-bot bot merged commit c931d5a into rapidsai:branch-23.04 Feb 10, 2023
@PointKernel PointKernel deleted the orc-reader-default-data-source branch February 10, 2023 16:48
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
3 - Ready for Review Ready for review by team cuIO cuIO issue feature request New feature or request libcudf Affects libcudf (C++/CUDA) code. non-breaking Non-breaking change
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[FEA] Switch orc_read_decode, parquet_read_decode and csv_read_input to use DEVICE_BUFFER data source
3 participants