Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Basic validation in reader benchmarks #14647

Merged
merged 17 commits into from
Dec 27, 2023

Conversation

vuule
Copy link
Contributor

@vuule vuule commented Dec 18, 2023

Description

Check the output table shape in the CSV, JSON, ORC and Parquet reader benchmarks.

Other changes:
Fixed some chunking logic in the CSV reader benchmark.
Shortened the lifetime of the original table to reduce peak memory use (adopted the pattern from the JSON reader benchmark).

Checklist

  • I am familiar with the Contributing Guidelines.
  • New or existing tests cover these changes.
  • The documentation is up to date with these changes.

@vuule vuule self-assigned this Dec 18, 2023
@github-actions github-actions bot added the libcudf Affects libcudf (C++/CUDA) code. label Dec 18, 2023
@vuule vuule added tests Unit testing for project improvement Improvement / enhancement to an existing function non-breaking Non-breaking change and removed libcudf Affects libcudf (C++/CUDA) code. labels Dec 18, 2023
@github-actions github-actions bot added the libcudf Affects libcudf (C++/CUDA) code. label Dec 18, 2023
@vuule vuule marked this pull request as ready for review December 19, 2023 18:28
@vuule vuule requested a review from a team as a code owner December 19, 2023 18:28
@vuule vuule requested review from shrshi and davidwendt December 19, 2023 18:28
Comment on lines +65 to +67
size_t const chunk_size = cudf::util::div_rounding_up_safe(source_sink.size(), num_chunks);
auto const chunk_row_cnt =
cudf::util::div_rounding_up_safe(view.num_rows(), static_cast<cudf::size_type>(num_chunks));
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The old approach was rounding down and losing some rows. Adding the check uncovered the issue.
Also some logic in the loop got simplified by rounding up here.

Copy link
Member

@PointKernel PointKernel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some non-blocking nits. Otherwise LGTM.

cpp/benchmarks/io/parquet/parquet_reader_input.cpp Outdated Show resolved Hide resolved
cpp/benchmarks/io/parquet/parquet_reader_input.cpp Outdated Show resolved Hide resolved
cpp/benchmarks/io/parquet/parquet_reader_input.cpp Outdated Show resolved Hide resolved
Copy link
Contributor

@hyperbolic2346 hyperbolic2346 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A couple of questions...

cpp/benchmarks/io/csv/csv_reader_options.cpp Show resolved Hide resolved
cpp/benchmarks/io/orc/orc_reader_input.cpp Outdated Show resolved Hide resolved
@vuule
Copy link
Contributor Author

vuule commented Dec 27, 2023

/merge

@rapids-bot rapids-bot bot merged commit 72e6f9b into rapidsai:branch-24.02 Dec 27, 2023
67 checks passed
@vuule vuule deleted the bm-basic-validation branch December 27, 2023 21:40
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
improvement Improvement / enhancement to an existing function libcudf Affects libcudf (C++/CUDA) code. non-breaking Non-breaking change tests Unit testing for project
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants