Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add validation to reader benchmarks #14137

Closed
wants to merge 10 commits into from

Conversation

vuule
Copy link
Contributor

@vuule vuule commented Sep 20, 2023

Description

Validate that cuIO readers successfully round trip the input tables.
Validation is only done in the first iteration, and it is not included in the timing.
If there is a difference, a warning is logged.
Setting CUDF_BENCH_OUTPUT_DIFF environment variable adds diff to the standard output. Valid values are FIRST_ERROR and ALL_ERRORS.

Benchmarks currently report differences for data types that can't be preserved through the given formats. There are not differences caused by data corruption.

Checklist

  • I am familiar with the Contributing Guidelines.
  • New or existing tests cover these changes.
  • The documentation is up to date with these changes.

@vuule vuule added tests Unit testing for project improvement Improvement / enhancement to an existing function non-breaking Non-breaking change labels Sep 20, 2023
@vuule vuule self-assigned this Sep 20, 2023
@github-actions github-actions bot added the libcudf Affects libcudf (C++/CUDA) code. label Sep 20, 2023
@vuule vuule changed the title Fea benchmark validation Add validation to reader benchmarks Sep 20, 2023
@vuule vuule marked this pull request as ready for review September 20, 2023 16:56
@vuule vuule requested a review from a team as a code owner September 20, 2023 16:56
@vuule vuule requested review from vyasr and divyegala September 20, 2023 16:56
Copy link
Contributor

@vyasr vyasr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The implementation seems sensible, but why are we doing this in benchmarks rather than tests?

@vuule
Copy link
Contributor Author

vuule commented Sep 22, 2023

The implementation seems sensible, but why are we doing this in benchmarks rather than tests?

To verify that the benchmarks are doing what they're supposed to. Also, it's helpful in situations where we benchmark different configurations (thread pool size, GDS policy, compression policy...). Basically enabling us to benchmark changes without jumping back to tests for each change.

@vuule vuule closed this Dec 26, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
improvement Improvement / enhancement to an existing function libcudf Affects libcudf (C++/CUDA) code. non-breaking Non-breaking change tests Unit testing for project
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants