Approximate overflow detection in ORC statistics #9163

Merged: 3 commits, Sep 2, 2021

@vuule vuule commented Sep 1, 2021

Closes #9136

When converting statistics chunks, has_sum is conditioned on the result of overflow detection. The detection is deliberately pessimistic: the sum is omitted in all cases where the column's min/max values indicate any chance of overflow.
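
For illustration, a pessimistic check of this kind can be sketched as follows. This is a minimal sketch, not the code from this PR: the names `sum_may_overflow` and `has_sum` as free functions, and the use of a value count alongside min/max, are assumptions made for the example. The idea is that if every value were equal to the column max (or min), the sum would be `max * count` (or `min * count`); if either bound does not fit in a signed 64-bit integer, the sum is treated as unsafe even though the real data might not overflow.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// Hypothetical helper for illustration; not the libcudf implementation.
// Returns true if the sum of `count` values, each in [min, max], might not
// fit in a signed 64-bit integer.
inline bool sum_may_overflow(std::int64_t min, std::int64_t max, std::uint64_t count)
{
  assert(min <= max);
  if (count == 0) { return false; }

  // Largest representable magnitudes: 2^63 - 1 (positive) and 2^63 (negative).
  constexpr auto max_magnitude =
    static_cast<std::uint64_t>(std::numeric_limits<std::int64_t>::max());
  constexpr auto min_magnitude = max_magnitude + 1;

  // Worst case on the positive side: every value equals max.
  // max * count > max_magnitude  <=>  max > max_magnitude / count
  if (max > 0 && static_cast<std::uint64_t>(max) > max_magnitude / count) { return true; }

  // Worst case on the negative side: every value equals min.
  // Negate in unsigned arithmetic so that INT64_MIN is handled safely.
  if (min < 0) {
    auto const abs_min = std::uint64_t{0} - static_cast<std::uint64_t>(min);
    if (abs_min > min_magnitude / count) { return true; }
  }
  return false;
}

// Hypothetical wrapper: the sum statistic is written only when it is known to be safe.
inline bool has_sum(std::int64_t min, std::int64_t max, std::uint64_t count)
{
  return !sum_may_overflow(min, max, count);
}
```

With this approach a column with min 0, max INT64_MAX and two rows loses its sum even if the actual values are 0 and INT64_MAX. That is the trade-off of a pessimistic check: some valid sums are dropped, but an incorrect sum is never written.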

@vuule added the labels bug (Something isn't working), libcudf (Affects libcudf (C++/CUDA) code), cuIO (cuIO issue), and non-breaking (Non-breaking change) on Sep 1, 2021
@vuule vuule self-assigned this Sep 1, 2021
codecov bot commented Sep 1, 2021

Codecov Report

No coverage uploaded for pull request base (branch-21.10@1935a8a).
The diff coverage is n/a.

@@               Coverage Diff               @@
##             branch-21.10    #9163   +/-   ##
===============================================
  Coverage                ?   10.82%           
===============================================
  Files                   ?      115           
  Lines                   ?    19125           
  Branches                ?        0           
===============================================
  Hits                    ?     2070           
  Misses                  ?    17055           
  Partials                ?        0           


@vuule vuule marked this pull request as ready for review September 1, 2021 23:43
@vuule vuule requested a review from a team as a code owner September 1, 2021 23:43
vuule commented Sep 2, 2021

@gpucibot merge

@rapids-bot rapids-bot bot merged commit 58467fa into rapidsai:branch-21.10 Sep 2, 2021
@vuule vuule deleted the bug-stats-overflow branch April 20, 2022 23:47
Linked issue: [BUG] to_orc writes incorrect sum statistics when there's an overflow