-
Notifications
You must be signed in to change notification settings - Fork 919
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix exponent overflow in strings-to-double conversion #15517
Merged
rapids-bot
merged 7 commits into
rapidsai:branch-24.06
from
davidwendt:stod-overflow-exp
Apr 15, 2024
Merged
Changes from 1 commit
Commits
Show all changes
7 commits
Select commit
Hold shift + click to select a range
6b371ec
Fix exponent overflow in strings-to-double conversion
davidwendt c52480c
remove debug_utilities include
davidwendt 2a3c8a1
Merge branch 'branch-24.06' into stod-overflow-exp
davidwendt 12b8462
change float constant to integer
davidwendt 0755cb3
Merge branch 'branch-24.06' into stod-overflow-exp
davidwendt 66e72ab
Merge branch 'branch-24.06' into stod-overflow-exp
davidwendt 7e21176
update comment
davidwendt File filter
Filter by extension
Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
There are no files selected for viewing
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Oops, something went wrong.
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems like there's a logical flaw here. We know the exponent of a finite double value can only be as large as
std::numeric_limits<double>::max_exponent10 == 308
, so why check against100'000'000
? Anything else would go to infinity, unless I'm missing something.Maybe this is what I am expecting to see:
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If I'm mistaken about the logic above, let's at least use integer-integer comparisons (1e8 is not an integer).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The exceeding
max_exponent
is actually handled below to map toinfinity
(or-infinity
).The check here is to make sure we don't overflow the integer which is UB.
I had assumed the compiler would convert 1e8 to an integer but it appears that is incorrect.
I'll change it to an integer.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is an
exp_off
to store the extra exponent part of the mantissa, and it will be added toexp_ten
after this while loop. So for a large number (which we expect to be an infinity), if the mantissa is very long and theexp_ten
is not large enough, the finalexp_ten
could be wrong. So we need to pick a limit that is as large as possible.Even if we set it to
100'000'000
, the above case would still happen for a string of length more than 1e8. It's a very edge case, just want to point it out.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
E.g.: a very very long number(length is Int.max_value):
0.00...[There are about Int.max_value zeros].....1E999999999
Because of the the following adjustment of
exp_ten
, theexp_len
will be wrong.Propose to use
long
to saveexp_ten
as currently max string length is Int.max_value.And not sure If cuDF will support Long.max_value length string in future.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A int.max long string would only be a one row column in libcudf since we have a max column size of int.max right now.
Regardless, I don't feel we need to increase the register size of this function to handle such a case. Likewise, a 100M length string would only be about 20 rows. I think this is a reasonable limit and could even be convinced a lower value is more practical.