-
Notifications
You must be signed in to change notification settings - Fork 1.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Bug/union wrong casting #5342
Bug/union wrong casting #5342
Conversation
@berkaysynnada Really thanks for your work! I wonder what the type of the union of an Int64 and an UInt64 should be though. |
Once you hit the widest fixed-size type, information loss becomes inevitable unless you have |
cc @liukun4515 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you @berkaysynnada -- sorry for the delay in review. This is looking good
(Int32, _) | (_, Int32) => Some(Int32), | ||
(Int16, _) | (_, Int16) => Some(Int16), | ||
(Int8, _) | (_, Int8) => Some(Int8), | ||
// start checking from Int64, that is the most inclusive integer type |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could we express the same logic more concisely with something like:
(Int64, _) | (_, Int64) => Some(Int64),
(Int32, _) | (_, Int32) => Some(Int64),
(Int16, _) | (_, Int16) => Some(Int32),
(Int8, _) | (_, Int8) => Some(Int16),
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Likewise I think the same pattern could be used for UInt*
variants
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Wouldn't that map (Int32
, Int16
) to Int64
? I think @berkaysynnada's intent is to map to the narrowest correct type, which in this case would be Int32
.
I am not sure if the match
pattern is the most succinct representation of the table in the PR text though.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Wouldn't that map (Int32, Int16) to Int64?
Yes.
I guess I was thinking of the more general case for arithmetic where i32::MAX + i16::MAX
requires a i64
to store
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am confused by the sum. Aren't we just "merging" the schemas? In that case, the narrowest type that can represent both sides should be chosen, no? Am I missing something here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think I am just confused
For example this PR adds this rule:
(Int8, UInt16) => Some(Int32),
Which isn't just merging the schema (e.g. UInt16) in my mind, but is extending it in some way. In this case, so that the entire range of Int8
and UInt16
can be represented -- which now makes sense
I think what would help me would be:
- Add some comments explaining the rationale for these rules
- (maybe) add test coverage (ideally looking like your beautiful table from the PR description) that helps to illustrate the point
However, I don't think this is needed prior to merging this PR
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just added comments to explain the reasoning of the match
patterns. Also took note to add some tests in a follow-on PR. Thanks for the review!
(Int32, _) | (_, Int32) => Some(Int32), | ||
(Int16, _) | (_, Int16) => Some(Int16), | ||
(Int8, _) | (_, Int8) => Some(Int8), | ||
// start checking from Int64, that is the most inclusive integer type |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think I am just confused
For example this PR adds this rule:
(Int8, UInt16) => Some(Int32),
Which isn't just merging the schema (e.g. UInt16) in my mind, but is extending it in some way. In this case, so that the entire range of Int8
and UInt16
can be represented -- which now makes sense
I think what would help me would be:
- Add some comments explaining the rationale for these rules
- (maybe) add test coverage (ideally looking like your beautiful table from the PR description) that helps to illustrate the point
However, I don't think this is needed prior to merging this PR
Thank you! |
Benchmark runs are scheduled for baseline = 0000d4f and contender = d076ab3. d076ab3 is a master commit associated with this PR. Results will be available as each benchmark for each run completes. |
Which issue does this PR close?
Closes #5212.
Rationale for this change
When we try to use "UNION" on columns having differently signed or sized integer types, we get a cast error for some cases (explained in the issue), which can be handled accurately.
What changes are included in this PR?
Previous casting matches are modified such that the resulting type of the column will be the smallest unit that will not cause to any error.
Are these changes tested?
Yes,
test_union_upcast_types
test function is added showing the error is solved in the issue.Are there any user-facing changes?
No.