Incorrect Decimal Division Coercion #6828
Comments
This scaling should allow precision loss instead of overflow directly. |
The point here is that no overflow should occur; the only reason it does is that the type coercion machinery isn't respecting the input types. We don't need to perform fixed point multiplication here if we're sensible about preserving the input scale. |
Read through the diff quickly.
I think Hmm, for the described case, isn't
So scaling the left hand side to 10e58 won't overflow? Am I missing any point here? 🤔 |
Yeah, I need to continue to iterate on this, it's all still a WIP. But it seems inherently wrong that dividing by a value greater than 1 could result in overflow and require fixed point multiplication. |
Interestingly DuckDB appears to not support decimal division at all
MySQL appears to just increment the left hand scale by 4
Postgres appears to do something similar - https://github.com/postgres/postgres/blob/29cf61ade3f245aa40f427a1d6345287ef77e622/src/interfaces/ecpg/pgtypeslib/numeric.c#L1047
Interestingly the Hive specification states
But then goes on to show a table with something different.
I think using a fixed increment of the dividend's scale makes a whole lot more sense than a value computed based on the divisor's precision, which just seems to be a recipe for overflow.
Edit: I have updated apache/arrow-rs#4465 to do this |
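To make the comparison concrete, here is a minimal sketch of the two ways of picking the result scale. The function names are hypothetical. The increment of 4 follows the MySQL observation above, and the max(6, s1 + p2 + 1) form is the formula documented for SQL Server. Treating the result precision as "integral digits + divisor scale + result scale" for the fixed-increment rule is an assumption made for illustration, not something any of the engines above is confirmed to do.

```python
def fixed_increment_rule(p1, s1, p2, s2, increment=4):
    """Result scale = dividend scale + a fixed increment (assumed to be 4)."""
    scale = s1 + increment
    # Assumption: the quotient needs at most the dividend's integral digits
    # plus the divisor's scale in front of the decimal point.
    precision = (p1 - s1) + s2 + scale
    return precision, scale


def divisor_precision_rule(p1, s1, p2, s2):
    """Result scale = max(6, s1 + p2 + 1), i.e. it grows with the divisor's precision."""
    scale = max(6, s1 + p2 + 1)
    precision = (p1 - s1) + s2 + scale
    return precision, scale


# Decimal(20, 2) / Decimal(20, 2)
fixed = fixed_increment_rule(20, 2, 20, 2)      # (26, 6): fits in 38 digits
computed = divisor_precision_rule(20, 2, 20, 2)  # (43, 23): exceeds 38 digits
```

With the divisor's precision in the formula, two fairly ordinary Decimal(20, 2) inputs already ask for more than 38 digits before any value is looked at.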
Adding more test cases to be validated on DF. Looks like there is a series of bugs in decimal multiply/division in DF: this query in PG gives me all the same numbers
In DF, division doesn't happen at all and multiply gives different results
|
The overflow could be fixed after #6832, I think. Currently decimal multiplication with scalar doesn't handle overflow correctly. |
I haven't followed the entire discussion but, FWIW, Substrait currently agrees with the
I'm not sure this bears out empirically. Using https://www.db-fiddle.com/f/4jyoMCicNSZpjMt4jFYoz5/9659 I get the following result from
I think the idea of involving the divisor's precision is for cases like: 1 / 1000000000
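A small sketch of that case, using Python's decimal module as a stand-in for a decimal kernel. The helper name and the half-up rounding mode are assumptions for illustration. Here the dividend 1 has scale 0 and the divisor 1000000000 has precision 10, so a fixed increment of 4 gives scale 4 while max(6, s1 + p2 + 1) gives scale 11.

```python
from decimal import Decimal, ROUND_HALF_UP


def divide_to_scale(lhs, rhs, scale):
    """Divide and round the quotient to `scale` digits after the decimal point."""
    quotient = Decimal(lhs) / Decimal(rhs)
    return quotient.quantize(Decimal(1).scaleb(-scale), rounding=ROUND_HALF_UP)


# Fixed increment of 4 on a scale-0 dividend: every significant digit is lost.
coarse = format(divide_to_scale(1, 1000000000, 4), "f")   # "0.0000"

# Scale derived from the divisor's precision: the quotient survives.
fine = format(divide_to_scale(1, 1000000000, 11), "f")    # "0.00000000100"
```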
Agreed. In both SQL Server and Substrait there is a second step to help with the overflow problem. This second step applies to all decimal arithmetic operations. The rule is basically like this: If, after following the formulas, the resulting precision / scale is out of bounds, then sacrifice scale (e.g. throw away the really insignificant digits on the far right) but keep at least 6 digits of scale. If you can't keep at least 6 digits of scale (e.g. the number requires 33 digits or more) then overflow. The 6 is absolutely arbitrary but it works. Note that the SQL Server rule is slightly more complex, but more or less the same thing:
|
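The "sacrifice scale but keep at least 6 digits" step described above could be sketched roughly as follows. This is one common formulation, assumed here rather than copied from SQL Server or Substrait. The function name is hypothetical and the maximum precision of 38 is an assumption matching a 128-bit decimal.

```python
MAX_PRECISION = 38  # assumed upper bound (128-bit decimal)
MIN_SCALE = 6       # the arbitrary floor mentioned in the rule


def adjust_precision_scale(precision, scale):
    """Shrink an out-of-bounds (precision, scale) by giving up scale first."""
    if precision <= MAX_PRECISION:
        return precision, scale
    integral_digits = precision - scale
    # Do not demand more scale than the formula asked for in the first place.
    min_scale = min(scale, MIN_SCALE)
    # Keep all integral digits if possible, but never drop below min_scale.
    adjusted_scale = max(MAX_PRECISION - integral_digits, min_scale)
    return MAX_PRECISION, adjusted_scale


# 20 integral digits: scale shrinks from 23 to 18 and nothing overflows.
roomy = adjust_precision_scale(43, 23)   # (38, 18)

# 38 integral digits: scale is clamped at 6, so only 32 integral digits fit
# and any value needing 33 or more would overflow at runtime.
tight = adjust_precision_scale(87, 49)   # (38, 6)
```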
Also, if it helps, here are the Calcite rules, which should be the same as, and were the motivation for, the Substrait rules. |
I've attempted to adjust this in apache/arrow-rs#4640 PTAL |
Describe the bug
Currently when performing decimal division, coercion_decimal_mathematics_type, as called by BinaryExpr::evaluate, will coerce both inputs to the wider precision type. The division kernel will then scale the left hand side by the output scale.
To see why this is an issue consider
This computation shouldn't overflow as it can just perform 10e20 / 5
The issue is that the coercion logic will scale the right hand side to 5e20, requiring it to also scale the left hand side to 10e40, which will overflow
To Reproduce
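As a rough illustration of the arithmetic in the description (a sketch, not the reporter's reproduction), the snippet below models the backing integers of a 128-bit decimal in plain Python. The scale of 20 for the left hand side and 0 for the right hand side are assumptions chosen to match the 10e20, 5e20 and 10e40 figures above. The helper name is hypothetical.

```python
I128_MAX = 2**127 - 1
I128_MIN = -(2**127)


def fits_in_i128(value):
    """True if `value` can be stored in a signed 128-bit integer."""
    return I128_MIN <= value <= I128_MAX


scale = 20                      # assumed scale of the left hand side
lhs_raw = 10 * 10**scale        # the value 10 stored at scale 20, i.e. 10e20
rhs_raw = 5                     # the value 5 stored at scale 0

# Coerced path: both sides are brought to scale 20, then the dividend is
# scaled again so the quotient keeps 20 digits of scale.
rhs_coerced = rhs_raw * 10**scale     # 5e20
lhs_scaled = lhs_raw * 10**scale      # 10e40, larger than any i128
coerced_overflows = not fits_in_i128(lhs_scaled)

# Direct path: the divisor already has scale 0, so dividing the raw values
# yields the quotient at scale 20 with no extra scaling.
direct = lhs_raw // rhs_raw           # 2e20, i.e. the value 2 at scale 20
```

Both paths agree on the quotient when computed with unbounded integers; only the coerced path needs an intermediate value that does not fit in 128 bits.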
Expected behavior
No response
Additional context
#6794