-
Notifications
You must be signed in to change notification settings - Fork 842
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Divide By Zero Panics in Unchecked Arithmetic Kernels #2647
Comments
I am +1 for that the unchecked variant of these division kernels should do something else instead of panic. For the idea to fill a special value, just wondering if NULL is more suitable/meaningful for this case (Spark does this). Not strong option and also from performance perspective from the proposed optimization, it may not a good choice. For using 0 for dividing by zero slot, not sure if there is any other doing the same so this is my little concern. But I guess it is okay if we well define this behavior in the document. Btw, C++ |
An interesting datapoint I was not aware of is there aren't actually SIMD instructions for integer division, and integer division is sufficiently slow that people do arcane tricks doing floating point division and then compensating for the precision loss instead 😅 Perhaps this means division will always be its own special snowflake and I should just optimize the other kernels as a first step 🤔 |
Ideally, I would expect the division kernel to have the same option as the cast kernel:
I believe this is consistent with what @liukun4515 and @viirya are saying I also think this is consistent with @tustvold 's proposal, but I am 100% sure. |
It isn't what I was proposing but having thought about it a bit more I'm not sure having checked and unchecked variants of the division kernel really makes sense:
As such I would like to propose we only provide a single
There is no CastOption for any of the arithmetic kernels, and I'm somewhat inclined to keep it that way. If people want the "safe" semantic (a name I really dislike) a preceding nullif will likely perform similarly. |
This makes sense to me (and is consistent with what postgres division does):
|
I might be misremembering this, but I think |
Further to this the divide kernel is currently using |
I have confirmed empirically that there is no discernible performance difference between a variant that returns an error only on divide by zero, and a variant that also returns an error on overflow. I therefore think it makes sense to only have checked division and modulus kernels, this is what I plan to do as part of #3999 |
Closed by #4594 |
Which part is this question about
In Rust division and modulus is always checked - you can see that here. We currently provide checked and unchecked division kernels, and a single modulus kernel which is always checked.
The result is that the checked kernels will return an error on divide by zero, whilst the unchecked kernel will panic. This feels a little counter-intuitive, I would expect the unchecked kernel to silently do "something defined but not necessarily meaningful".
Describe your question
The comparison kernels handle the null mask separately from the values data, and this massively improves performance. I would like to apply the same optimisation to the unchecked arithmetic kernels, but cannot do this whilst dividing or modulus by 0 results in a panic, as the null values may have arbitrary values which could trigger this.
I would like to propose the unchecked variant of these division kernels set the value to be 0 on division by 0 (or some other sentinel value). As we're checking regardless there is no performance impact of doing this, and it allows blindly processing the values without first consulting the bitmask.
An alternative option would be to only have checked variants, however, this would preclude handling the null mask separately for the non-scalar variants of the kernels. I think this would be a shame.
One further alternative I did consider was to only consult the bitmask on divide by zero, however, the reality is most null slots will have a value of 0, and consequently this cost of this conditional checking is likely to undermine any benefits from the optimisation.
Additional context
Thoughts @alamb @viirya @Dandandan @jhorstmann ?
The text was updated successfully, but these errors were encountered: