Describe the bug
In our compatibility docs we explicitly say that round and bround can be off in many cases, but I think we can detect and fix most of these cases, or at least a lot of them.
https://github.com/NVIDIA/spark-rapids/blob/branch-23.10/docs/compatibility.md#floating-point
When Spark rounds a float/double, it converts the value to a Decimal with unlimited precision, rounds that to the given scale, and converts the answer back. This works, but it is really slow and can need up to 128 bytes to hold the number. In almost all cases the round ends up being a no-op on the CPU: the output is exactly the same as the input. We, on the other hand, end up doing a bunch of floating point math and get a result that is very close to, but slightly off from, what the Spark CPU gets.
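As a rough illustration (a minimal conceptual sketch, not Spark's actual Round/BRound code), the CPU behavior amounts to a round trip through an arbitrary-precision decimal:

```scala
import scala.math.BigDecimal.RoundingMode

// Conceptual sketch of the CPU path: widen the double to an arbitrary-precision
// decimal, round at the requested scale, then narrow back to a double.
def cpuStyleRound(d: Double, scale: Int): Double =
  BigDecimal(d).setScale(scale, RoundingMode.HALF_UP).toDouble

// A double this large is already an integer (its ulp is far above 1), so the
// round trip hands back exactly the input: the round is effectively a no-op.
val x = 1.2345678901234567e20
assert(cpuStyleRound(x, 2) == x)
```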
The mismatch appears to happen when the exponent of the floating point number minus the scale we are rounding to is larger than what the fraction can hold. For floats that is about 24 bits (roughly 8 digits) and for doubles it is about 53 bits (roughly 16 digits).
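To make that condition concrete, here is a hedged sketch of the detection idea. The 2^52 cutoff below is my own conservative stand-in, covering only doubles rounded to a non-negative scale, not a rule taken from the plugin or from Spark:

```scala
// If |d| >= 2^52 the double's ulp is at least 1, so d is already an integer and
// has no representable digits to the right of the decimal point. Rounding it to
// any non-negative scale therefore cannot change it, the CPU result must equal
// the input, and the GPU could simply pass such rows through untouched.
def roundIsCertainlyNoop(d: Double, scale: Int): Boolean =
  scale >= 0 && !d.isNaN && !d.isInfinite && math.abs(d) >= math.pow(2.0, 52)
```

A tighter check could compare the binary exponent minus the requested scale against the fraction width (about 24 bits for floats, 53 for doubles) to also cover negative scales, but that needs more care around the decimal/binary conversion.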
I don't think this would fix everything, but it should get us a lot closer, and it feels like something we could actually do.
Steps/Code to reproduce bug
I applied this patch and found at least one row in one floating point test that was off by more than what the approximate float comparison thought was good. I am sure there are a lot more.
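Since the patch itself is not attached here, the snippet below is only a hypothetical stand-alone way to hunt for such rows in spark-shell. The generated data and the naive UDF are illustrative stand-ins for "a bunch of floating point math", not the plugin's actual kernel or the real integration test:

```scala
import org.apache.spark.sql.functions._
import spark.implicits._

// Doubles spread across a wide range of exponents, so plenty of rows have more
// magnitude than the 53-bit fraction can represent at the rounded scale.
val df = spark.range(0, 1000000)
  .select((rand(42) * pow(lit(10.0), rand(7) * 38 - 19)).as("d"))

// Naive floating-point rounding, standing in for the kind of math the GPU does.
val fpRound = udf { (d: Double, s: Int) =>
  math.rint(d * math.pow(10, s)) / math.pow(10, s)
}

df.withColumn("cpu", round($"d", 2))     // Spark's CPU decimal-based round
  .withColumn("fp", fpRound($"d", lit(2)))
  .where($"cpu" =!= $"fp")               // rows where the two approaches disagree
  .show(10, truncate = false)
```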