-
Notifications
You must be signed in to change notification settings - Fork 453
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[VL] Fix High Precision Rounding #6707
Conversation
Thanks for opening a pull request! Could you open an issue for this pull request on Github Issues? https://github.com/apache/incubator-gluten/issues Then could you also rename commit message and pull request title in the following format?
See also: |
Run Gluten Clickhouse CI |
Run Gluten Clickhouse CI |
Run Gluten Clickhouse CI |
Hi @zhztheplayer,@PHILO-HE ,@rui-mo could you please help triggering the workflow. Thank you! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks. Looks good to me.
Run Gluten Clickhouse CI |
Run Gluten Clickhouse CI |
Excellent description! |
@kecookier can you confirm if the PR can fix your issue? Looks good to me |
@@ -121,6 +121,9 @@ class GlutenMathExpressionsSuite extends MathExpressionsSuite with GlutenTestsTr | |||
checkEvaluation(Round(-3.5, 0), -4.0) | |||
checkEvaluation(Round(-0.35, 1), -0.4) | |||
checkEvaluation(Round(-35, -1), -40) | |||
checkEvaluation(Round(1.12345678901234567, 8), 1.12345679) | |||
checkEvaluation(Round(-0.98765432109876543, 5), -0.98765) | |||
checkEvaluation(Round(12345.67890123456789, 6), 12345.678901) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just wondering if there is any reason for removing below case from the test. Is it due to some limitation?
checkEvaluation(Round(0.19324999999999998, 4), 0.1932)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I realized next_after can return different results on different machines or compilers. In this case, the intermediate round result is almost on the edge of being rounded to the next decimal (difference was almost negligible), so removed to prevent flakiness. I think we seem to be using double as an alternative to java bigdecimal at multiple places. We will probably eventually have to move to boost/mpfr otherwise might see difference in results compared to vanilla spark. I think it's not a big problem now, but maybe in the future. Wdyt?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for clarifying. I see your point.
I think we seem to be using double as an alternative to java bigdecimal at multiple places.
Just curious. Have you noticed other issues except for the round-related ones?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, so far only seen with round. I'll continue to explore if there are any other cases like this
@kecookier If there is any other relevant issue please feel free to comment. |
Thanks, I'll test it. |
@rui-mo I found an issue, but I can't tell whether the issue is from round.
gluten returns 0.56, but vanilla returns 0.55.
|
I find that the argument passed to round is 0.555. I will check it. |
I think there will always be an issue with decimals because of how c++ represents floating point, there is a negligible loss when representing high precision using |
* [VL] Skip UTF-8 validation in JSON parsing (#6661) * [VL] Fix high precision rounding (#6707) --------- Co-authored-by: PHILO-HE <[email protected]> Co-authored-by: Arnav Balyan <[email protected]>
Thanks for your reply!
gluten returns 0.56, vanilla returns 0.55.
|
@jiangjiangtian Would you like to open an issue in Gluten with a repro? Perhaps also add to the issue tracker #4652. |
@jiangjiangtian could you please also send your hardware/os specs, I'll double check the change and try to fix it, thanks for testing this! |
OS is RHEL 8.1(4.18.0-147). |
What changes were proposed in this pull request?
Also fixes: #5366.
How was this patch tested?
After the fix:
Gluten Result: Round(0.19324999999999998, 4) = 0.1932
Vanilla Spark Result: Round(0.19324999999999998, 4) = 0.1932
The following query was also tested for the attached issue: (This can be flaky since we still use double which will lose some precision)
select round(avg(cast(col as double)), 4) as topic_2 from VALUES (0.188), (0.194), (0.194), (0.194), (0.194), (0.194), (0.194), (0.194) AS tab(col);