[VL] Fix High Precision Rounding #6707

Merged: 5 commits, Aug 9, 2024

ArnavBalyan (Contributor) commented Aug 4, 2024

What changes were proposed in this pull request?

Before the fix:
Gluten Result: Round(0.19324999999999998, 4) = 0.1933
Vanilla Spark Result: Round(0.19324999999999998, 4) = 0.1932

  • Gluten's round currently applies std::nextafter to offset the small error introduced by the multiplication. For high-precision inputs, however, the nextafter step shifts the value by more than the error it is meant to correct.
  • This produces wrong results for high-precision rounding (the problem starts to appear beyond 15 decimal digits).
  • Intermediate calculations currently use double at best. This patch moves them to long double so that intermediate results are held more accurately.
  • In the future, we can use Boost.Multiprecision to support arbitrary precision at runtime, which would more closely match Java's BigDecimal and give better performance.
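
A minimal sketch of the effect described in the bullets above, assuming IEEE 754 double arithmetic. The two helpers are hypothetical and are not the code in Arithmetic.h; they only contrast rounding with and without a one-ULP std::nextafter nudge. The input 1.005 is an added illustration of a case where the nudge helps.

```cpp
#include <cmath>
#include <limits>

// Hypothetical helper: scale, round half away from zero, and scale back, all in double.
double roundPlain(double number, int decimals) {
  const double factor = std::pow(10.0, decimals);
  return std::round(number * factor) / factor;
}

// Hypothetical helper: move the input one ULP away from zero before scaling.
// This mirrors the std::nextafter offset that the description refers to.
double roundWithNextAfter(double number, int decimals) {
  const double factor = std::pow(10.0, decimals);
  const double inf = std::numeric_limits<double>::infinity();
  const double nudged = std::nextafter(number, number < 0 ? -inf : inf);
  return std::round(nudged * factor) / factor;
}
```

For 1.005 at 2 decimals, the nudge helps: roundPlain gives 1.00 because the stored double is slightly below 1.005, while roundWithNextAfter gives 1.01. For 0.19324999999999998 at 4 decimals, the nudge hurts: the input is already the double just below 0.19325, so moving it up one ULP lands on the midpoint and roundWithNextAfter gives 0.1933, while roundPlain gives 0.1932.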

Also fixes: #5366.

How was this patch tested?

  • Unit Tests
  • Local Spark shell

After the fix:
Gluten Result: Round(0.19324999999999998, 4) = 0.1932
Vanilla Spark Result: Round(0.19324999999999998, 4) = 0.1932

The following query from the linked issue was also tested (it can be flaky, since we still use double, which loses some precision):
select round(avg(cast(col as double)), 4) as topic_2 from VALUES (0.188), (0.194), (0.194), (0.194), (0.194), (0.194), (0.194), (0.194) AS tab(col);

github-actions bot added the CORE and VELOX labels Aug 4, 2024

github-actions bot commented Aug 4, 2024

Thanks for opening a pull request!

Could you open an issue for this pull request on Github Issues?

https://github.com/apache/incubator-gluten/issues

Then could you also rename commit message and pull request title in the following format?

[GLUTEN-${ISSUES_ID}][COMPONENT]feat/fix: ${detailed message}

See also:

github-actions bot commented Aug 4, 2024

Run Gluten Clickhouse CI

zhztheplayer (Member) commented

cc @PHILO-HE @rui-mo

github-actions bot commented Aug 5, 2024

Run Gluten Clickhouse CI

github-actions bot commented Aug 5, 2024

Run Gluten Clickhouse CI

ArnavBalyan (Contributor Author) commented Aug 5, 2024

Hi @zhztheplayer, @PHILO-HE, @rui-mo, could you please help trigger the workflow? Thank you!

ArnavBalyan changed the title from "[VL] Support High Precision Rounding for Velox" to "[VL] Fix High Precision Rounding for Velox" Aug 5, 2024
rui-mo (Contributor) previously approved these changes Aug 6, 2024

Thanks. Looks good to me.

cpp/velox/operators/functions/Arithmetic.h: review thread outdated and resolved

github-actions bot commented Aug 6, 2024

Run Gluten Clickhouse CI

github-actions bot commented Aug 6, 2024

Run Gluten Clickhouse CI

FelixYBW (Contributor) commented Aug 7, 2024

Excellent description!

FelixYBW (Contributor) commented Aug 7, 2024

@kecookier, can you confirm whether this PR fixes your issue? Looks good to me.

@@ -121,6 +121,9 @@ class GlutenMathExpressionsSuite extends MathExpressionsSuite with GlutenTestsTr
checkEvaluation(Round(-3.5, 0), -4.0)
checkEvaluation(Round(-0.35, 1), -0.4)
checkEvaluation(Round(-35, -1), -40)
checkEvaluation(Round(1.12345678901234567, 8), 1.12345679)
checkEvaluation(Round(-0.98765432109876543, 5), -0.98765)
checkEvaluation(Round(12345.67890123456789, 6), 12345.678901)
Contributor commented

Just wondering whether there is a reason for removing the case below from the test. Is it due to some limitation?

checkEvaluation(Round(0.19324999999999998, 4), 0.1932)

Contributor Author commented

I realized that std::nextafter can return different results on different machines or compilers. In this case, the intermediate result sits almost exactly on the boundary of being rounded up to the next decimal (the difference is negligible), so I removed the case to prevent flakiness. We seem to be using double as a substitute for Java's BigDecimal in multiple places. We will probably have to move to Boost/MPFR eventually; otherwise we may see results that differ from vanilla Spark. I don't think it's a big problem now, but it may become one in the future. WDYT?
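
One source of platform dependence worth keeping in mind here: the width of long double is implementation-defined (a 64-bit mantissa with GCC on x86-64 Linux, but the same 53 bits as double with MSVC). The sketch below is a hedged illustration, not the code from this PR: it reports the mantissa widths, measures the one-ULP step that std::nextafter takes, and includes a hypothetical roundWithLongDouble helper that scales in long double without any nudge.

```cpp
#include <cmath>
#include <limits>

// Mantissa widths on this toolchain; equal values mean long double adds no precision.
constexpr int kDoubleBits = std::numeric_limits<double>::digits;
constexpr int kLongDoubleBits = std::numeric_limits<long double>::digits;

// Size of the step std::nextafter takes upward from x (one ULP at x).
double ulpAbove(double x) {
  return std::nextafter(x, std::numeric_limits<double>::infinity()) - x;
}

// Hypothetical helper: do the scaling in long double, with no nextafter nudge.
double roundWithLongDouble(double number, int decimals) {
  const long double factor = std::pow(10.0L, static_cast<long double>(decimals));
  const long double scaled = static_cast<long double>(number) * factor;
  return static_cast<double>(std::round(scaled) / factor);
}
```

For 0.19324999999999998 the step is 2^-55, which is enough to reach the double nearest to 0.19325. Without the nudge, roundWithLongDouble returns 0.1932 for this input, but it then no longer rounds inputs such as 1.005 up to 1.01, so it is a trade-off rather than a fix.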

Contributor commented

Thanks for clarifying. I see your point.

"I think we seem to be using double as an alternative to java bigdecimal at multiple places."

Just curious: have you noticed any issues other than the round-related ones?

Contributor Author commented

Thanks. So far I have only seen this with round. I'll keep looking for other cases like this.

rui-mo changed the title from "[VL] Fix High Precision Rounding for Velox" to "[VL] Fix High Precision Rounding" Aug 9, 2024

rui-mo (Contributor) commented Aug 9, 2024

@kecookier If there is any other relevant issue please feel free to comment.

rui-mo merged commit 920cfaf into apache:main Aug 9, 2024
46 checks passed

kecookier (Contributor) commented

Thanks, I'll test it.

weiting-chen pushed a commit to weiting-chen/gluten that referenced this pull request Aug 12, 2024
jiangjiangtian (Contributor) commented Aug 13, 2024

@rui-mo I found an issue, but I can't tell whether the issue is from round.

SELECT round(3/32*CAST(5.92 AS DECIMAL(20,2)),2);

Gluten returns 0.56, but vanilla Spark returns 0.55.
The queries below return the same results in Gluten and vanilla Spark, 0.5549999999999999 and 0.55 respectively:

SELECT 3/32*CAST(5.92 AS DECIMAL(20,2));
SELECT round(0.5549999999999999, 2);

jiangjiangtian (Contributor) commented

I find that the argument passed to round is 0.555. I will check it.

ArnavBalyan (Contributor Author) commented Aug 13, 2024

I think there will always be an issue with decimals because of how C++ represents floating point: there is a small loss when high-precision values are represented as double. Java's BigDecimal does not have this problem and, with its dynamic precision, can scale up to whatever memory allows. As long as we continue to use double in C++ as a substitute for BigDecimal, we will see issues with arithmetic. One long-term fix could be to use MPFR/Boost throughout the code, but it might require significant effort. Thanks!

weiting-chen added a commit that referenced this pull request Aug 13, 2024
* [VL] Skip UTF-8 validation in JSON parsing (#6661)

* [VL] Fix high precision rounding (#6707)

---------

Co-authored-by: PHILO-HE <[email protected]>
Co-authored-by: Arnav Balyan <[email protected]>
jiangjiangtian (Contributor) commented

Thanks for your reply!
I have a minimal example that doesn't produce the right answer:

SELECT round(cast(0.5549999999999999 as double), 2);

Gluten returns 0.56; vanilla Spark returns 0.55.
In addition, when I run the following SQL I still get 0.1933:

SELECT round(cast(0.19324999999999998 AS DOUBLE), 4);
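
Both repros have the same shape: the input is the double just below a decimal midpoint, so any one-ULP nudge pushes it over. A hedged sketch of the alternative discussed in this thread, rounding the decimal text of the double the way a BigDecimal-style HALF_UP would, is shown below. It assumes vanilla Spark effectively rounds the shortest decimal representation of the double, which is what the results reported here suggest. shortestRepr and roundHalfUpDecimal are hypothetical names, not Gluten or Velox APIs.

```cpp
#include <cmath>
#include <cstdio>
#include <cstdlib>
#include <string>

// Shortest scientific-notation string that parses back to exactly `value`.
std::string shortestRepr(double value) {
  char buf[64];
  for (int precision = 0; precision <= 16; ++precision) {
    std::snprintf(buf, sizeof(buf), "%.*e", precision, value);
    if (std::strtod(buf, nullptr) == value) break;
  }
  return buf;
}

// Round half away from zero on the decimal digits rather than on the binary value.
double roundHalfUpDecimal(double value, int scale) {
  if (!std::isfinite(value)) return value;
  const std::string repr = shortestRepr(std::fabs(value));
  const std::size_t ePos = repr.find('e');
  std::string digits;
  for (std::size_t i = 0; i < ePos; ++i) {
    if (repr[i] != '.') digits.push_back(repr[i]);
  }
  const int exponent = std::atoi(repr.c_str() + ePos + 1);
  const long long mantissa = std::stoll(digits);  // at most 17 digits
  // Power of ten carried by the last digit of the mantissa.
  const int lastDigitPower = exponent - (static_cast<int>(digits.size()) - 1);
  const int drop = -scale - lastDigitPower;  // number of digits to discard
  if (drop <= 0) return value;               // nothing to round away
  if (drop > 18) return std::copysign(0.0, value);
  long long divisor = 1;
  for (int i = 0; i < drop; ++i) divisor *= 10;
  long long quotient = mantissa / divisor;
  if (2 * (mantissa % divisor) >= divisor) ++quotient;  // HALF_UP on the magnitude
  char buf[64];
  std::snprintf(buf, sizeof(buf), "%llde%d", quotient, -scale);
  return std::copysign(std::strtod(buf, nullptr), value);
}
```

With this approach, roundHalfUpDecimal(0.5549999999999999, 2) gives 0.55 and roundHalfUpDecimal(0.19324999999999998, 4) gives 0.1932, matching the vanilla Spark results reported above, and 1.005 at 2 decimals still rounds to 1.01. The string round trip is slow; it is meant to show the idea, not to be a drop-in replacement.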

rui-mo (Contributor) commented Aug 14, 2024

@jiangjiangtian Would you like to open an issue in Gluten with a repro? Perhaps also add it to the tracking issue #4652.

ArnavBalyan (Contributor Author) commented

@jiangjiangtian, could you please also share your hardware/OS specs? I'll double-check the change and try to fix it. Thanks for testing this!

jiangjiangtian (Contributor) commented

OS is RHEL 8.1(4.18.0-147).
CPU is Intel(R) Xeon(R) Gold 5218 CPU @ 2.30GHz.
gcc is g++ (GCC) 10.2.1 20210130 (Red Hat 10.2.1-11).
Thanks.

Labels: CORE, VELOX

Linked issue (may be closed by merging this pull request):
[VL] Results are mismatch with Vanilla Spark when round(avg(cast(col as double)), 4) in release-1.1

6 participants