Avoid overflow for fixed_point round #9809

Merged: 8 commits merged into rapidsai:branch-22.02 on Jan 10, 2022

Conversation

@sperlingxx (Contributor) commented on Dec 1, 2021:

This PR removes a potential overflow in fixed_point round caused by an excessively large scale movement.

If the scale movement is larger than the max precision of the current type, the pow operation (std::pow(10, scale_movement)) produces a value that overflows the current type. Fortunately, we can simply output a zero column, because no digits can survive such a large scale movement.
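
A minimal sketch of the guard described above (the helper name and structure are illustrative, not the exact libcudf code; Type stands for the fixed_point rep type such as int32_t or int64_t):

#include <cstdlib>
#include <limits>

// Returns true when the requested scale movement shifts out every digit that
// Type can hold, so std::pow(10, scale_movement) would overflow Type and the
// rounded column is all zeros anyway.
template <typename Type>
bool round_result_is_always_zero(int decimal_places, int input_scale)
{
  // Max decimal digits Type is guaranteed to hold, e.g. 9 for int32_t.
  constexpr int max_precision = std::numeric_limits<Type>::digits10;
  int const scale_movement = std::abs(decimal_places + input_scale);
  return scale_movement > max_precision;
}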

@sperlingxx sperlingxx requested a review from a team as a code owner December 1, 2021 10:03
@github-actions bot added the libcudf label on Dec 1, 2021
@sperlingxx sperlingxx added the 4 - Needs Review, non-breaking, and improvement labels and removed the libcudf label on Dec 1, 2021
@sperlingxx sperlingxx requested a review from revans2 December 1, 2021 10:05
@codecov bot commented on Dec 1, 2021:

Codecov Report

Merging #9809 (ceb1c27) into branch-22.02 (967a333) will decrease coverage by 0.05%.
The diff coverage is n/a.


@@               Coverage Diff                @@
##           branch-22.02    #9809      +/-   ##
================================================
- Coverage         10.49%   10.43%   -0.06%     
================================================
  Files               119      119              
  Lines             20305    20458     +153     
================================================
+ Hits               2130     2134       +4     
- Misses            18175    18324     +149     
Impacted Files Coverage Δ
python/custreamz/custreamz/kafka.py 29.16% <0.00%> (-0.63%) ⬇️
python/dask_cudf/dask_cudf/sorting.py 92.30% <0.00%> (-0.61%) ⬇️
python/cudf/cudf/__init__.py 0.00% <0.00%> (ø)
python/cudf/cudf/core/frame.py 0.00% <0.00%> (ø)
python/cudf/cudf/core/index.py 0.00% <0.00%> (ø)
python/cudf/cudf/io/parquet.py 0.00% <0.00%> (ø)
python/cudf/cudf/core/series.py 0.00% <0.00%> (ø)
python/cudf/cudf/utils/utils.py 0.00% <0.00%> (ø)
python/cudf/cudf/utils/dtypes.py 0.00% <0.00%> (ø)
python/cudf/cudf/utils/ioutils.py 0.00% <0.00%> (ø)
... and 19 more

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data

@jrhemstad jrhemstad requested a review from codereport December 1, 2021 17:47
@jrhemstad (Contributor) commented:

@sperlingxx please update the PR description with how the current rounding logic is deficient and how it is being improved.

@github-actions bot added the libcudf label on Dec 2, 2021
@sperlingxx (Contributor, Author) commented on Dec 2, 2021:

Hi @revans2, I ran the benchmark via Java tests. The current implementation didn't outperform the original one. Here are the detailed results:

original impl:

1000 times average DECIMAL32 HALF_UP : 251 MicroSec.
1000 times average DECIMAL32 HALF_EVEN : 253 MicroSec.
1000 times average DECIMAL64 HALF_UP : 492 MicroSec.
1000 times average DECIMAL64 HALF_EVEN : 494 MicroSec.
1000 times average DECIMAL128 HALF_UP : 975 MicroSec.
1000 times average DECIMAL128 HALF_EVEN : 975 MicroSec.

current impl:

1000 times average DECIMAL32 HALF_UP : 252 MicroSec.
1000 times average DECIMAL32 HALF_EVEN : 252 MicroSec.
1000 times average DECIMAL64 HALF_UP : 493 MicroSec.
1000 times average DECIMAL64 HALF_EVEN : 494 MicroSec.
1000 times average DECIMAL128 HALF_UP : 976 MicroSec.
1000 times average DECIMAL128 HALF_EVEN : 977 MicroSec.

The test script is shown below:

  @Test
  void decimalRoundPerfTest() {
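    // Benchmark: fill a 2^24-row decimal column with a constant value, then
    // average the wall-clock cost of round() over 1000 iterations per type/mode.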
    int[] dec32Raw = new int[1 << 24];
    Arrays.fill(dec32Raw, 1555555555);
    long startTime, endTime;
    long avgCostHalfUp, avgCostHalfEven;
    avgCostHalfUp = 0L;
    avgCostHalfEven = 0L;
    try (ColumnVector input = ColumnVector.decimalFromInts(-2, dec32Raw)) {
      for (int i = 0; i < 1000; i++) {
        startTime = System.nanoTime();
        try (ColumnVector halfUp = input.round(-5, RoundMode.HALF_UP)) {
          endTime = System.nanoTime();
          avgCostHalfUp += (endTime - startTime) / 1000L;
        }
        startTime = System.nanoTime();
        try (ColumnVector halfEven = input.round(-5, RoundMode.HALF_EVEN)) {
          endTime = System.nanoTime();
          avgCostHalfEven += (endTime - startTime) / 1000L;
        }
      }
    }
    System.err.println("1000 times average DECIMAL32 HALF_UP : " + (avgCostHalfUp / 1000L) + " MicroSec.");
    System.err.println("1000 times average DECIMAL32 HALF_EVEN : " + (avgCostHalfEven / 1000L) + " MicroSec.");

    long[] dec64Raw = new long[1 << 24];
    Arrays.fill(dec64Raw, 1555555555555555555L);
    avgCostHalfUp = 0L;
    avgCostHalfEven = 0L;
    try (ColumnVector input = ColumnVector.decimalFromLongs(-2, dec64Raw)) {
      for (int i = 0; i < 1000; i++) {
        startTime = System.nanoTime();
        try (ColumnVector halfUp = input.round(-14, RoundMode.HALF_UP)) {
          endTime = System.nanoTime();
          avgCostHalfUp += (endTime - startTime) / 1000L;
        }
        startTime = System.nanoTime();
        try (ColumnVector halfEven = input.round(-14, RoundMode.HALF_EVEN)) {
          endTime = System.nanoTime();
          avgCostHalfEven += (endTime - startTime) / 1000L;
        }
      }
    }
    System.err.println("1000 times average DECIMAL64 HALF_UP : " + (avgCostHalfUp / 1000L) + " MicroSec.");
    System.err.println("1000 times average DECIMAL64 HALF_EVEN : " + (avgCostHalfEven / 1000L) + " MicroSec.");

    BigInteger[] dec128Raw = new BigInteger[1 << 24];
    Arrays.fill(dec128Raw, new BigInteger("-155555555555555555555555555555555555"));
    avgCostHalfUp = 0L;
    avgCostHalfEven = 0L;
    try (ColumnVector input = ColumnVector.decimalFromBigInt(-2, dec128Raw)) {
      for (int i = 0; i < 1000; i++) {
        startTime = System.nanoTime();
        try (ColumnVector halfUp = input.round(-30, RoundMode.HALF_UP)) {
          endTime = System.nanoTime();
          avgCostHalfUp += (endTime - startTime) / 1000L;
        }
        startTime = System.nanoTime();
        try (ColumnVector halfEven = input.round(-30, RoundMode.HALF_EVEN)) {
          endTime = System.nanoTime();
          avgCostHalfEven += (endTime - startTime) / 1000L;
        }
      }
    }
    System.err.println("1000 times average DECIMAL128 HALF_UP : " + (avgCostHalfUp / 1000L) + " MicroSec.");
    System.err.println("1000 times average DECIMAL128 HALF_EVEN : " + (avgCostHalfEven / 1000L) + " MicroSec.");
  }

According to these results, I think I should revert most of the changes, keeping only the bypass for scale movements that are out of range.

@sperlingxx sperlingxx changed the title from "Improve round for fixed_point types" to "Avoid overflow for fixed_point round" on Dec 2, 2021
@sperlingxx (Contributor, Author) commented on Dec 2, 2021:

> @sperlingxx please update the PR description with how the current rounding logic is deficient and how it is being improved.

Hi @jrhemstad, I have reverted the "improvements", since they don't really boost performance according to the benchmark tests.

@jrhemstad (Contributor) commented:

Why are we making overflow return 0 instead of just letting it overflow? We don't do that for any other type. Or with fixed point we could throw an exception since we can detect overflow from the scales alone.

@revans2 (Contributor) commented on Dec 2, 2021:

> Why are we making overflow return 0 instead of just letting it overflow? We don't do that for any other type. Or with fixed point we could throw an exception since we can detect overflow from the scales alone.

Because Spark will overflow for ints but not decimals :). We ran into some issues with rounding decimals and put in a fix. We thought it would be good to push the changes back, but if it is going to cause consistency issues, we can keep our current solution.

Technically, 0 is the correct answer. If I want to round a DECIMAL32 with a scale of -50 to a scale of 0, it does not matter what value is stored in the decimal; the result will always be 0. It just happens that the previous algorithm would overflow at about the same point. The comment should probably be updated to reflect this instead of just calling out overflow, but that is moot if we are not going to make this change.

For integer types we could run into a similar situation: if I have an INT8 and ask to round to 5 digits, the result would always be 0, but the Spark code overflows in exactly the same way as the CUDF code does. So I would prefer not to change it.
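
(For illustration only, not code from this PR: a tiny standalone C++ demo of the INT8 case, where the divisor 10^5 cannot even be represented in the 8-bit type, so it wraps before any rounding happens.)

#include <cstdint>
#include <cstdio>

int main()
{
  // Rounding an INT8 value to 5 digits needs a divisor of 10^5 = 100000,
  // but int8_t tops out at 127.
  int32_t divisor = 1;
  for (int i = 0; i < 5; ++i) divisor *= 10;  // 100000
  std::printf("10^5 = %d, INT8_MAX = %d\n", divisor, INT8_MAX);
  // Narrowing wraps modulo 256 on typical implementations: 100000 -> -96.
  std::printf("as int8_t: %d\n", static_cast<int>(static_cast<int8_t>(divisor)));
}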

For floating point, rounding is handled by a C function, and it looks like if you try to round to a decimal place larger than the output can support, CUDF returns a NaN while Spark returns 0.0. So it is kind of a corner case here too.

@revans2 (Contributor) commented on Dec 2, 2021:

Actually for integers it is not always 100% the same as with Spark. But it is close enough that we are okay with it.

@codereport (Contributor) commented:

> Technically, 0 is the correct answer. If I want to round a DECIMAL32 with a scale of -50 to a scale of 0, it does not matter what value is stored in the decimal; the result will always be 0.

Could you elaborate on this? If you have -50 (or 50 in libcudf), you end up with something like 0.000...0123, which will round to 0 when converted to scale 0. However, if you look at the tests added by @sperlingxx, we are going from scale 1, with values like 1.4 and 2.5, to scale 11, with values of 0. That seems odd to me.

@revans2 (Contributor) commented on Dec 7, 2021:

> Technically, 0 is the correct answer. If I want to round a DECIMAL32 with a scale of -50 to a scale of 0, it does not matter what value is stored in the decimal; the result will always be 0.
>
> Could you elaborate on this?

My example was a bad one. The following should hopefully be much more concrete. Note that this is a corner case in Spark.

The issue we are actually trying to fix/push back is https://github.com/NVIDIA/spark-rapids/blob/1e95655d7fee525bfb12dd445118ba18f5b18296/sql-plugin/src/main/scala/com/nvidia/spark/rapids/DecimalUtil.scala#L71-L99

Type const n = std::pow(10, std::abs(decimal_places + input.type().scale()));

corresponds directly to the check we are making in the Scala code. For example, suppose my input is a Decimal32, which can hold up to 9 decimal digits without overflowing, and I want to go from a Spark Decimal(9, 9), which corresponds to a CUDF Decimal32 with scale -9, and round to a Spark Decimal(1, -1), which corresponds to a CUDF Decimal32 with scale 1. Then n overflows because it is std::pow(10, abs(-1 + -9)) = 10^10, which is far larger than the roughly 2.1 * 10^9 an int32_t can hold.
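
(A standalone illustration, not code from the PR, of why that n cannot be stored in Decimal32's rep type:)

#include <cmath>
#include <cstdint>
#include <cstdio>
#include <cstdlib>

int main()
{
  // Moving a Decimal32 from CUDF scale -9 to scale 1 requires n = 10^10.
  double n = std::pow(10.0, std::abs(-1 + -9));             // 1e10
  std::printf("n = %.0f, INT32_MAX = %d\n", n, INT32_MAX);  // 1e10 >> 2147483647
  // Converting 1e10 to int32_t (as `Type const n = ...` does) is undefined
  // behaviour, so the divisor is garbage and the rounded result is wrong.
}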

Without the fix:

val df = Seq("0.111111111", "0.999999999", "0.987654321").toDF("a")
spark.conf.set("spark.sql.legacy.allowNegativeScaleOfDecimal", "true")
df.repartition(1).selectExpr("round(CAST(a AS Decimal(9, 9)), -1) as v").show()
+---+
|  v|
+---+
| 10|
| 10|
| 10|
+---+

The CPU shows these as 0, which is what I am arguing should be correct.

If I change the precision to 10, so it ends up as a Decimal64 instead of a Decimal32, n no longer overflows and we get the same answer as the CPU: 0.

What @sperlingxx found is that my code is inefficient: we don't need to truncate the number before rounding, because the result will always be 0.

> If you have -50 (or 50 in libcudf), you end up with something like 0.000...0123, which will round to 0 when converted to scale 0. However, if you look at the tests added by @sperlingxx, we are going from scale 1, with values like 1.4 and 2.5, to scale 11, with values of 0. That seems odd to me.

I think the scale may be the opposite of what you have described (Spark scale vs. CUDF scale; if I messed it up somewhere, I apologize). The test that was added goes from values like 140 and 250, where the last digit is always 0 because the scale is 1 (i.e. 14 * 10^1 and 25 * 10^1), to something where decimal_places is -11, which means that the last 11 digits will always be 0.

@codereport (Contributor) commented:

> I think the scale may be the opposite of what you have described (Spark scale vs. CUDF scale; if I messed it up somewhere, I apologize). The test that was added goes from values like 140 and 250, where the last digit is always 0 because the scale is 1 (i.e. 14 * 10^1 and 25 * 10^1), to something where decimal_places is -11, which means that the last 11 digits will always be 0.

You are correct, I got confused - not you. So it is 140 and 250 turning into 0s when the scale goes from 1 to -11, which will overflow. I'm still not convinced that on the C++ side we would expect 0s; my gut tells me we would inherit whatever the underlying int's behaviour is (i.e., overflow).

@revans2 Why does the CPU code result in zeros? I assume this happens on the Spark side of things?

@revans2 (Contributor) commented on Dec 9, 2021:

Spark uses Java's BigDecimal to do the rounding for all of the number types. BigDecimal does not care about decimal64 vs decimal32. It computes an answer with effectively unlimited precision and then casts the result back to the desired type.
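
(A rough C++ analog of that compute-wide-then-narrow behaviour, illustrative only and assuming the GCC/Clang __int128 extension; this is not what Spark or libcudf actually does internally:)

#include <cstdio>

int main()
{
  // 0.111111111 as a Spark Decimal(9, 9): unscaled rep = 111111111.
  __int128 value = 111111111;
  // Rounding to Decimal(1, -1) needs a divisor of 10^10, which easily fits
  // in the wide intermediate type.
  __int128 n = 1;
  for (int i = 0; i < 10; ++i) n *= 10;
  // HALF_UP in the wide type, then narrow back: 111111111 / 10^10 -> 0,
  // matching the CPU result shown above.
  __int128 rounded = (value + n / 2) / n;
  std::printf("rounded = %d\n", static_cast<int>(rounded));  // prints 0
}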

@revans2 (Contributor) commented on Dec 9, 2021:

If CUDF just wants to overflow because it is more C++-like, we can just add some comments to the APIs explaining when they will overflow. I just find it rather odd that the C++ thing to do is to return an answer that is patently wrong instead of one that is arguably correct, especially when there is no added cost to doing so. That just feels counter to good API design in general.

@codereport (Contributor) commented:

> If CUDF just wants to overflow because it is more C++-like, we can just add some comments to the APIs explaining when they will overflow. I just find it rather odd that the C++ thing to do is to return an answer that is patently wrong instead of one that is arguably correct, especially when there is no added cost to doing so. That just feels counter to good API design in general.

@jrhemstad @harrism What do you think?

@jrhemstad (Contributor) commented:

> If CUDF just wants to overflow because it is more C++-like, we can just add some comments to the APIs explaining when they will overflow. I just find it rather odd that the C++ thing to do is to return an answer that is patently wrong instead of one that is arguably correct, especially when there is no added cost to doing so. That just feels counter to good API design in general.
>
> @jrhemstad @harrism What do you think?

I don't have strong feelings either way. fixed_point is unique in that we can detect overflow from the host, and there isn't any performance cost to doing something special. Overflow of signed integers is UB anyway, so returning 0 is perfectly fine. There is some maintenance cost, but it's minor, so I'm fine with this.

@codereport (Contributor) left a review:

Change is fine with me 👍

In cpp/src/round/round.cu:

input.end<Type>(),
out_view.begin<Type>(),
FixedPointRoundFunctor{n});
constexpr int max_precision = cuda::std::numeric_limits<Type>::digits10;
A reviewer (Contributor) commented:

Does this need to be stored as a variable when it is only used in one place?

@sperlingxx (Contributor, Author) replied:

No, it doesn't. I removed it.

@sperlingxx (Contributor, Author) commented:

rerun tests

@sperlingxx (Contributor, Author) commented:

@gpucibot merge

@rapids-bot rapids-bot bot merged commit 8ba8774 into rapidsai:branch-22.02 Jan 10, 2022
@sperlingxx sperlingxx deleted the opt_dec_round branch January 10, 2022 10:17
sperlingxx added a commit to NVIDIA/spark-rapids that referenced this pull request Feb 23, 2022
Signed-off-by: sperlingxx <[email protected]>

Closes #3793

Pushes cuDF-related decimal utilities down to cuDF. This PR relies on cuDF changes rapidsai/cudf#9809, rapidsai/cudf#9907, and rapidsai/cudf#10316.