Fix the overflow of container type when casting floats to decimal #5766
Conversation
build

I assume this should go into the 22.06 release.
The change looks reasonable to me but I have not worked on the decimal implementation and am not an expert in this area.
```diff
 withResource(checked.castTo(containerType)) { container =>
-  container.round(dt.scale, cudf.RoundMode.HALF_UP)
+  withResource(container.round(dt.scale, cudf.RoundMode.HALF_UP)) { rd =>
+    rd.castTo(targetType)
+  }
 }
```
Nit: doesn't this cast end up creating a redundant copy of the column in many (all?) cases? Wondering if we should check whether the resulting type from `round` is already the desired cudf type. I would expect cudf's `round` to deterministically produce a result type based on the input cast type, which makes me wonder why the cast is necessary here.
@jlowe The cast here is for cases where the underlying cuDF decimal type got promoted along with precision + 1 (e.g., from DECIMAL32 to DECIMAL64). For cases like that, I think we need to convert back to the original cuDF decimal type, to keep it aligned with the target precision.
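For illustration only (not code from this PR), here is roughly how a decimal precision maps onto cudf's storage types, which shows how precision + 1 can cross the DECIMAL32/DECIMAL64 boundary:

```scala
import ai.rapids.cudf.DType

// Illustrative mapping: cudf stores decimals of up to 9 digits in DECIMAL32,
// up to 18 in DECIMAL64, and up to 38 in DECIMAL128. cudf encodes the scale
// as a negative power-of-ten exponent.
def cudfDecimalTypeFor(precision: Int, scale: Int): DType = {
  if (precision <= 9) DType.create(DType.DTypeEnum.DECIMAL32, -scale)
  else if (precision <= 18) DType.create(DType.DTypeEnum.DECIMAL64, -scale)
  else DType.create(DType.DTypeEnum.DECIMAL128, -scale)
}

// A target of precision 9 (DECIMAL32) promoted to a container of precision 10
// lands in DECIMAL64, so the result must be cast back to DECIMAL32 afterwards.
```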
We are pre-casting to the promoted precision but asking cudf to round to the originally desired precision. Do we know for certain that cudf does not already put the result in the appropriate type as part of the round implementation? Even if it does not, it is not always the case that the promotion cast actually did anything (e.g., it was DECIMAL32 before and after, because precision + 1 didn't actually need DECIMAL64).
This isn't must-fix for proper functionality; it's more of a performance optimization for what may be a common case.
> Do we know for certain that cudf does not already put the result in the appropriate type as part of the round implementation?

I'm not sure if I understand your question correctly. In cudf, the floating-point value is just shifted (multiplied/divided) by the `scale` value and then rounded toward zero to an integer for internal storage.
https://github.com/rapidsai/cudf/blob/fae22213537a003c74e2da9feac1bda38f562f6f/cpp/include/cudf/fixed_point/fixed_point.hpp#L230
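As a plain-Scala illustration of that construction path (assumed to mirror the cudf behavior just described, not cudf code):

```scala
// Illustrative: cudf-style fixed-point construction from a double.
// scale follows cudf's convention of a power-of-ten exponent, so a scale
// of -2 means two fractional digits.
def toFixedPoint(value: Double, scale: Int): Long = {
  val shifted = value * math.pow(10, -scale)
  shifted.toLong // truncation toward zero, as in cudf's constructor
}

toFixedPoint(1.5, 0)  // 1, not 2: the fractional part is simply dropped
toFixedPoint(2.5, -1) // 25, i.e. 2.5 survives intact at scale -1
```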
Maybe cudf can fix the accuracy problem simply by not rounding toward zero. That simple fix would avoid this kind of issue on the plugin side, but I'm not sure it is the desired behavior in cudf. I'll file an issue.
Filed: rapidsai/cudf#11079
Force-pushed from 729e04e to e3675f8.

Conducted a force push, rebasing onto 22.06.
build
I'm OK with it as-is, but it would be good to have a followup for what appears to be an unnecessary post-round cast in some (all?) cases.
Regarding the reason behind this bug, I repost an explanation from my cudf issue (rapidsai/cudf#11079 (comment)):

(reposting this message from Slack) There is a long history behind this, and truncation is definitely the desired and intentional behaviour. Here are the main motivations for truncation:
Note that at one point we actually did have rounding functionality baked into floating-point construction for fixed-point. However, it wasn't actually to address the desire to be more "accurate", as @ttnghia is pointing out; it was to try and address inherent issues in floating point (such as 1.001 not being representable by floating point, meaning if you construct a fixed-point with scale -3, you end up with 1, missing the .001).

@harrism and I had many discussions about this and both agreed that "ideally" fixed-point should be able to avoid this at the end of the day. However, it ended up presenting too many issues/complications and not actually comprehensively fixing all cases. And note, trying to be "better" was not consistent with what CNL and other C++ fixed-point libraries do. So we decided at a certain point to just accept the inherent flaws of floating point and follow CNL. This was done in the following small PR: rapidsai/cudf#6544. If you take a look at the unit tests, it is very clear how the behaviour changed by looking at the before and after.

Furthermore, a bunch of "opposition research" was done at one point into what other C++ fixed-point libraries do. Our goal is not to be similar to Julia or Go or Python, but to be a generic C++ backend that provides all the tools to do whatever you want in the front-end language (like cuDF or Spark RAPIDS).
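To make the 1.001 example above concrete (plain Scala, assumed to mirror the behaviour described in the repost):

```scala
// 1.001 has no exact binary floating-point representation; shifting by 10^3
// yields a value just under 1001, and truncation toward zero drops to 1000.
val shifted = 1.001 * 1000.0  // ~1000.9999999999999
val stored = shifted.toLong   // 1000: the fixed-point reads back as 1.000
// Half-up rounding (math.round(shifted) == 1001) would recover 1.001, which
// is what the plugin arranges by rounding explicitly after the cast.
```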
* Correct the error message for test_mod_pmod_by_zero (#5781)
* Update the error string for test_cast_neg_to_decimal_err on 330 [databricks] (#5784)
* Throw an exception when attempting to read columnar encrypted Parquet files on the GPU [databricks] (#5761)
* Fix rounds over decimal in Spark 330+ (#5786): passes the datatype of round-like functions directly to GPU overrides, so as to adapt to different Spark versions.
* Fix the overflow of container type when casting floats to decimal (#5766): fixes #5765, the potential overflow when casting float/double to decimal; the overflow occurs on the container decimal for the HALF_UP round.

Co-authored-by: Liangcai Li <[email protected]>
Co-authored-by: Alfred Xu <[email protected]>
Fixes #5765
This PR fixes the potential overflow when casting float/double to decimal. The overflow occurs on the container decimal used for the HALF_UP round. The precision of the container decimal should increase along with the scale to handle corner cases like #5765.
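A hypothetical worked example of the overflow (the values are illustrative, not taken from #5765):

```scala
// Target Spark type: DECIMAL(8, 2). The HALF_UP container rounds at one
// extra fractional digit, i.e. scale 3.
// 999999.99 fits precision 8 at scale 2 (8 digits: 99999999), but at scale 3
// it needs 9 digits (999999990), so a container of DECIMAL(8, 3) overflows.
// Growing the precision together with the scale, DECIMAL(9, 3), fits.
val target = (8, 2)            // (precision, scale)
val overflowing = (8, 3)       // scale + 1 only: 999999.990 does not fit
val container = (8 + 1, 2 + 1) // precision + 1 and scale + 1: fits
```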
Signed-off-by: sperlingxx [email protected]