[BUG] Decimal128 uses a fixed precision of 18 when converting to an arrow array #13749

jihoonson · 2023-07-25T17:31:23Z

Describe the bug
Hi all. When a decimal128 column is converted to an arrow array, the precision is set to 18 in this line, even though the max precision decimal128 supports is 38. This seems like a bug rather than intentional behavior.

GregoryKimball · 2023-08-18T16:15:25Z

@codereport Would you please let me know if you think this precision setting should change? Why or why not?

codereport · 2023-08-21T14:08:52Z

So I'm not 💯 percent confident this is the story but I am pretty certain is is the case. TLDR: It was a bug when it was introduced in #9986.

Timeline:

[REVIEW] Initial fixed_point Column Support #5704 was merged Jul 2020
Add to/from arrow for fixed point columns #7609 was merged Apr 2021 ⬅️ introduced the 18 for decimal64
Add support for decimal128 #9483 was merged Nov 2021
decimal128 Support for to/from_arrow #9986 was merged Apr 2022 ⬅️ introduced the 18 for decimal6128

My guess is this was just a copy and paste issue. Note though that libcudf doesn't actually have the concept of precision, but Python CUDF does seem to set it to 38 in the case of 128.

class Decimal32Dtype(DecimalDtype):
    name = "decimal32"
    MAX_PRECISION = np.floor(np.log10(np.iinfo("int32").max))
    ITEMSIZE = 4


@doc_apply(
    decimal_dtype_template.format(
        size="64",
    )
)
class Decimal64Dtype(DecimalDtype):
    name = "decimal64"
    MAX_PRECISION = np.floor(np.log10(np.iinfo("int64").max))
    ITEMSIZE = 8


@doc_apply(
    decimal_dtype_template.format(
        size="128",
    )
)
class Decimal128Dtype(DecimalDtype):
    name = "decimal128"
    MAX_PRECISION = 38
    ITEMSIZE = 16

My suggestion is to "fix" it, and then write a few tests to see if it works when roundtripping. IIRC, we had planned to test most of the overflow / close to overflow use cases from the Python side of things cause it was much easier that way. Probably why it wasn't caught immediately in the C++ test cases (as they are quite bare bones)

jihoonson · 2023-09-28T18:01:35Z

Thanks @codereport. I created #14230 to fix this issue.

…ay (#14230) This PR fixes #13749. As discussed in the issue linked, the precision is unnecessarily being limited to 18 when converting decimal128 to arrow. Implementation-wise, I wasn't sure where is the best place to define the max precision for decimal types. Given that the decimal types don't store the precision in libcudf, I thought it would be better to not expose the max precision to the outside of to-arrow conversion. However, this led to replicating the definition of max precision across multiple places. Appreciate any suggestion. Finally, it was suggested in #13749 (comment) to add some tests for round tripping. The existing tests look sufficient to me for round trip tests, so I just modified them instead of adding new tests. Please let me know if we need new tests in addition to them. I'm also not sure whether any documentation should be fixed. Please let me know. Authors: - Jihoon Son (https://github.com/jihoonson) Approvers: - Mike Wilson (https://github.com/hyperbolic2346) - Nghia Truong (https://github.com/ttnghia) URL: #14230

jihoonson added Needs Triage Need team to review and classify bug Something isn't working labels Jul 25, 2023

GregoryKimball added this to the Decimal data type and operations milestone Aug 18, 2023

GregoryKimball added 0 - Backlog In queue waiting for assignment libcudf Affects libcudf (C++/CUDA) code. and removed Needs Triage Need team to review and classify labels Aug 18, 2023

jihoonson changed the title ~~[BUG] Decimal128 uses a fixed precision of 18 when convering to an arrow array~~ [BUG] Decimal128 uses a fixed precision of 18 when converting to an arrow array Sep 22, 2023

jihoonson mentioned this issue Sep 28, 2023

Fix the precision when converting a decimal128 column to an arrow array #14230

Merged

3 tasks

rapids-bot bot closed this as completed in #14230 Oct 28, 2023

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[BUG] Decimal128 uses a fixed precision of 18 when converting to an arrow array #13749

[BUG] Decimal128 uses a fixed precision of 18 when converting to an arrow array #13749

jihoonson commented Jul 25, 2023 •

edited

Loading

GregoryKimball commented Aug 18, 2023

codereport commented Aug 21, 2023

jihoonson commented Sep 28, 2023

[BUG] Decimal128 uses a fixed precision of 18 when converting to an arrow array #13749

[BUG] Decimal128 uses a fixed precision of 18 when converting to an arrow array #13749

Comments

jihoonson commented Jul 25, 2023 • edited Loading

GregoryKimball commented Aug 18, 2023

codereport commented Aug 21, 2023

jihoonson commented Sep 28, 2023

jihoonson commented Jul 25, 2023 •

edited

Loading