Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[BUG] Decimal downcast truncates, never rounds #8475

Closed
mtjrider opened this issue Jun 9, 2021 · 5 comments
Closed

[BUG] Decimal downcast truncates, never rounds #8475

mtjrider opened this issue Jun 9, 2021 · 5 comments
Assignees
Labels
bug Something isn't working Python Affects Python cuDF API.

Comments

@mtjrider
Copy link
Contributor

mtjrider commented Jun 9, 2021

Describe the bug
A clear and concise description of what the bug is.

When a decimal type in one precision is downcast to another precision, data is truncated instead of rounded.

Steps/Code to reproduce bug
Follow this guide http://matthewrocklin.com/blog/work/2018/02/28/minimal-bug-reports to craft a minimal bug report. This helps us reproduce the issue you're having and resolve the issue more quickly.

Below is code that reproduces the behavior.

import cudf
from cudf.core.dtypes import Decimal64Dtype

data = [
    "0100000000110111000111101011100001010001111010111000010100011110"
]

def binary_to_double(string_value):
    import struct
    bytes_ull = struct.pack("Q", int("0b" + string_value, 0))
    return struct.unpack("d", bytes_ull)[0]

df = cudf.DataFrame(
    {
        "a": [binary_to_double(val) for val in data],
    },
    dtype={
        "a": Decimal64Dtype(18, 11)
    }
)

print(df)

df.a.astype(Decimal64Dtype(9, 2))
                a
0  23.11999999999
0    23.11
Name: a, dtype: decimal

Expected behavior
A clear and concise description of what you expected to happen.

I expect the data to be rounded based on the next significant digit (e.g.)

23.11999... >> 23.12
23.11213... >> 23.11

Environment overview (please complete the following information)

  • Environment location: [Bare-metal, Docker, Cloud(specify cloud provider)]
  • Method of cuDF install: [conda, Docker, or from source]
    • If method of install is [Docker], provide docker pull & docker run commands used

Bare-metal, conda; nightly.

@mtjrider mtjrider added bug Something isn't working Needs Triage Need team to review and classify labels Jun 9, 2021
@randerzander randerzander added depends on libcudf and removed Needs Triage Need team to review and classify labels Jun 9, 2021
@randerzander
Copy link
Contributor

It's unclear to me whether rounding or truncation is the "true" desired behavior. It might be nice to make it configurable via argument to the cast call.

@codereport
Copy link
Contributor

codereport commented Jun 9, 2021

cudf::cast in libcudf was designed to truncate.

TYPED_TEST(FixedPointTests, FixedPointCast)
{
  using fp_wrapper = cudf::test::fixed_point_column_wrapper<int64_t>;

  auto const input    = fp_wrapper{{2311999, 2311213}, numeric::scale_type{-5}};
  auto const expected = fp_wrapper{{2311, 2311}, numeric::scale_type{-2}};
  auto const result   = cudf::cast(input, make_fixed_point_data_type<numeric::decimal64>(-2));

  CUDF_TEST_EXPECT_COLUMNS_EQUAL(expected, result->view());
}

The behaviour that is expected from above can be attained from the cudf::round API.

TYPED_TEST(FixedPointTests, FixedPointRound)
{
  using fp_wrapper = cudf::test::fixed_point_column_wrapper<int64_t>;

  auto const input    = fp_wrapper{{2311999, 2311213}, numeric::scale_type{-5}};
  auto const expected = fp_wrapper{{2312, 2311}, numeric::scale_type{-2}};
  auto const result   = cudf::round(input, 2);

  CUDF_TEST_EXPECT_COLUMNS_EQUAL(expected, result->view());
}

Note that truncation is consistent with what CNL does: https://godbolt.org/z/v8qWEYenY

@mtjrider mtjrider closed this as completed Jun 9, 2021
@mtjrider mtjrider reopened this Jun 9, 2021
@shwina
Copy link
Contributor

shwina commented Jun 9, 2021

On the Python side, this could be exposed via the existing Series.round() method:

>>> df.a
                a
0  23.11999999999

>>> df.a.round(2)  # errors today, but should return:
                a
0  23.12

@shwina
Copy link
Contributor

shwina commented Jun 9, 2021

Correction: after #8278, round() should work for decimals.

@harrism harrism added the Python Affects Python cuDF API. label Jun 15, 2021
@beckernick
Copy link
Member

Given the rounding behavior is available in round, an informative warning about truncation is displayed [1] when downcasting decimals in Python, and the behavior is consistent with other tools, I'm going to close this issue. @mtjrider please feel free to re-open the discussion if this is not closed for your use case.

if (
isinstance(dtype, Decimal64Dtype)
and dtype.scale < self.dtype.scale
):
warn(
"cuDF truncates when downcasting decimals to a lower scale. "
"To round, use Series.round() or DataFrame.round()."
)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working Python Affects Python cuDF API.
Projects
None yet
Development

No branches or pull requests

7 participants