[BUG] Decimal downcast truncates, never rounds #8475

mtjrider · 2021-06-09T17:07:27Z

Describe the bug
A clear and concise description of what the bug is.

When a decimal type in one precision is downcast to another precision, data is truncated instead of rounded.

Steps/Code to reproduce bug
Follow this guide http://matthewrocklin.com/blog/work/2018/02/28/minimal-bug-reports to craft a minimal bug report. This helps us reproduce the issue you're having and resolve the issue more quickly.

Below is code that reproduces the behavior.

import cudf
from cudf.core.dtypes import Decimal64Dtype

data = [
    "0100000000110111000111101011100001010001111010111000010100011110"
]

def binary_to_double(string_value):
    import struct
    bytes_ull = struct.pack("Q", int("0b" + string_value, 0))
    return struct.unpack("d", bytes_ull)[0]

df = cudf.DataFrame(
    {
        "a": [binary_to_double(val) for val in data],
    },
    dtype={
        "a": Decimal64Dtype(18, 11)
    }
)

print(df)

df.a.astype(Decimal64Dtype(9, 2))

                a
0  23.11999999999
0    23.11
Name: a, dtype: decimal

Expected behavior
A clear and concise description of what you expected to happen.

I expect the data to be rounded based on the next significant digit (e.g.)

23.11999... >> 23.12
23.11213... >> 23.11

Environment overview (please complete the following information)

Environment location: [Bare-metal, Docker, Cloud(specify cloud provider)]

Method of cuDF install: [conda, Docker, or from source]

If method of install is [Docker], provide docker pull & docker run commands used

Bare-metal, conda; nightly.

The text was updated successfully, but these errors were encountered:

randerzander · 2021-06-09T17:21:08Z

It's unclear to me whether rounding or truncation is the "true" desired behavior. It might be nice to make it configurable via argument to the cast call.

codereport · 2021-06-09T17:26:49Z

cudf::cast in libcudf was designed to truncate.

TYPED_TEST(FixedPointTests, FixedPointCast)
{
  using fp_wrapper = cudf::test::fixed_point_column_wrapper<int64_t>;

  auto const input    = fp_wrapper{{2311999, 2311213}, numeric::scale_type{-5}};
  auto const expected = fp_wrapper{{2311, 2311}, numeric::scale_type{-2}};
  auto const result   = cudf::cast(input, make_fixed_point_data_type<numeric::decimal64>(-2));

  CUDF_TEST_EXPECT_COLUMNS_EQUAL(expected, result->view());
}

The behaviour that is expected from above can be attained from the cudf::round API.

TYPED_TEST(FixedPointTests, FixedPointRound)
{
  using fp_wrapper = cudf::test::fixed_point_column_wrapper<int64_t>;

  auto const input    = fp_wrapper{{2311999, 2311213}, numeric::scale_type{-5}};
  auto const expected = fp_wrapper{{2312, 2311}, numeric::scale_type{-2}};
  auto const result   = cudf::round(input, 2);

  CUDF_TEST_EXPECT_COLUMNS_EQUAL(expected, result->view());
}

Note that truncation is consistent with what CNL does: https://godbolt.org/z/v8qWEYenY

shwina · 2021-06-09T18:50:47Z

On the Python side, this could be exposed via the existing Series.round() method:

>>> df.a
                a
0  23.11999999999

>>> df.a.round(2)  # errors today, but should return:
                a
0  23.12

shwina · 2021-06-09T19:14:29Z

Correction: after #8278, round() should work for decimals.

beckernick · 2021-06-28T13:54:53Z

Given the rounding behavior is available in round, an informative warning about truncation is displayed [1] when downcasting decimals in Python, and the behavior is consistent with other tools, I'm going to close this issue. @mtjrider please feel free to re-open the discussion if this is not closed for your use case.

cudf/python/cudf/cudf/core/column/decimal.py

Lines 150 to 157 in 90fe10b

    
           if ( 
        
               isinstance(dtype, Decimal64Dtype) 
        
               and dtype.scale < self.dtype.scale 
        
           ): 
        
               warn( 
        
                   "cuDF truncates when downcasting decimals to a lower scale. " 
        
                   "To round, use Series.round() or DataFrame.round()." 
        
               )

mtjrider added bug Something isn't working Needs Triage Need team to review and classify labels Jun 9, 2021

randerzander assigned jrhemstad and codereport Jun 9, 2021

randerzander added depends on libcudf and removed Needs Triage Need team to review and classify labels Jun 9, 2021

mtjrider closed this as completed Jun 9, 2021

mtjrider reopened this Jun 9, 2021

harrism added the Python Affects Python cuDF API. label Jun 15, 2021

beckernick closed this as completed Jun 28, 2021

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[BUG] Decimal downcast truncates, never rounds #8475

[BUG] Decimal downcast truncates, never rounds #8475

mtjrider commented Jun 9, 2021 •

edited

Loading

randerzander commented Jun 9, 2021

codereport commented Jun 9, 2021 •

edited

Loading

shwina commented Jun 9, 2021

shwina commented Jun 9, 2021

beckernick commented Jun 28, 2021

[BUG] Decimal downcast truncates, never rounds #8475

[BUG] Decimal downcast truncates, never rounds #8475

Comments

mtjrider commented Jun 9, 2021 • edited Loading

randerzander commented Jun 9, 2021

codereport commented Jun 9, 2021 • edited Loading

shwina commented Jun 9, 2021

shwina commented Jun 9, 2021

beckernick commented Jun 28, 2021

mtjrider commented Jun 9, 2021 •

edited

Loading

codereport commented Jun 9, 2021 •

edited

Loading