[REVIEW] Avoid decimal type narrowing for decimal binops #10299

Merged 11 commits on Feb 23, 2022
43 changes: 35 additions & 8 deletions python/cudf/cudf/core/column/decimal.py
@@ -364,18 +364,45 @@ def _get_decimal_type(lhs_dtype, rhs_dtype, op):
     else:
Contributor

@bdice bdice Feb 16, 2022

I see a comment above,

This should at some point be hooked up to libcudf's binary_operation_fixed_point_scale

Do we only support add/sub/mul/div operations right now in Python because of limitations in this function? I know that other operations are implemented in libcudf, so piping that through might be a significant improvement.

Contributor Author

Do we only support add/sub/mul/div operations right now in Python because of limitations in this function?

It's not just binary_operation_fixed_point_scale: I think the other binops are also not supported on the libcudf side.

Looking into binary_operation_fixed_point_scale, it seems the formula for DIV is wrong? I could be wrong here, but it doesn't match what is specified here: https://docs.microsoft.com/en-us/sql/t-sql/data-types/precision-scale-and-length-transact-sql

Although libcudf doesn't take precision as input, the Python side needs to calculate it, so it is probably better to keep the two computations in a single place rather than having to look in two places.

Contributor

@bdice bdice Feb 18, 2022

Support for other operators exists, e.g. MOD / PMOD / PYMOD: #10179.

I'm fine with keeping both precision/scale calculations together here. I just wanted to make a note to ask, since I saw the comment above.

There may or may not be issues with the scale/precision calculations. I think the page you referenced has different conventions than libcudf. In my understanding:

  • libcudf's scale represents powers of the radix (base 10 or base 2)
  • libcudf's precision (32, 64, 128) represents bits (powers of two) used to store the integral part

Neither value appears to correspond to the linked SQL docs. That page appears to always use powers of 10 for both scale and precision. Also the definition of scale is the negative of libcudf's definition. It does not surprise me that these different conventions would result in different expressions. I spent an hour looking into this but I have no idea how to make the two definitions mathematically correspond.
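To make the sign difference concrete, here is a minimal sketch in plain Python (no cudf or libcudf calls). It assumes the libcudf convention is value = unscaled integer × 10^scale, so three fractional digits means scale = -3, whereas the SQL docs would call that scale = 3. Both helper names are hypothetical.

```python
from decimal import Decimal


def from_libcudf_style(unscaled: int, scale: int) -> Decimal:
    # Assumed libcudf convention: value = unscaled * 10**scale
    return Decimal(unscaled).scaleb(scale)


def from_sql_style(unscaled: int, scale: int) -> Decimal:
    # SQL convention: scale counts digits to the right of the decimal point
    return Decimal(unscaled).scaleb(-scale)


libcudf_value = from_libcudf_style(4096, -3)
sql_value = from_sql_style(4096, 3)
print(libcudf_value, sql_value)  # 4.096 4.096
```

Under that assumption, the two scales differ only in sign for base 10. The precision conventions are a separate question that this sketch does not address.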

Working through an example calculation here, for the SQL docs:

e1 = 4.096
p1 = 4
s1 = 3
e2 = 3.2
p2 = 2
s2 = 1
s = max(6, s1 + p2 + 1)
p = p1 - s1 + s2 + s
print(f"{e1/e2=}")  # e1/e2=1.28
print(f"{p=}, {s=}")  # p=8, s=6

I was confused and gave up at this point -- how could 1.28 have p=8, s=6?
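One possible reading, offered as a sketch rather than a definitive answer: in the SQL docs, p and s describe the result type, not the minimal number of digits needed for one particular value. A decimal(8, 6) result keeps six fractional digits and allows up to two integral digits, so 1.28 would be held as 1.280000. The snippet below uses only Python's standard decimal module, and the variable names are illustrative.

```python
from decimal import Decimal

p1, s1 = 4, 3  # type of e1 = 4.096
p2, s2 = 2, 1  # type of e2 = 3.2

# Result type for e1 / e2, using the division formula from the linked SQL docs
s = max(6, s1 + p2 + 1)
p = p1 - s1 + s2 + s

quotient = Decimal("4.096") / Decimal("3.2")
# Store the quotient with exactly `s` fractional digits
stored = quotient.quantize(Decimal(1).scaleb(-s))
digits_used = len(stored.as_tuple().digits)

print(p, s, stored, digits_used)  # 8 6 1.280000 7
```

The two integral digits cover the largest quotient these input types can produce, 9.999 / 0.1 = 99.99, which is why p - s is 2 even though this particular quotient needs only one.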

Contributor Author

Thanks @bdice, I think @codereport would have a better understanding on this than me. But I'm merging these changes for now and we can have a follow-up PR if changes need to be done.

         raise NotImplementedError()

+    if isinstance(lhs_dtype, type(rhs_dtype)):
+        # SCENARIO 1: If `lhs_dtype` & `rhs_dtype` are same, then try to
+        # see if `precision` & `scale` can be fit into this type.
+        try:
+            return lhs_dtype.__class__(precision=precision, scale=scale)
+        except ValueError:
+            # Call to _validate fails, which means we need
+            # to goto SCENARIO 3.
+            pass
+    else:
+        # SCENARIO 2: If `lhs_dtype` & `rhs_dtype` are of different dtypes,
+        # then try to see if `precision` & `scale` can be fit into the type
+        # with greater MAX_PRECISION (i.e., the bigger dtype).
+        try:
+            if lhs_dtype.MAX_PRECISION >= rhs_dtype.MAX_PRECISION:
+                return lhs_dtype.__class__(precision=precision, scale=scale)
+            else:
+                return rhs_dtype.__class__(precision=precision, scale=scale)
+        except ValueError:
+            # Call to _validate fails, which means we need
+            # to goto SCENARIO 3.
+            pass
galipremsagar marked this conversation as resolved.

+    # SCENARIO 3: If either of the above two scenarios fail, then get the
+    # MAX_PRECISION of `lhs_dtype` & `rhs_dtype` so that we can only check
+    # and return a dtype that is greater than or equal to input dtype that
+    # can fit `precision` & `scale`.
+    max_precision = max(lhs_dtype.MAX_PRECISION, rhs_dtype.MAX_PRECISION)
     for decimal_type in (
         cudf.Decimal32Dtype,
         cudf.Decimal64Dtype,
         cudf.Decimal128Dtype,
     ):
-        try:
-            min_decimal_type = decimal_type(precision=precision, scale=scale)
-        except ValueError:
-            # Call to _validate fails, which means we need
-            # to try the next dtype
-            pass
-        else:
-            return min_decimal_type
+        if decimal_type.MAX_PRECISION >= max_precision:
+            try:
+                return decimal_type(precision=precision, scale=scale)
+            except ValueError:
+                # Call to _validate fails, which means we need
+                # to try the next dtype
+                pass
Contributor

For a problem larger than this, I would suggest something like bisect to determine the type corresponding to a certain precision, but I think this is fine.
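For illustration only, the bisect idea might look like the sketch below. The type names and maximum precisions (9, 18, 38) are assumptions standing in for cudf's decimal dtypes, and `smallest_type_for` is a hypothetical helper, not part of cudf.

```python
from bisect import bisect_left

# Assumed maximum precisions, sorted ascending, with matching type names
MAX_PRECISIONS = [9, 18, 38]
TYPE_NAMES = ["Decimal32", "Decimal64", "Decimal128"]


def smallest_type_for(precision: int) -> str:
    # Index of the first type whose maximum precision is >= the request
    idx = bisect_left(MAX_PRECISIONS, precision)
    if idx == len(MAX_PRECISIONS):
        raise OverflowError("Maximum supported decimal type is Decimal128")
    return TYPE_NAMES[idx]


print(smallest_type_for(10))  # Decimal64
```

With only three candidate types, the linear loop in the diff does the same job and is arguably easier to read.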

galipremsagar marked this conversation as resolved.

     raise OverflowError("Maximum supported decimal type is Decimal128")
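The selection logic in this hunk can be tried without a GPU using stand-in classes. This is a simplified sketch, not the cudf implementation: `Dec32`, `Dec64`, `Dec128`, and `result_type` are hypothetical names, the maximum precisions (9, 18, 38) are assumed, and scenarios 1 and 2 are collapsed into a single "wider input type" step.

```python
class _Dec:
    MAX_PRECISION = 0

    def __init__(self, precision, scale):
        # Mimics the validation failure (ValueError) that the diff relies on
        if precision > self.MAX_PRECISION:
            raise ValueError("precision exceeds MAX_PRECISION")
        self.precision, self.scale = precision, scale


class Dec32(_Dec):
    MAX_PRECISION = 9


class Dec64(_Dec):
    MAX_PRECISION = 18


class Dec128(_Dec):
    MAX_PRECISION = 38


def result_type(lhs, rhs, precision, scale):
    # Scenarios 1 and 2: prefer the wider of the two input types
    wider = type(lhs) if lhs.MAX_PRECISION >= rhs.MAX_PRECISION else type(rhs)
    try:
        return wider(precision, scale)
    except ValueError:
        pass
    # Scenario 3: widen further, but never go below the wider input type
    for cls in (Dec32, Dec64, Dec128):
        if cls.MAX_PRECISION >= wider.MAX_PRECISION:
            try:
                return cls(precision, scale)
            except ValueError:
                pass
    raise OverflowError("Maximum supported decimal type is Decimal128")


out = result_type(Dec64(3, 2), Dec64(3, 2), precision=4, scale=2)
print(type(out).__name__)  # Dec64
```

Two 64-bit inputs whose result needs only precision 4 stay 64-bit here instead of narrowing to the 32-bit type, which is the behavior the updated test expectations check.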
24 changes: 12 additions & 12 deletions python/cudf/cudf/tests/test_binops.py
@@ -1800,7 +1800,7 @@ def test_binops_with_NA_consistent(dtype, op):
["1.5", "2.0"],
cudf.Decimal64Dtype(scale=2, precision=3),
["3.0", "4.0"],
- cudf.Decimal32Dtype(scale=2, precision=4),
+ cudf.Decimal64Dtype(scale=2, precision=4),
),
(
operator.add,
@@ -1809,7 +1809,7 @@ def test_binops_with_NA_consistent(dtype, op):
["2.25", "1.005"],
cudf.Decimal64Dtype(scale=3, precision=4),
["3.75", "3.005"],
- cudf.Decimal32Dtype(scale=3, precision=5),
+ cudf.Decimal64Dtype(scale=3, precision=5),
),
(
operator.add,
@@ -1827,7 +1827,7 @@ def test_binops_with_NA_consistent(dtype, op):
["2.25", "1.005"],
cudf.Decimal64Dtype(scale=3, precision=4),
["-0.75", "0.995"],
- cudf.Decimal32Dtype(scale=3, precision=5),
+ cudf.Decimal64Dtype(scale=3, precision=5),
),
(
operator.sub,
@@ -1836,7 +1836,7 @@ def test_binops_with_NA_consistent(dtype, op):
["2.25", "1.005"],
cudf.Decimal64Dtype(scale=3, precision=4),
["-0.75", "0.995"],
- cudf.Decimal32Dtype(scale=3, precision=5),
+ cudf.Decimal64Dtype(scale=3, precision=5),
),
(
operator.sub,
@@ -1854,7 +1854,7 @@ def test_binops_with_NA_consistent(dtype, op):
["1.5", "3.0"],
cudf.Decimal64Dtype(scale=3, precision=4),
["2.25", "6.0"],
- cudf.Decimal32Dtype(scale=5, precision=8),
+ cudf.Decimal64Dtype(scale=5, precision=8),
),
(
operator.mul,
@@ -1863,7 +1863,7 @@ def test_binops_with_NA_consistent(dtype, op):
["0.1", "0.2"],
cudf.Decimal64Dtype(scale=3, precision=4),
["10.0", "40.0"],
- cudf.Decimal32Dtype(scale=1, precision=8),
+ cudf.Decimal64Dtype(scale=1, precision=8),
),
(
operator.mul,
@@ -1872,7 +1872,7 @@ def test_binops_with_NA_consistent(dtype, op):
["0.343", "0.500"],
cudf.Decimal64Dtype(scale=3, precision=3),
["343.0", "1000.0"],
- cudf.Decimal32Dtype(scale=0, precision=8),
+ cudf.Decimal64Dtype(scale=0, precision=8),
),
(
operator.truediv,
@@ -1908,7 +1908,7 @@ def test_binops_with_NA_consistent(dtype, op):
["1.5", None, "2.0"],
cudf.Decimal64Dtype(scale=1, precision=2),
["3.0", None, "4.0"],
- cudf.Decimal32Dtype(scale=1, precision=3),
+ cudf.Decimal64Dtype(scale=1, precision=3),
),
(
operator.add,
@@ -1917,7 +1917,7 @@ def test_binops_with_NA_consistent(dtype, op):
["2.25", "1.005"],
cudf.Decimal64Dtype(scale=3, precision=4),
["3.75", None],
- cudf.Decimal32Dtype(scale=3, precision=5),
+ cudf.Decimal64Dtype(scale=3, precision=5),
),
(
operator.sub,
@@ -1926,7 +1926,7 @@ def test_binops_with_NA_consistent(dtype, op):
["2.25", None],
cudf.Decimal64Dtype(scale=3, precision=4),
["-0.75", None],
- cudf.Decimal32Dtype(scale=3, precision=5),
+ cudf.Decimal64Dtype(scale=3, precision=5),
),
(
operator.sub,
@@ -1935,7 +1935,7 @@ def test_binops_with_NA_consistent(dtype, op):
["2.25", None],
cudf.Decimal64Dtype(scale=3, precision=4),
["-0.75", None],
- cudf.Decimal32Dtype(scale=3, precision=5),
+ cudf.Decimal64Dtype(scale=3, precision=5),
),
(
operator.mul,
@@ -1944,7 +1944,7 @@ def test_binops_with_NA_consistent(dtype, op):
["1.5", None],
cudf.Decimal64Dtype(scale=3, precision=4),
["2.25", None],
- cudf.Decimal32Dtype(scale=5, precision=8),
+ cudf.Decimal64Dtype(scale=5, precision=8),
),
(
operator.mul,