[REVIEW] Avoid decimal type narrowing for decimal binops #10299

Merged: 11 commits, Feb 23, 2022
49 changes: 41 additions & 8 deletions python/cudf/cudf/core/column/decimal.py
@@ -364,18 +364,51 @@ def _get_decimal_type(lhs_dtype, rhs_dtype, op):
     else:
Contributor

@bdice bdice Feb 16, 2022

I see a comment above:

"This should at some point be hooked up to libcudf's binary_operation_fixed_point_scale"

Do we only support add/sub/mul/div operations in Python right now because of limitations in this function? I know that other operations are implemented in libcudf, so piping that through might be a significant improvement.

Contributor Author

"Do we only support add/sub/mul/div operations right now in Python because of limitations in this function?"

It's not just binary_operation_fixed_point_scale; I think the other binops are not supported on the libcudf side either.

Looking into binary_operation_fixed_point_scale, it seems the formula for DIV is wrong? I could be mistaken, but it doesn't match what is specified here: https://docs.microsoft.com/en-us/sql/t-sql/data-types/precision-scale-and-length-transact-sql

Although libcudf doesn't take precision as input, the Python side needs to calculate it, so it is probably better to keep the two computations in a single place rather than having to look in two places.

Contributor

@bdice bdice Feb 18, 2022

Support for other operators exists, e.g. MOD / PMOD / PYMOD: #10179.

I'm fine with keeping both precision/scale calculations together here. I just wanted to make a note to ask, since I saw the comment above.

There may or may not be issues with the scale/precision calculations. I think the page you referenced has different conventions than libcudf. In my understanding:

  • libcudf's scale represents powers of the radix (base 10 or base 2)
  • libcudf's precision (32, 64, 128) represents bits (powers of two) used to store the integral part

Neither value appears to correspond to the linked SQL docs. That page appears to always use powers of 10 for both scale and precision. Also the definition of scale is the negative of libcudf's definition. It does not surprise me that these different conventions would result in different expressions. I spent an hour looking into this but I have no idea how to make the two definitions mathematically correspond.
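
To make the sign convention concrete, here is a minimal sketch, assuming base 10 only and a stored integer `rep`. The helper names are hypothetical and are not libcudf or cudf APIs; the sketch only illustrates the understanding described above, that libcudf's scale is the negative of the SQL-style scale.

```python
from decimal import Decimal


def value_from_libcudf_style(rep: int, scale: int) -> Decimal:
    # Assumed libcudf-style convention: value = rep * 10**scale,
    # so three decimal places correspond to scale = -3.
    return Decimal(rep).scaleb(scale)


def value_from_sql_style(rep: int, scale: int) -> Decimal:
    # SQL-style convention: scale counts digits right of the decimal point,
    # so three decimal places correspond to scale = 3.
    return Decimal(rep).scaleb(-scale)


# The same stored integer decodes to the same value under both conventions
# once the sign of the scale is flipped.
libcudf_value = value_from_libcudf_style(4096, -3)
sql_value = value_from_sql_style(4096, 3)
print(libcudf_value, sql_value)
```

Under these assumptions, converting between the two conventions is only a sign flip on the scale; the precision conventions are a separate question.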

Working through an example calculation here, for the SQL docs:

e1 = 4.096
p1 = 4
s1 = 3
e2 = 3.2
p2 = 2
s2 = 1
s = max(6, s1 + p2 + 1)
p = p1 - s1 + s2 + s
print(f"{e1/e2=}")  # e1/e2=1.28
print(f"{p=}, {s=}")  # p=8, s=6

I was confused and gave up at this point -- how could 1.28 have p=8, s=6?
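
One possible reading, offered as a sketch rather than a settled answer: on the SQL page, p and s describe the result type (the maximum number of digits it can hold), not the number of digits a particular quotient happens to need. The helper name `sql_div_type` is hypothetical; the formula is the one used in the example above.

```python
from decimal import Decimal


def sql_div_type(p1: int, s1: int, p2: int, s2: int) -> tuple[int, int]:
    """Result (precision, scale) for e1 / e2, using the formula from the example above."""
    s = max(6, s1 + p2 + 1)
    p = p1 - s1 + s2 + s
    return p, s


p, s = sql_div_type(p1=4, s1=3, p2=2, s2=1)

# Store the quotient at the result scale: six digits after the decimal point.
quotient = (Decimal("4.096") / Decimal("3.2")).quantize(Decimal(1).scaleb(-s))
digits_used = len(quotient.as_tuple().digits)

print(p, s, quotient, digits_used)
```

Read this way, 1.28 stored at scale 6 is 1.280000, which uses 7 digits and therefore fits in a type with p=8, s=6. This says nothing about whether the libcudf formula for DIV is right, only that the SQL numbers are not contradictory.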

Contributor Author

Thanks @bdice, I think @codereport would have a better understanding of this than I do. I'm merging these changes for now, and we can open a follow-up PR if changes are needed.

         raise NotImplementedError()

+    if isinstance(lhs_dtype, type(rhs_dtype)):
+        # SCENARIO 1: If `lhs_dtype` & `rhs_dtype` are same, then try to
+        # see if `precision` & `scale` can be fit into this type.
+        try:
+            return lhs_dtype.__class__(precision=precision, scale=scale)
+        except ValueError:
+            # Call to _validate fails, which means we need
+            # to goto SCENARIO 3.
+            pass
+    else:
+        # SCENARIO 2: If `lhs_dtype` & `rhs_dtype` are of different dtypes,
+        # then try to see if `precision` & `scale` can be fit into the type
+        # with greater MAX_PRECISION (i.e., the bigger dtype).
+        try:
+            if lhs_dtype.MAX_PRECISION > rhs_dtype.MAX_PRECISION:
+                return lhs_dtype.__class__(precision=precision, scale=scale)
+            else:
+                return rhs_dtype.__class__(precision=precision, scale=scale)
+        except ValueError:
+            # Call to _validate fails, which means we need
+            # to goto SCENARIO 3.
+            pass
+
+    # SCENARIO 3: If either of the above two scenarios fail, then get the
+    # MAX_PRECISION of `lhs_dtype` & `rhs_dtype` so that we can only check
+    # and return a dtype that is greater than or equal to input dtype that
+    # can fit `precision` & `scale`.
+    lhs_rhs_max_precision = max(
+        lhs_dtype.MAX_PRECISION, rhs_dtype.MAX_PRECISION
+    )
     for decimal_type in (
         cudf.Decimal32Dtype,
         cudf.Decimal64Dtype,
         cudf.Decimal128Dtype,
     ):
-        try:
-            min_decimal_type = decimal_type(precision=precision, scale=scale)
-        except ValueError:
-            # Call to _validate fails, which means we need
-            # to try the next dtype
-            pass
-        else:
-            return min_decimal_type
+        if decimal_type.MAX_PRECISION >= lhs_rhs_max_precision:
Contributor

It looks like this `if` is checking the same thing as the _validate method in the decimal dtype constructor. Is this unnecessary duplication? I'd fall back on the `try` and remove the `if` if possible.

Contributor

@bdice bdice Feb 16, 2022

I might be wrong here -- I see you're constructing the returned type with precision=precision instead of precision=max_precision. Would it be better to try and construct a type with max_precision and return a type with precision if that succeeds? (Or is that a bug -- should it be returning a type with max_precision?)

Contributor Author

@galipremsagar galipremsagar Feb 16, 2022

"I might be wrong here -- I see you're constructing the returned type with precision=precision instead of precision=max_precision. Would it be better to try and construct a type with max_precision and return a type with precision if that succeeds? (Or is that a bug -- should it be returning a type with max_precision?)"

It's not a bug; the dtype is expected to carry precision, not max_precision.

"It looks like this if is checking the same thing as the _validate method in the decimal dtype constructor. Is this unnecessarily duplicated? I'd fall back on the try and remove the if if possible."

This duplication is necessary because we want to pick a dtype that is not smaller than lhs_dtype or rhs_dtype, i.e., to avoid type narrowing.

+            try:
+                min_decimal_type = decimal_type(
+                    precision=precision, scale=scale
+                )
+            except ValueError:
+                # Call to _validate fails, which means we need
+                # to try the next dtype
+                pass
+            else:
+                return min_decimal_type

     raise OverflowError("Maximum supported decimal type is Decimal128")
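
The widening rule discussed in this thread can be tried without a GPU using the following minimal sketch. It is not the cudf implementation: `Dec32`, `Dec64`, `Dec128` and `pick_result_type` are hypothetical stand-ins, the `MAX_PRECISION` values 9, 18 and 38 are assumed to mirror the cudf decimal dtypes, and only the SCENARIO 3 loop is modeled.

```python
class _Dec:
    """Hypothetical stand-in for a decimal dtype; not a cudf class."""

    MAX_PRECISION = 0

    def __init__(self, precision, scale=0):
        # Mimics a constructor-side validation that raises ValueError
        # when the requested precision does not fit this type.
        if not 0 < precision <= self.MAX_PRECISION:
            raise ValueError("precision does not fit this type")
        self.precision = precision
        self.scale = scale


class Dec32(_Dec):
    MAX_PRECISION = 9  # assumed value


class Dec64(_Dec):
    MAX_PRECISION = 18  # assumed value


class Dec128(_Dec):
    MAX_PRECISION = 38  # assumed value


def pick_result_type(lhs, rhs, precision, scale):
    """Return the smallest type that fits and is not narrower than the inputs."""
    floor = max(lhs.MAX_PRECISION, rhs.MAX_PRECISION)
    for candidate in (Dec32, Dec64, Dec128):
        if candidate.MAX_PRECISION < floor:
            # Skip types narrower than the widest input (no type narrowing).
            continue
        try:
            return candidate(precision=precision, scale=scale)
        except ValueError:
            # Validation failed, so try the next wider type.
            continue
    raise OverflowError("Maximum supported decimal type is Decimal128")


same_width = pick_result_type(Dec64(3, 2), Dec64(3, 2), precision=4, scale=2)
widened = pick_result_type(Dec32(9, 2), Dec32(9, 2), precision=10, scale=2)
print(type(same_width).__name__, type(widened).__name__)
```

In this model, two 64-bit inputs whose result needs only precision 4 stay 64-bit instead of shrinking to 32-bit, which is the behavior the updated tests below expect. The result keeps the computed `precision` rather than the type's maximum.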
24 changes: 12 additions & 12 deletions python/cudf/cudf/tests/test_binops.py
@@ -1800,7 +1800,7 @@ def test_binops_with_NA_consistent(dtype, op):
             ["1.5", "2.0"],
             cudf.Decimal64Dtype(scale=2, precision=3),
             ["3.0", "4.0"],
-            cudf.Decimal32Dtype(scale=2, precision=4),
+            cudf.Decimal64Dtype(scale=2, precision=4),
         ),
         (
             operator.add,
@@ -1809,7 +1809,7 @@ def test_binops_with_NA_consistent(dtype, op):
             ["2.25", "1.005"],
             cudf.Decimal64Dtype(scale=3, precision=4),
             ["3.75", "3.005"],
-            cudf.Decimal32Dtype(scale=3, precision=5),
+            cudf.Decimal64Dtype(scale=3, precision=5),
         ),
         (
             operator.add,
@@ -1827,7 +1827,7 @@ def test_binops_with_NA_consistent(dtype, op):
             ["2.25", "1.005"],
             cudf.Decimal64Dtype(scale=3, precision=4),
             ["-0.75", "0.995"],
-            cudf.Decimal32Dtype(scale=3, precision=5),
+            cudf.Decimal64Dtype(scale=3, precision=5),
         ),
         (
             operator.sub,
@@ -1836,7 +1836,7 @@ def test_binops_with_NA_consistent(dtype, op):
             ["2.25", "1.005"],
             cudf.Decimal64Dtype(scale=3, precision=4),
             ["-0.75", "0.995"],
-            cudf.Decimal32Dtype(scale=3, precision=5),
+            cudf.Decimal64Dtype(scale=3, precision=5),
         ),
         (
             operator.sub,
@@ -1854,7 +1854,7 @@ def test_binops_with_NA_consistent(dtype, op):
             ["1.5", "3.0"],
             cudf.Decimal64Dtype(scale=3, precision=4),
             ["2.25", "6.0"],
-            cudf.Decimal32Dtype(scale=5, precision=8),
+            cudf.Decimal64Dtype(scale=5, precision=8),
         ),
         (
             operator.mul,
@@ -1863,7 +1863,7 @@ def test_binops_with_NA_consistent(dtype, op):
             ["0.1", "0.2"],
             cudf.Decimal64Dtype(scale=3, precision=4),
             ["10.0", "40.0"],
-            cudf.Decimal32Dtype(scale=1, precision=8),
+            cudf.Decimal64Dtype(scale=1, precision=8),
         ),
         (
             operator.mul,
@@ -1872,7 +1872,7 @@ def test_binops_with_NA_consistent(dtype, op):
             ["0.343", "0.500"],
             cudf.Decimal64Dtype(scale=3, precision=3),
             ["343.0", "1000.0"],
-            cudf.Decimal32Dtype(scale=0, precision=8),
+            cudf.Decimal64Dtype(scale=0, precision=8),
         ),
         (
             operator.truediv,
@@ -1908,7 +1908,7 @@ def test_binops_with_NA_consistent(dtype, op):
             ["1.5", None, "2.0"],
             cudf.Decimal64Dtype(scale=1, precision=2),
             ["3.0", None, "4.0"],
-            cudf.Decimal32Dtype(scale=1, precision=3),
+            cudf.Decimal64Dtype(scale=1, precision=3),
         ),
         (
             operator.add,
@@ -1917,7 +1917,7 @@ def test_binops_with_NA_consistent(dtype, op):
             ["2.25", "1.005"],
             cudf.Decimal64Dtype(scale=3, precision=4),
             ["3.75", None],
-            cudf.Decimal32Dtype(scale=3, precision=5),
+            cudf.Decimal64Dtype(scale=3, precision=5),
         ),
         (
             operator.sub,
@@ -1926,7 +1926,7 @@ def test_binops_with_NA_consistent(dtype, op):
             ["2.25", None],
             cudf.Decimal64Dtype(scale=3, precision=4),
             ["-0.75", None],
-            cudf.Decimal32Dtype(scale=3, precision=5),
+            cudf.Decimal64Dtype(scale=3, precision=5),
         ),
         (
             operator.sub,
@@ -1935,7 +1935,7 @@ def test_binops_with_NA_consistent(dtype, op):
             ["2.25", None],
             cudf.Decimal64Dtype(scale=3, precision=4),
             ["-0.75", None],
-            cudf.Decimal32Dtype(scale=3, precision=5),
+            cudf.Decimal64Dtype(scale=3, precision=5),
         ),
         (
             operator.mul,
@@ -1944,7 +1944,7 @@ def test_binops_with_NA_consistent(dtype, op):
             ["1.5", None],
             cudf.Decimal64Dtype(scale=3, precision=4),
             ["2.25", None],
-            cudf.Decimal32Dtype(scale=5, precision=8),
+            cudf.Decimal64Dtype(scale=5, precision=8),
         ),
         (
             operator.mul,