-
Notifications
You must be signed in to change notification settings - Fork 915
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[REVIEW] Avoid decimal
type narrowing for decimal binops
#10299
Merged
Merged
Changes from 10 commits
Commits
Show all changes
11 commits
Select commit
Hold shift + click to select a range
7b084c1
change type handling
galipremsagar 1f5ca2b
change comment
galipremsagar 6d2cbab
Update decimal.py
galipremsagar 7f6ec3e
Merge branch 'rapidsai:branch-22.04' into 10282
galipremsagar d43e944
Merge branch 'rapidsai:branch-22.04' into 10282
galipremsagar 840630b
Merge branch 'rapidsai:branch-22.04' into 10282
galipremsagar a2c6e8a
Update python/cudf/cudf/core/column/decimal.py
galipremsagar bd54b5e
Update python/cudf/cudf/core/column/decimal.py
galipremsagar c12f9f7
Update python/cudf/cudf/core/column/decimal.py
galipremsagar 71572ec
cleanup
galipremsagar 56987a4
Apply suggestions from code review
galipremsagar File filter
Filter by extension
Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
There are no files selected for viewing
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -364,18 +364,45 @@ def _get_decimal_type(lhs_dtype, rhs_dtype, op): | |
else: | ||
raise NotImplementedError() | ||
|
||
if isinstance(lhs_dtype, type(rhs_dtype)): | ||
# SCENARIO 1: If `lhs_dtype` & `rhs_dtype` are same, then try to | ||
# see if `precision` & `scale` can be fit into this type. | ||
try: | ||
return lhs_dtype.__class__(precision=precision, scale=scale) | ||
except ValueError: | ||
# Call to _validate fails, which means we need | ||
# to goto SCENARIO 3. | ||
pass | ||
else: | ||
# SCENARIO 2: If `lhs_dtype` & `rhs_dtype` are of different dtypes, | ||
# then try to see if `precision` & `scale` can be fit into the type | ||
# with greater MAX_PRECISION (i.e., the bigger dtype). | ||
try: | ||
if lhs_dtype.MAX_PRECISION >= rhs_dtype.MAX_PRECISION: | ||
return lhs_dtype.__class__(precision=precision, scale=scale) | ||
else: | ||
return rhs_dtype.__class__(precision=precision, scale=scale) | ||
except ValueError: | ||
# Call to _validate fails, which means we need | ||
# to goto SCENARIO 3. | ||
pass | ||
galipremsagar marked this conversation as resolved.
Show resolved
Hide resolved
|
||
|
||
# SCENARIO 3: If either of the above two scenarios fail, then get the | ||
# MAX_PRECISION of `lhs_dtype` & `rhs_dtype` so that we can only check | ||
# and return a dtype that is greater than or equal to input dtype that | ||
# can fit `precision` & `scale`. | ||
max_precision = max(lhs_dtype.MAX_PRECISION, rhs_dtype.MAX_PRECISION) | ||
for decimal_type in ( | ||
cudf.Decimal32Dtype, | ||
cudf.Decimal64Dtype, | ||
cudf.Decimal128Dtype, | ||
): | ||
try: | ||
min_decimal_type = decimal_type(precision=precision, scale=scale) | ||
except ValueError: | ||
# Call to _validate fails, which means we need | ||
# to try the next dtype | ||
pass | ||
else: | ||
return min_decimal_type | ||
if decimal_type.MAX_PRECISION >= max_precision: | ||
try: | ||
return decimal_type(precision=precision, scale=scale) | ||
except ValueError: | ||
# Call to _validate fails, which means we need | ||
# to try the next dtype | ||
pass | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. For a problem larger than this, I would suggest something like
galipremsagar marked this conversation as resolved.
Show resolved
Hide resolved
|
||
|
||
raise OverflowError("Maximum supported decimal type is Decimal128") |
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see a comment above,
Do we only support add/sub/mul/div operations right now in Python because of limitations in this function? I know that other operations are implemented in libcudf, so piping that through might be a significant improvement.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not just
binary_operation_fixed_point_scale
but I think support for other binop's are not supported from libcudf side.Looking into
binary_operation_fixed_point_scale
, it seems the formula forDIV
is wrong? I could be wrong here but don't match what is specified here: https://docs.microsoft.com/en-us/sql/t-sql/data-types/precision-scale-and-length-transact-sqlThough libcudf doesn't take
precision
as input the python side will need calculation so probably better to have those two computations in a single place rather than having to have to look at two places.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Support for other operators exists, e.g.
MOD
/PMOD
/PYMOD
: #10179.I'm fine with keeping both precision/scale calculations together here. I just wanted to make a note to ask, since I saw the comment above.
There may or may not be issues with the scale/precision calculations. I think the page you referenced has different conventions than libcudf. In my understanding:
Neither value appears to correspond to the linked SQL docs. That page appears to always use powers of 10 for both scale and precision. Also the definition of scale is the negative of libcudf's definition. It does not surprise me that these different conventions would result in different expressions. I spent an hour looking into this but I have no idea how to make the two definitions mathematically correspond.
Working through an example calculation here, for the SQL docs:
I was confused and gave up at this point -- how could
1.28
havep=8, s=6
?There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @bdice, I think @codereport would have a better understanding on this than me. But I'm merging these changes for now and we can have a follow-up PR if changes need to be done.