Fix type promotion edge cases in numerical binops #12074

wence- · 2022-11-04T18:03:11Z

Description

The type normalisation applied before heading into libcudf previously had slightly unexpected consequences for large int64 values. When the operand was not a cudf.Scalar, a bare int64 scalar was cast to uint64, and normal numpy type promotion then unified the two types to float64. This is lossy, since the conversion from int64 to float64 is neither surjective nor injective.
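To make the lossiness concrete, here is a minimal pure-Python sketch (not code from this PR). Python floats are IEEE 754 doubles, the same representation as float64, so they stand in for the promoted dtype. The variable names are illustrative only.

```python
# Python's float is an IEEE 754 double, i.e. the same format as float64.
INT64_MAX = 2**63 - 1

# Not injective: two distinct int64 values map to the same float64.
a = 2**53
b = 2**53 + 1
collide = float(a) == float(b)

# The largest int64 value does not survive a round trip through float64:
# it rounds up to 2**63, which is outside the int64 range altogether.
roundtrip_max = int(float(INT64_MAX))

# Not surjective: most float64 values (0.5, for instance) are not the
# image of any integer.
half_is_integral = (0.5).is_integer()

print(collide, roundtrip_max == INT64_MAX, half_is_integral)
```

Any comparison or arithmetic carried out after such a promotion can therefore silently give a different answer from the same operation performed in int64.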

To avoid this, try very hard to maintain the dtype of the object coming in, and match pandas behaviour by applying numpy type promotion rules via numpy.result_type.
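As a rough illustration of the promotion rules that `numpy.result_type` applies when given dtypes only, the sketch below (not taken from this PR) prints a few pairings. The int64/uint64 pairing is the one that caused the unwanted float64 result described above.

```python
import numpy as np

# Pairs of dtypes and the dtype numpy promotes them to.
pairs = [
    (np.int64, np.int64),    # same type: unchanged
    (np.int8, np.uint8),     # mixed signedness: next larger signed type
    (np.int32, np.float32),  # float32 cannot hold every int32 exactly
    (np.int64, np.uint64),   # no integer type holds both ranges
]

promoted = {
    (np.dtype(lhs).name, np.dtype(rhs).name): np.result_type(lhs, rhs)
    for lhs, rhs in pairs
}

for (lhs, rhs), result in promoted.items():
    print(f"{lhs} + {rhs} -> {result}")
```

Keeping an int64 operand as int64, rather than relabelling it as uint64 first, is what keeps the result in the first row rather than the last.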

Checklist

I am familiar with the Contributing Guidelines.
New or existing tests cover these changes.
The documentation is up to date with these changes.

wence- · 2022-11-04T18:07:04Z

This is a breaking change because it changes the behaviour of a user-facing API, though I hope very much no-one was relying on it. FWIW, pandas gets this case right.

python/cudf/cudf/core/column/numerical.py

Try to follow numpy by working with types rather than values: first attempt to use the dtype of the passed-in operand and, only if it does not have one, fall back to result_type. This way we avoid problems with min_scalar_type wanting to pick unsigned int types for bare Python integers.
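The difference between the two strategies can be sketched as follows. This is an illustration under stated assumptions, not the code in numerical.py: `operand_dtype` and `value_based_dtype` are hypothetical helper names, and the real implementation also has to handle cudf.Scalar, columns, and non-numeric inputs.

```python
import numpy as np


def value_based_dtype(column_dtype, other):
    """Hypothetical 'old style' helper: pick a dtype from the operand's value."""
    # min_scalar_type prefers unsigned types for non-negative integers.
    return np.result_type(column_dtype, np.min_scalar_type(other))


def operand_dtype(column_dtype, other):
    """Hypothetical 'types first' helper: trust the operand's dtype if it has one."""
    dtype = getattr(other, "dtype", None)
    if dtype is not None:
        return np.dtype(dtype)
    # Bare Python scalar: let numpy promote it against the column's dtype.
    return np.result_type(column_dtype, other)


col = np.dtype("int64")
big = 2**62  # fits comfortably in int64

print(np.min_scalar_type(big))      # an unsigned type is chosen for the bare int
print(value_based_dtype(col, big))  # int64 combined with that type
print(operand_dtype(col, big))      # stays in the column's integer type
print(operand_dtype(col, np.int32(5)))
```

With the value-based route, an int64 column combined with a large positive Python integer ends up as float64, whereas the types-first route keeps the computation in int64.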

Closes rapidsai#12092.

Closes rapidsai#5938.

No idea how to handle the pandas weirdness here.

wence- · 2022-11-08T18:52:39Z

Requesting some careful eyes from reviewers here (assuming at least the test suite all passes). This kind of type promotion/casting and all the edge cases is really hard to think through for me.

python/cudf/cudf/core/column/numerical.py

seberg

Thought I would have a look and added a few comments/questions, in the hope they are useful.

python/cudf/cudf/core/column/column.py

python/cudf/cudf/core/column/numerical.py

python/cudf/cudf/tests/test_binops.py

codecov · 2022-11-09T14:47:06Z

Codecov Report

Base: 88.07% // Head: 88.11% // Increases project coverage by +0.04% 🎉

Coverage data is based on head (3fdc7a8) compared to base (b2e5069).
Patch coverage: 100.00% of modified lines in pull request are covered.

❗ Current head 3fdc7a8 differs from pull request most recent head 56cd889. Consider uploading reports for the commit 56cd889 to get more accurate results

Additional details and impacted files

@@               Coverage Diff                @@
##           branch-22.12   #12074      +/-   ##
================================================
+ Coverage         88.07%   88.11%   +0.04%     
================================================
  Files               135      135              
  Lines             22133    22124       -9     
================================================
+ Hits              19494    19495       +1     
+ Misses             2639     2629      -10

Impacted Files	Coverage Δ
python/cudf/cudf/core/index.py	`92.88% <ø> (ø)`
python/cudf/cudf/core/series.py	`95.71% <ø> (ø)`
python/cudf/cudf/core/column/column.py	`87.96% <100.00%> (ø)`
python/cudf/cudf/core/column/numerical.py	`96.51% <100.00%> (+1.04%)`	⬆️
python/cudf/cudf/core/column/timedelta.py	`90.17% <100.00%> (ø)`
python/cudf/cudf/core/dataframe.py	`93.64% <0.00%> (+0.04%)`	⬆️
python/cudf/cudf/core/column/string.py	`88.65% <0.00%> (+0.12%)`	⬆️
python/cudf/cudf/core/groupby/groupby.py	`91.51% <0.00%> (+0.20%)`	⬆️
python/cudf/cudf/core/tools/datetimes.py	`84.49% <0.00%> (+0.30%)`	⬆️
... and 1 more


Now that binop with cudf.Scalar matches pandas behaviour with numpy-dtype-enabled scalars, we need to manually promote here.

python/cudf/cudf/core/column/numerical.py

python/cudf/cudf/core/series.py

…ue-12072

wence- · 2022-11-15T12:44:10Z

I think this is now ready for another look (though happy to retarget to 23.0x)

python/cudf/cudf/tests/test_timedelta.py

wence- · 2022-11-16T11:31:54Z

@gpucibot merge

wence- requested a review from a team as a code owner November 4, 2022 18:03

wence- requested review from mroeschke and skirui-source November 4, 2022 18:03

github-actions bot added the Python (Affects Python cuDF API) label Nov 4, 2022

wence- added the 3 - Ready for Review (Ready for review by team) and breaking (Breaking change) labels Nov 4, 2022

galipremsagar reviewed Nov 4, 2022


python/cudf/cudf/core/column/numerical.py (outdated)

wence- force-pushed the wence/fix/issue-12072 branch from 2afc3e1 to c566964 November 8, 2022 12:45

wence- force-pushed the wence/fix/issue-12072 branch from c566964 to d670394 November 8, 2022 12:47

wence- added the bug (Something isn't working) label Nov 8, 2022

wence- added 5 commits November 8, 2022 17:23

Be even more careful promoting types

b5dd7b1

Check for zero cudf.Scalar in __floordiv__ and __mod__

e3f50dc

Closes rapidsai#12092.

Fix check for zero with columns in __floordiv__ and __mod__

77d94d1

Closes rapidsai#5938.

Test division by zero with Series as well

603fdbb

Handle NA cudf.Scalar in div by zero check

f023aea

wence- force-pushed the wence/fix/issue-12072 branch from f052481 to f023aea November 8, 2022 18:30

xfail floordiv/mod by zero bool

33aae16

No idea how to handle the pandas weirdness here.

wence- requested a review from brandon-b-miller November 8, 2022 18:51

wence- commented Nov 8, 2022


python/cudf/cudf/core/column/numerical.py

wence- changed the title ~~Fix equality edge case in numerical binops~~ Fix type promotion edge cases in numerical binops Nov 8, 2022

wence- mentioned this pull request Nov 8, 2022

[BUG] Floor division behavior of integer by series with zeroes differs from Pandas #7389

Closed

seberg reviewed Nov 9, 2022
