Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Addition & integration of the integer power operator #11025

Merged
merged 25 commits into from
Jul 18, 2022

Conversation

AtlantaPepsi
Copy link
Contributor

@AtlantaPepsi AtlantaPepsi commented Jun 1, 2022

Partial fix for #10178 (still need to investigate whether decimal types are also affected).

This implements the INT_POW binary operator, which can be dispatched for integral types. This uses an exponentiation-by-squaring algorithm to compute powers. Unlike POW, this does not cast the data to floating-point types which can suffer from precision loss when computing powers and casting back to an integral type. The cuDF Python layer has been updated to dispatch integral data to this operator, which fixes the problems seen for specific values of base and exponent (like 3**1 == 2) noted in #10178.

@GPUtester
Copy link
Collaborator

Can one of the admins verify this patch?

@github-actions github-actions bot added Python Affects Python cuDF API. libcudf Affects libcudf (C++/CUDA) code. labels Jun 1, 2022
@AtlantaPepsi
Copy link
Contributor Author

@bdice

@bdice
Copy link
Contributor

bdice commented Jun 1, 2022

Thank you very much for your work on this, @AtlantaPepsi! I'll take a look at this very soon. In the meantime, I'll request permission to run CI tests.

Copy link
Contributor

@bdice bdice left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great start. Next steps:

  • Copy the test from this commit and add some related tests that cover the problematic values noted in [BUG] pow operator not acting as expected. #10178
  • Add Python bindings by implementing NumericalColumn.__pow__. It should check is_integer_dtype for the inputs, and if both inputs are integral, it should call through IntPow.

Happy to help with anything, just drop a comment here or ping me on Slack.

cpp/include/cudf/binaryop.hpp Outdated Show resolved Hide resolved
cpp/src/binaryop/compiled/IntPow.cu Outdated Show resolved Hide resolved
@jjacobelli
Copy link
Contributor

ok to test

@github-actions github-actions bot added the CMake CMake build issue label Jun 10, 2022
python/cudf/cudf/_lib/cpp/binaryop.pxd Outdated Show resolved Hide resolved
python/cudf/cudf/_lib/binaryop.pyx Outdated Show resolved Hide resolved
cpp/src/binaryop/compiled/util.cpp Outdated Show resolved Hide resolved
cpp/src/binaryop/compiled/binary_ops.cu Outdated Show resolved Hide resolved
@AtlantaPepsi AtlantaPepsi marked this pull request as ready for review July 2, 2022 16:33
@AtlantaPepsi AtlantaPepsi requested review from a team as code owners July 2, 2022 16:33
@AtlantaPepsi
Copy link
Contributor Author

@bdice quite some cases in BINARY_TEST suites are failing on my device, even the ones beyond BinaryOperationCompiledTest_FloatOps, so please double check the correctness of INT_POW if you would. Also as we discussed in slack, negative exponents still create wrong results, please let me know if undefined behavior is allowed for negative exponents.

@bdice bdice added improvement Improvement / enhancement to an existing function non-breaking Non-breaking change labels Jul 8, 2022
@AtlantaPepsi AtlantaPepsi requested a review from a team as a code owner July 8, 2022 06:03
@github-actions github-actions bot added the Java Affects Java cuDF API. label Jul 8, 2022
Copy link
Contributor

@bdice bdice left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have some comments attached -- apologies for the delay @AtlantaPepsi. I'll work through these proposed changes right now and push some commits to address them.

cpp/src/binaryop/compiled/operation.cuh Outdated Show resolved Hide resolved
cpp/src/binaryop/compiled/operation.cuh Outdated Show resolved Hide resolved
cpp/src/binaryop/compiled/operation.cuh Show resolved Hide resolved
cpp/src/binaryop/compiled/operation.cuh Outdated Show resolved Hide resolved
python/cudf/cudf/core/column/numerical.py Outdated Show resolved Hide resolved
Copy link
Contributor

@bdice bdice left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@AtlantaPepsi Thanks for your hard work on this! I applied some changes and I think this PR is ready to approve. It fixes the core problem where incorrect power results appear in Python for integer types by dispatching to IntPow for only those types, without altering the default behavior of Pow in C++ code (which casts to floating point types). Once this gets a second C++ approval, one approval from Python and Java code owners, and CI passes, it will be ready to merge.

To reiterate my other comment: before closing #10178, we should verify whether decimal types are affected as well, and implement the same fix in a second PR.

@codecov
Copy link

codecov bot commented Jul 8, 2022

Codecov Report

❗ No coverage uploaded for pull request base (branch-22.08@bc5e769). Click here to learn what that means.
The diff coverage is n/a.

❗ Current head 9b79193 differs from pull request most recent head 5c447d1. Consider uploading reports for the commit 5c447d1 to get more accurate results

@@               Coverage Diff               @@
##             branch-22.08   #11025   +/-   ##
===============================================
  Coverage                ?   86.37%           
===============================================
  Files                   ?      144           
  Lines                   ?    22826           
  Branches                ?        0           
===============================================
  Hits                    ?    19715           
  Misses                  ?     3111           
  Partials                ?        0           

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update bc5e769...5c447d1. Read the comment docs.

Copy link
Contributor

@hyperbolic2346 hyperbolic2346 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Jake's question aside, this looks good to me.

Copy link
Contributor

@jlowe jlowe left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Java changes are OK as the minimal, necessary change, but the Java code for BinaryOperable.pow was not updated to leverage the new INT_POW operator as was done for the Python API. There needs to be a followup issue if it's not addressed here.

@bdice
Copy link
Contributor

bdice commented Jul 12, 2022

Java changes are OK as the minimal, necessary change, but the Java code for BinaryOperable.pow was not updated to leverage the new INT_POW operator as was done for the Python API. There needs to be a followup issue if it's not addressed here.

@jlowe See my assessment in #10178 (comment) and let me know if you agree — I think Spark/Java should not use the integer power operator because the expected return type of pow is always a floating type. That’s not true for pandas, which is what motivated this fix.

Copy link
Contributor

@brandon-b-miller brandon-b-miller left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

one small non-blocking suggestion otherwise lgtm.

@jlowe
Copy link
Contributor

jlowe commented Jul 13, 2022

I think Spark/Java should not use the integer power operator because the expected return type of pow is always a floating type.

Agreed it's a non-issue for Spark, but it is an inconsistency with respect to the cudf APIs across language bindings. Given the cudf Java API is only used by Spark for now it's essentially a non-issue, but it could be if someone in the future used the cudf Java API outside of Spark. Given Spark will always call it with double, having it perform this behavior won't break Spark, but it will produce better results for anything that might call it with an INT in the future.

Yeah, yeah, YAGNI applies here, so I'm fine if we decide not to follow up. On the flip side, it's also not hard to avoid this theoretical future surprise.

@bdice
Copy link
Contributor

bdice commented Jul 18, 2022

@gpucibot merge

1 similar comment
@bdice
Copy link
Contributor

bdice commented Jul 18, 2022

@gpucibot merge

@rapids-bot rapids-bot bot merged commit ae1b581 into rapidsai:branch-22.08 Jul 18, 2022
rapids-bot bot pushed a commit that referenced this pull request Jul 25, 2022
Fixes a compile warning was introduced in PR #11025 : [link to log containing the warning](https://gpuci.gpuopenanalytics.com/job/rapidsai/job/gpuci/job/cudf/job/prb/job/cudf-cpu-cuda-build/CUDA=11.5/10790/consoleFull)

When a templated variable `y` is unsigned the compare `(y<0)` results in a compile warning:
```
/cudf/cpp/src/binaryop/compiled/operation.cuh(226): warning #186-D: pointless comparison of unsigned integer with zero
          detected during:
            instantiation of "auto cudf::binops::compiled::ops::IntPow::operator()(TypeLhs, TypeRhs)->TypeLhs [with TypeLhs=uint8_t, TypeRhs=uint8_t, <unnamed>=(void *)nullptr]"
```
Adding an `if constexpr` around the comparison removes the warning.

Authors:
  - David Wendt (https://github.com/davidwendt)

Approvers:
  - Nghia Truong (https://github.com/ttnghia)
  - Bradley Dice (https://github.com/bdice)

URL: #11339
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
CMake CMake build issue improvement Improvement / enhancement to an existing function Java Affects Java cuDF API. libcudf Affects libcudf (C++/CUDA) code. non-breaking Non-breaking change Python Affects Python cuDF API.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

9 participants