Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[BUG] pow operator not acting as expected. #10178

Closed
bdice opened this issue Jan 31, 2022 · 7 comments
Closed

[BUG] pow operator not acting as expected. #10178

bdice opened this issue Jan 31, 2022 · 7 comments
Assignees
Labels
bug Something isn't working Python Affects Python cuDF API.

Comments

@bdice
Copy link
Contributor

bdice commented Jan 31, 2022

Describe the bug
__pow__ / __rpow__ binary operations do not give the expected results in the following cases.

Steps/Code to reproduce bug
Powers of 3 ** 1.

>>> 3 ** cudf.Series([1])
0    2
dtype: int64
>>> cudf.Series([3]) ** 1
0    2
dtype: int64
>>> pow(cudf.Series([3]), 1)
0    2
dtype: int64
>>> cudf.Series([3]) ** cudf.Series([1])
0    2
dtype: int64
>>> cudf.Series([-3]) ** 1
0   -2
dtype: int64
>>> 3 ** cudf.Series([1, 2, 3, 4])
0     2
1     9
2    27
3    81
dtype: int64

This happens for other integers as well:

for i in range(1000):
    if i != (cudf.Series([i]) ** 1)[0]:
        print(i)

gives

3
84
86
95
117
147
158
213
...

Also, the power does not have to be 1. There are a lot of integers that fail, always off-by-one in the result.

>>> print((cudf.Series([8]) ** 7)[0], 8 ** 7)
2097151 2097152

It gives the wrong results for Series with dtypes int8, int16, int32, int64, uint8, uint16, uint32, uint64, bool (for the exponent, 3 ** cudf.Series([1], dtype=bool) gives 2).

Additionally, pandas raises when computing integers to negative integer powers. cudf gives meaningless results.

>>> cudf.Series([0, 1, 2, 3]) ** -1
0    9223372036854775807
1                      1
2                      0
3                      0
dtype: int64
>>> pd.Series([0, 1, 2, 3]) ** -1
(... traceback ...)
ValueError: Integers to negative integer powers are not allowed.

Expected behavior

>>> 3 ** 1
3

Additional context
Three is a magic number.

Using a floating type on either side gives expected results:

>>> 3 ** cudf.Series([1, 2, 3, 4], dtype=float)
0     3.0
1     9.0
2    27.0
3    81.0
dtype: float64
@bdice bdice added bug Something isn't working Needs Triage Need team to review and classify 4 - Needs cuDF (Python) Reviewer and removed Needs Triage Need team to review and classify labels Jan 31, 2022
@bdice bdice added Python Affects Python cuDF API. and removed 4 - Needs cuDF (Python) Reviewer labels Jan 31, 2022
@vyasr
Copy link
Contributor

vyasr commented Jan 31, 2022

This appears to be some sort of precision issue that manifests for specific numbers, most likely in the libcudf layer (not in cuDF Python). I can reproduce on my machine, and moreover there are other apparently unrelated numbers conforming to no specific pattern that exhibit the same behavior of x**1 == x-1.

>>> x = cudf.Series([1, 2]) 
>>> for i in range(100):
...     if (i ** x)[0] != i:
...             print(f"{i}**1 returns {(i**x)[0]}")
... 
3**1 returns 2
84**1 returns 83
86**1 returns 85
95**1 returns 94

@bdice
Copy link
Contributor Author

bdice commented Feb 3, 2022

This comment summarizes some Slack discussions, analysis, and plans for fixing this bug.

C++'s std::pow always returns double. Current our libcudf implementation of Pow always casts to double but then machinery in binary_ops.cuh casts the result to the common type of the inputs, i.e. calling Pow on integer types yields a double but it is cast back to an integer.

We want the Pow struct to conform to STL C++ conventions. NumPy, Pandas, and CuPy all return integer types when computing powers of integer bases to integer exponents. For instance, cupy handles powers of integers with exponentiation by squaring.

We need to change the behavior of Pow and introduce a new struct IntPow:

  1. Pow should always return double (it will fail if the requested output type is not double).
  2. cuDF Python will dispatch to IntPow, which will implement exponentiation-by-squaring. This algorithm is already implemented in libcudf for fixed-point powers.

The only piece that still requires some investigation is the choice of conventions for float values, like pow(float, float) -> (float or double?) across the PyData, STL, and Spark ecosystems. It appears Pandas and C++ STL would return float but it looks like Spark/Java always upcast pow arguments to double and return double. We may want to allow Pow to perform its math in float precision instead of double to align with STL conventions -- currently libcudf casts float arguments to double, then downcasts the result to the common type of the inputs, float.

This commit adds a failing C++ test and should be included in a PR to fix this bug: bdice@ae67768

@github-actions
Copy link

This issue has been labeled inactive-30d due to no recent activity in the past 30 days. Please close this issue if no further response or action is needed. Otherwise, please respond with a comment indicating any updates or changes to the original issue and/or confirm this issue still needs to be addressed. This issue will be labeled inactive-90d if there is no activity in the next 60 days.

@bdice bdice self-assigned this Mar 28, 2022
@bdice
Copy link
Contributor Author

bdice commented May 25, 2022

Issue #9346 is also relevant here, since that ipow implementation could be reused as a utility function. However, the lookup table suggested for that implementation will diverge from the runtime computation needed for this integer pow function.

@bdice
Copy link
Contributor Author

bdice commented Jun 1, 2022

This problem may also affect decimal types, since they use integers to represent the fixed point value. That should be verified before closing this issue.

rapids-bot bot pushed a commit that referenced this issue Jul 18, 2022
Partial fix for #10178 (still need to investigate whether decimal types are also affected).

This implements the `INT_POW` binary operator, which can be dispatched for integral types. This uses an [exponentiation-by-squaring](https://en.wikipedia.org/wiki/Exponentiation_by_squaring) algorithm to compute powers. Unlike `POW`, this does not cast the data to floating-point types which can suffer from precision loss when computing powers and casting back to an integral type. The cuDF Python layer has been updated to dispatch integral data to this operator, which fixes the problems seen for specific values of base and exponent (like `3**1 == 2`) noted in #10178.

Authors:
  - Tim (https://github.com/AtlantaPepsi)
  - Bradley Dice (https://github.com/bdice)

Approvers:
  - Bradley Dice (https://github.com/bdice)
  - Mike Wilson (https://github.com/hyperbolic2346)
  - Jason Lowe (https://github.com/jlowe)
  - https://github.com/brandon-b-miller

URL: #11025
@github-actions
Copy link

This issue has been labeled inactive-90d due to no recent activity in the past 90 days. Please close this issue if no further response or action is needed. Otherwise, please respond with a comment indicating any updates or changes to the original issue and/or confirm this issue still needs to be addressed.

@bdice
Copy link
Contributor Author

bdice commented Aug 31, 2022

This problem may also affect decimal types, since they use integers to represent the fixed point value. That should be verified before closing this issue.

A separate issue affects decimal types. I filed #11636 and will close this issue.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working Python Affects Python cuDF API.
Projects
None yet
Development

No branches or pull requests

2 participants