[BUG] Behavior of __pow__ differs from Pandas for special values #7478

Open
brandon-b-miller opened this issue Mar 1, 2021 · 4 comments
Labels: bug (Something isn't working), Python (Affects Python cuDF API)

Comments

@brandon-b-miller
Contributor

Describe the bug
In pandas, 1 ** <NA> evaluates to 1, whereas in cuDF it evaluates to <NA>. Likewise, in pandas <NA> ** 0 evaluates to 1, whereas in cuDF it evaluates to <NA>.

Steps/Code to reproduce bug

First issue:

>>> psr = pd.Series([1,2,3], dtype='int64')
>>> gsr = cudf.Series([1,2,3], dtype='int64')
>>> psr ** pd.NA
0       1
1    <NA>
2    <NA>
dtype: object
>>> gsr ** cudf.NA
0    <NA>
1    <NA>
2    <NA>
dtype: int64

Second issue:

>>> psr = pd.Series([None], dtype='Int64')
>>> gsr = cudf.Series([None], dtype='int64')
>>> psr
0    <NA>
dtype: Int64
>>> gsr
0    <NA>
dtype: int64
>>> psr ** 0
0    1
dtype: Int64
>>> gsr ** 0
0    <NA>
dtype: int64

Expected behavior
I believe we should match pandas here. Because cuDF currently exposes libcudf's behavior directly, we may need to run a few extra kernels to handle these special cases explicitly.

I believe this is worth doing. The tradeoff is between the extra work on the GPU, which will cost some performance, and the risk that users run the same data through the same sequence of mathematical operations in pandas and cuDF and get different results. IMO the second is more likely to cause problems for users than the first.
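
As a rough illustration of the special-case handling discussed here, the following pure-Python sketch models a nullable integer column as a list in which None stands in for <NA>. The helper name pow_with_na is hypothetical and is not part of cuDF or libcudf; a real fix would presumably apply the same rule with extra masking kernels on the GPU rather than a Python loop.

```python
from typing import List, Optional, Sequence


def pow_with_na(
    bases: Sequence[Optional[int]],
    exponents: Sequence[Optional[int]],
) -> List[Optional[int]]:
    """Elementwise power where None stands in for <NA> (hypothetical helper)."""
    out: List[Optional[int]] = []
    for base, exp in zip(bases, exponents):
        if base == 1 or exp == 0:
            # Special values: the result is 1 even if the other operand is missing.
            out.append(1)
        elif base is None or exp is None:
            # Ordinary null propagation for every other case.
            out.append(None)
        else:
            out.append(base ** exp)
    return out


# Mirrors the two cases reported above.
first_issue = pow_with_na([1, 2, 3], [None, None, None])
second_issue = pow_with_na([None], [0])
```

Under this rule, first_issue is [1, None, None] and second_issue is [1], which is the pandas output shown in the reproduction steps.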

Environment overview (please complete the following information)

  • Environment location: Bare Metal
  • Method of cuDF install: Source


@brandon-b-miller added the bug and Python labels on Mar 1, 2021
@brandon-b-miller self-assigned this on Mar 1, 2021
@brandon-b-miller
Contributor Author

Relevant: pandas-dev/pandas#29997

@github-actions

This issue has been labeled inactive-30d due to no recent activity in the past 30 days. Please close this issue if no further response or action is needed. Otherwise, please respond with a comment indicating any updates or changes to the original issue and/or confirm this issue still needs to be addressed. This issue will be labeled inactive-90d if there is no activity in the next 60 days.

@github-actions

github-actions bot commented Feb 7, 2022

This issue has been labeled inactive-90d due to no recent activity in the past 90 days. Please close this issue if no further response or action is needed. Otherwise, please respond with a comment indicating any updates or changes to the original issue and/or confirm this issue still needs to be addressed.

@wence-
Contributor

wence- commented Dec 6, 2022

pandas matches the behaviour of floating-point NaN in these circumstances, whereas cuDF doesn't. Arguably, matching NaN behaviour is less surprising.
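
For reference, the NaN behaviour mentioned here can be checked with plain Python floats, which follow the usual IEEE 754 pow conventions. This is only a small illustration using the standard library; the variable names are made up for the example.

```python
import math

nan = float("nan")

# A base of 1 yields 1.0 even when the exponent is NaN.
one_to_nan = 1.0 ** nan

# An exponent of 0 yields 1.0 even when the base is NaN.
nan_to_zero = nan ** 0

# Any other combination involving NaN propagates NaN.
two_to_nan = 2.0 ** nan

print(one_to_nan, nan_to_zero, math.isnan(two_to_nan))
```

The first two results are 1.0 and the third is NaN, which is the same pattern pandas applies to <NA> in the examples above.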
