API: expected result for pow(1, pd.NA) or pow(pd.NA, 0) #29997
R gives:

    > 1 ^ NA
    [1] 1
Julia seems to return `missing`.

But I think the rationale of returning 1 (i.e. "whatever number is used, 1 to the power of anything is always 1") also makes sense.
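To make the "result is known whatever the missing value is" rationale concrete, here is a minimal Python sketch of a pow that propagates a missing value only when the answer actually depends on it. The `NA` sentinel and the `na_pow` function are hypothetical stand-ins for illustration, not pandas' `pd.NA` implementation.

```python
class _Missing:
    """Hypothetical sentinel for a missing value (not pandas' pd.NA)."""

    def __repr__(self) -> str:
        return "NA"


NA = _Missing()


def na_pow(base, exponent):
    """Return base ** exponent, propagating NA only when the result is unknown."""
    # 1 raised to any power is 1, so the exponent's value does not matter.
    if base is not NA and base == 1:
        return 1
    # Any base raised to the power 0 is 1, so the base's value does not matter.
    if exponent is not NA and exponent == 0:
        return 1
    # Otherwise the result depends on the missing operand(s).
    if base is NA or exponent is NA:
        return NA
    return base ** exponent


print(na_pow(1, NA))   # 1
print(na_pow(NA, 0))   # 1
print(na_pow(2, NA))   # NA
print(na_pow(NA, NA))  # NA
```

The alternative discussed in this thread (always propagate) would simply drop the two special-case branches.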
Likewise, there's an inconsistency between `pd.NA ** 0` and `np.nan ** 0`:

    In [1]: import numpy as np; import pandas as pd; pd.__version__
    Out[1]: '0.26.0.dev0+1155.ged20822a5'

    In [2]: pd.NA ** 0
    Out[2]: NA

    In [3]: np.nan ** 0
    Out[3]: 1.0
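The NumPy result matches the IEEE 754 / C99 special cases for `pow`, which Python's built-in floats also follow: `pow(1, y)` is 1 for every `y`, and `pow(x, 0)` is 1 for every `x`, NaN included. A quick check using only the standard library:

```python
import math

nan = float("nan")

# Special cases shared by IEEE 754 / C99 pow and Python floats.
print(1.0 ** nan)            # 1.0
print(nan ** 0)              # 1.0
print(math.pow(1.0, nan))    # 1.0
print(math.pow(nan, 0.0))    # 1.0

# Any other exponent still propagates NaN.
print(math.isnan(2.0 ** nan))  # True
```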
R also gives 1 in that case.
I am +1 on changing those two cases.
@TomAugspurger @jorisvandenbossche Sorry to raise an old issue. I've been looking at aligning cuDF with pandas for these special cases. I believe I follow what's been discussed so far here, as well as in the associated PR; however, stopping nulls from propagating for special values arguably breaks some consistency. Would you mind sharing your thoughts on this tradeoff? I don't think consistency in itself is necessarily a rock-solid reason to change this, but I am also struggling to be completely convinced that this is exactly right.
I think our invariant is that NA represents "unknown". So if you have an operation whose result is known from only one operand, like `True | NA`, then the output should be known and not NA.
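This invariant is exactly Kleene (three-valued) logic. A minimal sketch, using Python's `None` to stand for NA; `kleene_or` and `kleene_and` are hypothetical helper names, not a library API:

```python
from typing import Optional


def kleene_or(a: Optional[bool], b: Optional[bool]) -> Optional[bool]:
    # One operand known to be True settles the result, whatever the other is.
    if a is True or b is True:
        return True
    # Otherwise an unknown operand leaves the result unknown.
    if a is None or b is None:
        return None
    return False


def kleene_and(a: Optional[bool], b: Optional[bool]) -> Optional[bool]:
    # One operand known to be False settles the result, whatever the other is.
    if a is False or b is False:
        return False
    if a is None or b is None:
        return None
    return True


print(kleene_or(True, None))    # True
print(kleene_or(False, None))   # None
print(kleene_and(False, None))  # False
```

Applying the same "known from one operand" test to arithmetic is what gives `1 ** NA == 1` and `NA ** 0 == 1`.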
@TomAugspurger Thank you for your response. That line of logic makes a lot of sense to me, and I agree with it, with the … That said, I noticed that in SQL and Spark, doing a …

EDIT: I did some asking around and turned up this text from an early draft of the SQL-92 standard, FWIW.
In the original discussion about using such Kleene logic (or three-valued logic) for logical operators, I actually used the fact that SQL uses it as well as an argument for it (see #28778). Now, I am no SQL expert, so I could well be mistaken here. But from googling, https://modern-sql.com/concept/three-valued-logic seems to indicate that as well. And testing with Postgres gives:

(so this returned True)
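The same three-valued behaviour can be checked with SQLite through Python's built-in `sqlite3` module. SQLite is swapped in here only as a readily available SQL engine; it is not the Postgres test mentioned above. SQLite represents true and false as 1 and 0, and NULL as unknown:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# 1 = true, 0 = false, NULL = unknown.
row = conn.execute(
    "SELECT 1 OR NULL, 0 OR NULL, 0 AND NULL, 1 AND NULL"
).fetchone()
conn.close()
print(row)  # (1, None, 0, None)
```

A known `True` settles `OR` and a known `False` settles `AND`; the other two combinations stay NULL.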
I think there might be something specific about short-circuiting logical operations in Spark; running … gives me …
Yeah, the power and logical ops are of course different operations, and indeed for power operations SQL seems to behave differently from some other systems (e.g. NumPy or R). Now, I think there are valid arguments for both behaviours: always propagate nulls in arithmetic operations, versus return the result when it is known regardless of which value the null would represent. So I am not really sure what the best option is here.
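SQL's null propagation in arithmetic can be seen with SQLite via Python's `sqlite3` module. Multiplication by 0 is used here as an analogue of `pow(1, NULL)`, on the assumption that SQLite's `power()` function may not be available in every build; this illustrates the arithmetic rule, it does not test SQL's power function itself:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# 0 * 5 is 0, but 0 * NULL is NULL: arithmetic propagates NULL
# even though 0 times any integer would be 0.
row = conn.execute("SELECT 0 * 5, 0 * NULL, NULL * 0").fetchone()
conn.close()
print(row)  # (0, None, None)
```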
Why would we treat arithmetic operations differently from boolean ones? I think we should propagate NA if and only if the result is unknown. For the case of …
Having thought about this for a few days, I admit I still lean a little towards nulls propagating, because it seems like a simple rule that leads to predictable behavior. It also means libraries don't have to solve the problem of introspecting their own data to determine whether they need to do an end run around their own null logic. Admittedly this is more of a problem for the library I work on than for pandas, but it seems like it could be extra maintenance on the pandas side as well, especially when other questionable cases come to mind (should …
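The maintenance cost described here can be sketched for a columnar layout that stores values and a validity mask separately. Under strict propagation the output mask depends only on the input masks; under the "known result" rule the kernel must also read the values of the valid operand. `strict_validity` and `masked_pow` are hypothetical names, and this is not cuDF's actual API:

```python
from typing import List, Tuple


def strict_validity(base_valid: List[bool], exp_valid: List[bool]) -> List[bool]:
    """Strict propagation: a slot is valid only if both inputs are valid."""
    return [bv and ev for bv, ev in zip(base_valid, exp_valid)]


def masked_pow(
    base: List[float],
    base_valid: List[bool],
    exp: List[float],
    exp_valid: List[bool],
) -> Tuple[List[float], List[bool]]:
    """Known-result rule: 1 ** NA and NA ** 0 become valid 1.0 slots."""
    out: List[float] = []
    out_valid: List[bool] = []
    for b, bv, e, ev in zip(base, base_valid, exp, exp_valid):
        if bv and ev:
            out.append(b ** e)
            out_valid.append(True)
        elif (bv and b == 1) or (ev and e == 0):
            # The masks alone are not enough: we must read the valid operand's value.
            out.append(1.0)
            out_valid.append(True)
        else:
            out.append(0.0)  # placeholder value; the slot is marked invalid
            out_valid.append(False)
    return out, out_valid


values, valid = masked_pow(
    [1.0, 2.0, 5.0, 3.0], [True, True, False, False],
    [7.0, 2.0, 0.0, 1.0], [False, True, True, False],
)
print(values, valid)  # [1.0, 4.0, 1.0, 0.0] [True, True, True, False]
print(strict_validity([True, True, False, False], [False, True, True, False]))
# [False, True, False, False]
```

The two rules disagree on the first and third slots, which is exactly where the value-dependent check is needed.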
Should `pow(1, pd.NA)` be `1` or `NA`?

cc @jorisvandenbossche