
API: expected result for pow(1, pd.NA) or pow(pd.NA, 0) #29997

Closed
TomAugspurger opened this issue Dec 3, 2019 · 13 comments · Fixed by #30097
Labels
Missing-data np.nan, pd.NaT, pd.NA, dropna, isnull, interpolate Numeric Operations Arithmetic, Comparison, and Logical operations
Milestone

Comments

@TomAugspurger
Contributor

Should pow(1, pd.NA) be 1 or NA?

In [1]: import pandas as pd

In [2]: 1 ** pd.NA
Out[2]: NA

In [3]: import numpy as np

In [5]: 1 ** np.nan
Out[5]: 1.0

cc @jorisvandenbossche

@TomAugspurger
Contributor Author

R gives 1

> 1 ^ NA
[1] 1

@jorisvandenbossche
Member

Julia seems to return missing:

julia> 1 ^ missing
missing

But I think the rationale of returning 1 (i.e. "whatever number is used, 1 to the power of anything is always 1") also makes sense.

@jschendel
Member

Likewise there's an inconsistency with np.nan for pow(pd.NA, 0):

In [1]: import numpy as np; import pandas as pd; pd.__version__
Out[1]: '0.26.0.dev0+1155.ged20822a5'

In [2]: pd.NA ** 0
Out[2]: NA

In [3]: np.nan ** 0
Out[3]: 1.0

@jorisvandenbossche
Member

R also gives 1 then:

> NA ^ 0
[1] 1

@jorisvandenbossche
Member

I am +1 on changing those two cases

TomAugspurger added a commit to TomAugspurger/pandas that referenced this issue Dec 5, 2019
@jreback jreback added Missing-data np.nan, pd.NaT, pd.NA, dropna, isnull, interpolate Numeric Operations Arithmetic, Comparison, and Logical operations labels Dec 5, 2019
@jreback jreback added this to the 1.0 milestone Dec 5, 2019
@brandon-b-miller

@TomAugspurger @jorisvandenbossche Sorry to raise an old issue; I've been looking at aligning cuDF with pandas for these special cases. I believe I follow what's been discussed so far here and in the associated PR; however, stopping nulls from propagating for special values arguably breaks some consistency. Would you mind sharing your thoughts on this tradeoff?

I don't think consistency in itself is necessarily a rock solid reason to change this, but also am struggling to be completely convinced that this is exactly right either.

@TomAugspurger
Contributor Author

I think our invariant is that NA represents an unknown value. So if you have an operation whose result is known from only one operand, like “True | NA”, then the output should be known and not NA.
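(For illustration, a minimal sketch of that invariant for a Kleene-style `or`, using a hypothetical `NA` stand-in rather than the real `pd.NA` implementation:)

```python
# Toy sketch of Kleene (three-valued) OR: NA propagates only when the
# result genuinely depends on the missing operand. `NAType`/`NA` here
# are hypothetical stand-ins, not pandas internals.
class NAType:
    def __repr__(self):
        return "NA"

    def __or__(self, other):
        # True | NA is True no matter what NA stands for;
        # False | NA stays unknown.
        return True if other is True else NA

    __ror__ = __or__

NA = NAType()

print(True | NA)   # True: result is known from one operand
print(False | NA)  # NA: result depends on the missing value
```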

@brandon-b-miller

brandon-b-miller commented Mar 2, 2021

@TomAugspurger Thank you for your response. That line of logic makes a lot of sense to me and I agree with it, with the True | NA example being especially clear.

That said, I noticed that in SQL and Spark, doing a SELECT TRUE | NULL gives me NULL. There's no standard that I know of for how to handle this in Python specifically, but if one takes the position that pandas and other dataframe-centric tools derive a lot of their logic and use cases from SQL, one might look to the ANSI SQL standard for answers. To be clear, I have not reviewed it yet, so I'm not sure whether it says anything about this, but I think it has a fairly robust set of semantics for missing data, and if we in the Python ecosystem opt to go a different way, I think it's worth discussing the implications of that choice.

EDIT: Did some asking around and turned up this text from an early draft of the 92 standard, FWIW.

        1) If the value of any <numeric primary> simply contained in a
            <numeric value expression> is the null value, then the result of
            the <numeric value expression> is the null value.

@jorisvandenbossche
Member

That said, I noticed that in SQL and Spark, doing a SELECT TRUE | NULL gives me NULL.

In the original discussion about using such Kleene logic (i.e. three-valued logic) for logical operators, I actually used the fact that SQL does this as well as an argument for it ... (see #28778).

Now, I am no SQL expert, so I could quite well be mistaken here. But from googling, https://modern-sql.com/concept/three-valued-logic seems to indicate that as well. And testing with Postgres gives:

test_db=# SELECT TRUE OR NULL;
 ?column? 
----------
 t
(1 row)

(so this returned True)

@brandon-b-miller

I think there might be something specific about short-circuiting in logical OR (or other such ops) that is producing that behavior; in Spark, I get the same thing. As such, I think I can concede the logical OR case. However, this rule doesn't seem to extend to POW.

sqltest=# SELECT POW(1, 2);
 pow 
-----
   1
(1 row)

sqltest=# SELECT POW(1, NULL);
 pow 
-----
    
(1 row)
sqltest=# SELECT POW(5, 0);
 pow 
-----
   1
(1 row)

sqltest=# SELECT POW(NULL, 0);
 pow 
-----
    
(1 row)

in spark, running

sp_df.withColumn('c', sp_df['a']**sp_df['b']).show()

gives me

+---+----+----+
|  a|   b|   c|
+---+----+----+
|  1|null|null|
+---+----+----+

@jorisvandenbossche jorisvandenbossche changed the title pow(1, pd.NA) maybe gives the wrong result API: expected result for pow(1, pd.NA) or pow(pd.NA, 0) Mar 8, 2021
@jorisvandenbossche
Member

Yeah, the power and logical ops are of course different operations, and indeed for power operations SQL seems to have a different behaviour compared to some other systems (eg numpy or R).

Now, I think there are valid arguments for both behaviours (always propagate nulls in arithmetic operations vs. the result is known regardless of which value the null would represent). So I'm not really sure what the best option is here.

@TomAugspurger
Contributor Author

Now, I think there are valid arguments for both behaviours (always propagate nulls in arithmetic operations vs. the result is known regardless of which value the null would represent). So I'm not really sure what the best option is here.

Why would we treat arithmetic operations differently from boolean? I think we should propagate NA if and only if the result is unknown. For the case of NA ** 0 the result is known to be 1 for all real numbers (assuming you accept that 0**0 is 1, which is what Python does).

@brandon-b-miller

Having thought about this for a few days, I admit I still lean a little towards propagating nulls, because it is a simple rule that leads to predictable behavior. It also means a library doesn't have to introspect its own data to decide when to do an end run around its own null logic. Admittedly this is more of a problem for the library I work on than for pandas, but it seems like it could mean extra maintenance on the pandas side as well, especially when other questionable cases come to mind (should pd.NA * 0 be 0?)
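(For what it's worth, the NA * 0 case does differ from NA ** 0 under float semantics: if the missing value could be infinite, the product is not determined, whereas the power still is. A quick check in plain Python:)

```python
import math

# Under IEEE 754 float arithmetic, 0 times an infinite value is NaN,
# so "NA * 0 == 0" only holds if NA is known to be finite.
print(0 * math.inf)              # nan
print(math.isnan(0 * math.inf))  # True

# By contrast, x ** 0 is 1 even for infinite x:
print(math.inf ** 0)             # 1.0
```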
