Maybe buggy nullable integer floor division by 0 #30188

TomAugspurger · 2019-12-10T21:13:48Z

In [6]: a = pd.array([0, 1, -1, None], dtype="Int64")

In [7]: a // 0
Out[7]: 
<IntegerArray>
[0, 0, 0, NaN]
Length: 4, dtype: Int64

In [8]: a // -0
Out[8]: 
<IntegerArray>
[0, 0, 0, NaN]
Length: 4, dtype: Int64

In [9]: a // 0.0
Out[9]: array([nan, nan, nan, nan])

In [10]: a // -0.0
Out[10]: array([nan, nan, nan, nan])

Those should probably all be NA, to match the ndarray behavior.

jbrockmendel · 2019-12-10T21:56:57Z

to match the ndarray behavior

This is a case where the Series behavior is different from the ndarray behavior. Unless there's a compelling reason to match ndarray, I'd prefer to match Series (and DataFrame and Int64Index).

AlexKirko · 2019-12-11T08:04:45Z

@jbrockmendel Wouldn't matching it to Series be less intuitive for the users? They'd have a numpy array of integers and a pandas array of integers, and those would behave differently. In my opinion, that is less expected than IntegerArray and Series behaving differently (these are very different classes).
On the other hand, matching to Series promotes consistent processing of array-like objects in pandas, but I don't think that's as important as intuitive behavior.

jorisvandenbossche · 2019-12-11T09:22:40Z

Those should probably all be NA, to match the ndarray behavior.

The numpy behaviour is actually different for integers:

In [16]: np.array([0, 0, 1]) // 0  
/home/joris/miniconda3/envs/dev/bin/ipython:1: RuntimeWarning: divide by zero encountered in floor_divide
  #!/home/joris/miniconda3/envs/dev/bin/python
Out[16]: array([0, 0, 0])

In [17]: np.array([0., 0., 1.]) // 0  
/home/joris/miniconda3/envs/dev/bin/ipython:1: RuntimeWarning: invalid value encountered in floor_divide
  #!/home/joris/miniconda3/envs/dev/bin/python
Out[17]: array([nan, nan, nan])

I'm not fully sure what the rationale for that difference is, though.

AlexKirko · 2019-12-11T10:24:24Z

@jorisvandenbossche, thanks. The behavior of IntegerArray turns out to be the same as np.array:

np.array([0,1,-1]) // 0
C:\Users\amata_n2pl6zt\.conda\envs\pandas-dev\lib\site-packages\ipykernel_launcher.py:1: RuntimeWarning: divide by zero encountered in floor_divide
array([0, 0, 0], dtype=int32)

pd.array([0,1,-1],dtype='Int32') // 0
<IntegerArray>
[0, 0, 0]
Length: 3, dtype: Int32

To me, this looks like a numpy bug.

jorisvandenbossche · 2019-12-11T10:30:46Z

The reasoning might be: floor division of an integer array returns a new integer array. So for that reason, the result cannot hold NaN, and 0 was taken instead.

TomAugspurger · 2019-12-11T14:52:32Z

Hmm so do we have a decision on the expected value? It's not clear to me what's best here.

jbrockmendel · 2019-12-11T17:06:02Z

The numpy behaviour is actually different for integers:

It's also different yet again if the zero you're dividing by is float instead of integer.

A lot of effort went into making the Series implementation of truediv/floordiv by zero not have these surprises, and the tests (tests.arithmetic.test_numeric) are fairly thorough.

Could we do something like wrap the ops.array_ops function and then re-mask? Maybe even add IntegerArray to the parametrization in test_numeric.

jbrockmendel · 2019-12-11T17:15:59Z

also seems like the options.treat_inf_as_na would be relevant

jorisvandenbossche · 2019-12-11T19:18:48Z

It's also different yet again if the zero you're dividing by is float instead of integer.

Yes, but then you are again doing a float floor division, not an integer one.
I suppose that numpy is going for type stability here, and decided on those return types:

int // int -> int
int // float -> float
float // float -> float

And if the result is int, then you are limited in what the value can be.
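
Python's built-in scalars happen to follow the same result-type table, so the constraint can be sketched without numpy. This is only an analogy and not numpy's code path: for `int // 0` the Python scalar raises, whereas the numpy integer array shown earlier in the thread warns and returns 0.

```python
# Plain-Python illustration of the result-type table above (not numpy itself).
cases = {
    "int // int": 7 // 2,
    "int // float": 7 // 2.0,
    "float // float": 7.0 // 2.0,
}
result_types = {name: type(value).__name__ for name, value in cases.items()}
print(result_types)

# An int result has no slot for NaN or inf, so something else has to happen
# on division by zero. Python chooses to raise rather than pick a placeholder.
try:
    1 // 0
except ZeroDivisionError as exc:
    int_zero_division = type(exc).__name__
print(int_zero_division)
```

Either way, once the result type is fixed to an integer, the only choices left are a placeholder value, an error, or (for a masked array) a missing value.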

A lot of effort went in to making the Series implementation of truediv/floordiv by zero not have these surprises

What kind of surprises do you mean exactly? For true division I think Series and numpy array behave the same? (except for the warning that we hide)

also seems like the options.treat_inf_as_na would be relevant

I never use this, but I think this is about interpretation of inf as NA in functions that detect NA (isna, fillna, etc.), not about creation of inf in operations (which is what we are discussing here, I think).
Which surprises do you mean exactly? I don't think we treat true division differently from numpy?

jbrockmendel · 2019-12-11T20:11:54Z

I never use this, but I think this is about interpretation of inf as NA in functions that detect NA (isna, fillna, etc), not about creation of inf in operations

I think you're right about how we use that option. What I'm thinking of is:

arr = pd.array([-1, 0, 1])
ser = pd.Series([-1, 0, 1])

>>> ser // 0
0   -inf
1    NaN
2    inf
dtype: float64

Suppose we agree that IntegerArray should behave like Series. Then the IntegerArray op gets back np.array([-inf, NaN, inf]) and has to decide how to cast the result. If we decide (possibly via an option) to treat inf as NA, then we could return an all-NA IntegerArray result instead of a float array.
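
A minimal plain-Python sketch of that "compute like Series, then re-mask" idea follows. It is a model of the proposal, not pandas code: `NA`, `series_like_floordiv`, and `remask` are hypothetical names, `None` stands in for `pd.NA`, and the sign of the zero divisor is ignored for simplicity.

```python
import math

NA = None  # stand-in for pd.NA in this sketch


def series_like_floordiv(values, divisor):
    """Float floor division using the Series-style results for a zero divisor."""
    out = []
    for v in values:
        if v is NA:
            out.append(math.nan)
        elif divisor == 0:
            # Sign of the numerator decides between -inf and inf; 0 // 0 is NaN.
            out.append(math.nan if v == 0 else math.copysign(math.inf, v))
        else:
            out.append(float(v // divisor))
    return out


def remask(float_result, input_mask):
    """Cast back to integers, turning NaN/inf and masked inputs into NA."""
    return [
        NA if masked or not math.isfinite(x) else int(x)
        for x, masked in zip(float_result, input_mask)
    ]


values = [-1, 0, 1, NA]
mask = [v is NA for v in values]
by_zero = remask(series_like_floordiv(values, 0), mask)
by_two = remask(series_like_floordiv(values, 2), mask)
print(by_zero)
print(by_two)
```

With a zero divisor every element comes back as NA, while an ordinary divisor keeps the integer results and only the originally masked element stays NA.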

jorisvandenbossche · 2019-12-11T20:31:07Z

Suppose we agree that IntegerArray should behave like Series.

So what is the rationale for the current Series behaviour for floor division by 0? When returning a float result, why not follow numpy's behaviour for floats of returning all NaN?

jbrockmendel · 2019-12-11T20:35:13Z

So what is the rationale of the current Series behaviour for floor division by 0?

My recollection of the reasoning is that the semantic meaning of division/floordiv isn't impacted by the dtypes of numerator or denominator, so the results shouldn't be either.

jorisvandenbossche · 2019-12-11T20:37:31Z

That explains why we have the same behaviour for ints and floats in Series, but not necessarily why we chose a different behaviour for floats compared to numpy?

In [54]: pd.Series([-1., 0., 1.]) // 0. 
Out[54]: 
0   -inf
1    NaN
2    inf
dtype: float64

In [55]: np.array([-1., 0., 1.]) // 0. 
/home/joris/miniconda3/envs/dev/bin/ipython:1: RuntimeWarning: invalid value encountered in floor_divide
  #!/home/joris/miniconda3/envs/dev/bin/python
Out[55]: array([nan, nan, nan])

jbrockmendel · 2019-12-11T20:48:39Z

but not necessarily why we chose a different behaviour for floats compared to numpy?

I think it was about saying that floordiv should be "close" to truediv. I like to think of it as limiting behavior, but not sure if that was part of the discussion at the time.
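
The "limiting behavior" reading can be illustrated with a small stdlib-only experiment. This is just a numerical illustration of the intuition, with arbitrarily chosen divisors, and not a statement about how pandas computes the result.

```python
import math

# Shrink a positive divisor toward zero and watch floor(x / d).
divisors = [1.0, 1e-3, 1e-6, 1e-9]

positive = [math.floor(1.0 / d) for d in divisors]
negative = [math.floor(-1.0 / d) for d in divisors]

print(positive)  # grows without bound as d shrinks
print(negative)  # decreases without bound as d shrinks
```

Under this view, `1 // 0` going to `inf` and `-1 // 0` going to `-inf` is the natural continuation of what happens for ever smaller positive divisors, which is also what true division gives.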

jorisvandenbossche · 2019-12-11T20:53:12Z

Here is some relevant discussion about returning NaN vs Inf from float floordiv: numpy/numpy#7709. It seems most people involved in that issue agree Inf would be more logical, but nobody ever got around to actually making the change at the time.

AlexKirko · 2019-12-12T07:25:16Z

@jorisvandenbossche @jbrockmendel After reading everything, I think we should mimic Series. It shouldn't matter whether a user does 1 // 0, 1 / 0, 1. // 0, or 1. / 0, because mathematically it's all the same thing. If this introduces inconsistencies with numpy, that's not good, but it's better than inconsistencies with math and logic.

jbrockmendel · 2020-09-23T17:04:02Z

related #22793

jbrockmendel · 2020-11-30T03:51:41Z

Looks like numpy/numpy#16161 changed their floordiv behavior, will have to see how much of this problem that solves.

jorisvandenbossche · 2021-02-17T08:50:20Z

Numpy indeed changed (fixed) the floordiv behaviour for float. So instead of (with older numpy):

In [1]: a = pd.array([0, 1, -1, None], dtype="Float64")

In [2]: a // 0.0
Out[2]: 
<FloatingArray>
[nan, nan, nan, <NA>]
Length: 4, dtype: Float64

using latest numpy 1.20, we now get:

In [2]: a // 0.0
Out[2]: 
<FloatingArray>
[nan, inf, -inf, <NA>]
Length: 4, dtype: Float64

For integer (the actual topic of this issue), nothing changed though, and we still have:

In [3]: a = pd.array([0, 1, -1, None], dtype="Int64")

In [4]: a // 0
Out[4]: 
<IntegerArray>
[0, 0, 0, <NA>]
Length: 4, dtype: Int64

I think for integer floor div, if we want to keep type stability of floor division always returning integer dtype (regardless of the exact value you're dividing with), the main options for 1 // 0 are: return 0 (like above, following numpy) or raise an error (something we otherwise don't do for other "invalid" operations in pandas).

Given that we have missing values here (which numpy doesn't have), we could also return NA for those cases where float floordiv would give NaN or Inf. That would at least give some indication that something went wrong. On the other hand, that would also create an inconsistency with our nullable floating dtype, which currently does not produce NA (since it can use NaN/Inf).
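
To make the three options concrete (return 0, raise, or return NA), here is a plain-Python model of a type-stable masked integer floor division. The function name `int_floordiv` and the `on_zero` parameter are hypothetical and exist only for this sketch; `None` stands in for `pd.NA`.

```python
NA = None  # stand-in for pd.NA in this sketch


def int_floordiv(values, divisor, on_zero="na"):
    """Integer-only floor division with a selectable policy for divisor == 0.

    on_zero: "zero" (numpy-like placeholder), "raise", or "na".
    """
    if on_zero not in {"zero", "raise", "na"}:
        raise ValueError(f"unknown on_zero policy: {on_zero!r}")
    out = []
    for v in values:
        if v is NA:
            out.append(NA)  # missing inputs stay missing under every policy
        elif divisor != 0:
            out.append(v // divisor)
        elif on_zero == "zero":
            out.append(0)
        elif on_zero == "raise":
            raise ZeroDivisionError("integer floor division by zero")
        else:
            out.append(NA)
    return out


data = [0, 1, -1, NA]
numpy_like = int_floordiv(data, 0, on_zero="zero")
na_result = int_floordiv(data, 0, on_zero="na")
regular = int_floordiv(data, 3)
print(numpy_like, na_result, regular)
```

The "zero" policy reproduces the current output, but the result is indistinguishable from a legitimate 0, whereas the "na" policy at least flags the affected elements.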

cc @brandon-b-miller

jbrockmendel · 2022-02-18T23:50:44Z

I'm trying to address this; the correct behavior depends on a resolution to #32265.

brandon-b-miller · 2022-02-22T16:58:09Z

Would it be terrible to leave the behavior undefined for the integer case? I think there's some precedent for that from C/C++, and it would avoid forcing data introspection / find and replace in some situations.

jbrockmendel · 2022-02-22T19:04:27Z

Would it be terrible to leave the behavior undefined for the integer case?

I think internal consistency with non-masked behavior is pretty important, yes.

brandon-b-miller · 2022-02-22T21:18:53Z

IIUC there's the issue of which pandas objects should behave the same as each other, and secondly if any of them should follow what numpy does for integer floor division by zero (warning, return 0).

I can't speak so much to the former, but for the latter, I merely mean to offer that perhaps numpy should not be followed in this edge case. Indeed it would seem that this came up once before (numpy/numpy#5150) and in general when facing this issue myself I tend to think along the same lines as the OP from the numpy board: preferring not to make the choice.

TomAugspurger added the Numeric Operations Arithmetic, Comparison, and Logical operations label Dec 10, 2019

TomAugspurger added this to the Contributions Welcome milestone Dec 10, 2019

simonjayhawkins added the ExtensionArray Extending pandas with custom dtypes or arrays. label Dec 11, 2019

jorisvandenbossche mentioned this issue Feb 17, 2021

QST: Integer floordiv by nullable integer series containing zero #39847

Closed

2 tasks

jorisvandenbossche changed the title ~~Maybe buggy IntegerArray floordivision by 0~~ Maybe buggy nullable integer floor division by 0 Feb 17, 2021

attack68 mentioned this issue Mar 31, 2021

BUG: Divide zero by zero returns NaN in a Float64 dtype column which cannot be handled by fillna #40704

Closed

3 tasks

mzeitlin11 mentioned this issue Mar 31, 2021

PERF/BUG: use masked algo in groupby cummin and cummax #40651

Merged

mroeschke added the Bug label Jul 23, 2021

jbrockmendel added the NA - MaskedArrays Related to pd.NA and nullable extension arrays label Dec 21, 2021

This was referenced Aug 23, 2022

[BUG] Floor division behavior of integer by series with zeroes differs from Pandas rapidsai/cudf#7389

Closed

BUG: Nullable int floordiv 0 result changed from 0 to inf #48223

Closed

mroeschke removed this from the Contributions Welcome milestone Oct 13, 2022
