Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Index division by zero not filled #19322

Closed
jbrockmendel opened this issue Jan 20, 2018 · 8 comments
Closed

Index division by zero not filled #19322

jbrockmendel opened this issue Jan 20, 2018 · 8 comments
Labels
Compat pandas objects compatability with Numpy or Python functions Dtype Conversions Unexpected or buggy dtype conversions Numeric Operations Arithmetic, Comparison, and Logical operations

Comments

@jbrockmendel
Copy link
Member


idx = pd.Index(range(5))
>>> idx
Int64Index([0, 1, 2, 3, 4], dtype='int64')

>>> idx / 0
Int64Index([0, 0, 0, 0, 0], dtype='int64')

>>> idx // 0
Int64Index([0, 0, 0, 0, 0], dtype='int64')

>>> idx % 0
Int64Index([0, 0, 0, 0, 0], dtype='int64')

>>> divmod(idx, 0)
(Int64Index([0, 0, 0, 0, 0], dtype='int64'), Int64Index([0, 0, 0, 0, 0], dtype='int64'))

>>> idx.astype('uint64') / 0
UInt64Index([0, 0, 0, 0, 0], dtype='uint64')

Others are OK:

>>> idx.__truediv__(0)
Float64Index([nan, inf, inf, inf, inf], dtype='float64')

>>> idx.astype('float64') / 0
Float64Index([nan, inf, inf, inf, inf], dtype='float64')
@jreback
Copy link
Contributor

jreback commented Jan 20, 2018

this is what numpy does though i think we changed for Series a long time ago

@jbrockmendel
Copy link
Member Author

jbrockmendel commented Jan 20, 2018

The test from #9308 for Series division is now in tests.series.test_operators. Splitting it into separate tests for Series.__div__, Series.__rdiv__, Series.__floordiv__ and parameterizing it with different dtypes and different zero-like other args, the behavior is inconsistent.

# For testing division by (or of) zero for Series with length 3, this
# gives several scalar-zeros and length-3 vector-zeros
zeros = [box([0, 0, 0], dtype=dtype)
         for box in [pd.Series, pd.Index, np.array]
         for dtype in [np.int64, np.uint64, np.float64]]
zeros.extend([np.array(0, dtype=dtype)
             for dtype in [np.int64, np.uint64, np.float64]])
zeros.extend([0, 0.0, long(0)])

class TestSeriesArithmetic(object):
    @pytest.mark.parametrize('dtype', [np.int64, np.uint64, np.float64])
    @pytest.mark.parametrize('zero', zeros)
    def test_div_zero(self, dtype, zero):
        # GH#9144
        # dtype stability GH#19322
        ser = Series([-1, 0, 1]).astype(dtype)

        result = ser / zero
        expected = Series([-np.inf, np.nan, np.inf])
        assert_series_equal(result, expected)

15 of these fail, 38 for the analogous __floordiv__ test. Before I try to fix this: is there a convention we're following where 0 / 0, 1 / 0, and -1 / 0 may vary depending on the exact dtypes of the 1s and 0s? Or should these always be np.nan, np.inf, and -np.inf, respectively?

Update The tests with dtype==np.uint64 are wrong b/c of casting.

@jreback
Copy link
Contributor

jreback commented Jan 20, 2018

there is an old issue and whatsnew about this see if u can find it
maybe version .11 or around there

@jbrockmendel
Copy link
Member Author

0.12.0 has

  - Fix modulo and integer division on Series,DataFrames to act similarly to ``float`` dtypes to return
    ``np.nan`` or ``np.inf`` as appropriate (:issue:`3590`). This correct a numpy bug that treats ``integer``
    and ``float`` dtypes differently.

It looks like #3600 and #9308 are both about this, but unsigned ints appear to be missing form the tests, along with a bunch of the reverse ops. I can patch this and implement a thorough set of test cases, but need to confirm that all ints and all zeros are interchangeable for this purpose.

@jreback
Copy link
Contributor

jreback commented Jan 20, 2018

this is tricky as numpy does different things for uint and int
they don’t convert to float

i think we should though - see what breaks

@jbrockmendel
Copy link
Member Author

Making some progress on this. It's liable to be a big diff, so my thought is to first write all the tests and xfail the ones that are currently wrong (allowing a reviewer to double-check my understanding of the desired behavior), then make reasonably-sized PRs to un-xfail those tests.

@jreback
Copy link
Contributor

jreback commented Jan 21, 2018

yeah I think these should all match what we do in series now.

In [1]: idx = pd.Index(range(5))

In [2]: s = Series(idx)

In [3]: s / 0
Out[3]: 
0    NaN
1    inf
2    inf
3    inf
4    inf
dtype: float64

In [4]: idx / s
Out[4]: 
0    NaN
1    1.0
2    1.0
3    1.0
4    1.0
dtype: float64

In [5]: idx / 0
---------------------------------------------------------------------------
ZeroDivisionError                         Traceback (most recent call last)
<ipython-input-5-99f2096171d8> in <module>()
----> 1 idx / 0

~/pandas/pandas/core/indexes/range.py in _evaluate_numeric_binop(self, other)
    600                     if step:
    601                         with np.errstate(all='ignore'):
--> 602                             rstep = step(self._step, other)
    603 
    604                         # we don't have a representable op

ZeroDivisionError: division by zero

In [6]: idx // 0
---------------------------------------------------------------------------
ZeroDivisionError                         Traceback (most recent call last)
<ipython-input-6-1648d8a4476e> in <module>()
----> 1 idx // 0

~/pandas/pandas/core/indexes/range.py in __floordiv__(self, other)
    553         if is_integer(other):
    554             if (len(self) == 0 or
--> 555                     self._start % other == 0 and
    556                     self._step % other == 0):
    557                 start = self._start // other

ZeroDivisionError: integer division or modulo by zero

In [7]: s // 0
Out[7]: 
0    NaN
1    inf
2    inf
3    inf
4    inf
dtype: float64

yes xfailing things would be ok. Note since these are in ops, It think it would make sense to split out the giant tests_ops to sub-modules first :> (I know that's for Series), but same idea.

@jreback jreback added Dtype Conversions Unexpected or buggy dtype conversions Numeric Operations Arithmetic, Comparison, and Logical operations Compat pandas objects compatability with Numpy or Python functions Difficulty Intermediate labels Jan 21, 2018
@jreback jreback added this to the Next Major Release milestone Jan 21, 2018
@jbrockmendel
Copy link
Member Author

Closed by #27321.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Compat pandas objects compatability with Numpy or Python functions Dtype Conversions Unexpected or buggy dtype conversions Numeric Operations Arithmetic, Comparison, and Logical operations
Projects
None yet
Development

No branches or pull requests

2 participants