BUG: binary comparison of numpy.int/float and Series #9369

Closed
dmsul opened this issue Jan 29, 2015 · 14 comments
Labels
Bug Dtype Conversions Unexpected or buggy dtype conversions

Comments

@dmsul

dmsul commented Jan 29, 2015

This only happens when the numpy object is on the left. It doesn't matter if it's an int or a float. This error does not get raised with DataFrames.

After more poking around, it looks like this actually comes from a change in numpy, between versions 1.8.2 and 1.9.0.

import pandas as pd
import numpy as np

s = pd.Series(np.arange(4))
arr = np.arange(4)

right = s < arr[0]
left = arr[0] > s

Running this yields

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
bug_lhs_numpy.py in <module>()
      6
      7 right = s < arr[0]
----> 8 left = arr[0] > s

C:\Anaconda\lib\site-packages\pandas-0.14.1_236_g989a51b-py2.7-win-amd64.egg\pandas\core\ops.py
ther)
    555             return NotImplemented
    556         elif isinstance(other, (pa.Array, pd.Series, pd.Index)):
--> 557             if len(self) != len(other):
    558                 raise ValueError('Lengths must match to compare')
    559             return self._constructor(na_op(self.values, np.asarray(other)),

TypeError: len() of unsized object

In [2]: type(arr[0])
Out[2]: numpy.int32
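
For readers hitting this on an affected numpy/pandas combination, the report above suggests two ways to sidestep the failing path: keep the Series on the left, or convert the numpy scalar to a built-in Python number first. The following is a minimal sketch of both, not a fix for the underlying bug. The name np_scalar and the choice of arr[2] (instead of arr[0], to get a less trivial result) are illustrative assumptions.

```python
import numpy as np
import pandas as pd

s = pd.Series(np.arange(4))
arr = np.arange(4)

# A numpy integer scalar (np.int32 or np.int64 depending on platform).
np_scalar = arr[2]

# Option 1: keep the Series on the left-hand side of the comparison.
right = s < np_scalar

# Option 2: convert to a built-in Python int before comparing,
# so pandas sees a plain scalar on the left.
left = np_scalar.item() > s

print(right.tolist())
print(left.tolist())
```

Both forms express the same comparison, so the two results should agree element by element.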
@jreback
Contributor

jreback commented Jan 29, 2015

left = 0 > s works (e.g. with a Python scalar). So I think the value is being treated as a 0-dim array (it's a np.int64) rather than as a scalar when called. I'll mark this as a bug. Feel free to dig in.

@jreback jreback added Bug Dtype Conversions Unexpected or buggy dtype conversions labels Jan 29, 2015
@jreback jreback added this to the 0.16.0 milestone Jan 29, 2015
@dmsul
Author

dmsul commented Jan 30, 2015

It is indeed being converted at some point to a 0-dim array (ipython %debug). But I'm fairly certain it's something on the numpy side. (As long as numpy<=1.8.2, everything is fine in pandas>=0.14.0. As soon as you bump numpy to 1.9.0, the example code raises the error on every version of pandas.) I think this one is going to be way over my head. But if I happen to learn C before someone else takes care of this, I'll give it a try.

@tvyomkesh
Contributor

Thinking about this from a slightly different angle: Series comparison seems to work fine if the LHS or RHS is a one-element list.

In [1]: import pandas as pd
In [2]: import numpy as np
In [3]: [0] < pd.Series(np.arange(4))
Out[3]:
0    False
1     True
2     True
3     True
dtype: bool
In [8]: pd.Series(np.arange(4)) > [0]
Out[8]:
0    False
1     True
2     True
3     True
dtype: bool

The question is whether the behavior should be modified so that a one-element pd.Series or np.ndarray gives the same answer as a list or scalar. numpy works with both a scalar and a one-element np.ndarray.

In [9]: np.arange(1) < np.arange(4)
Out[9]: array([False,  True,  True,  True], dtype=bool)
In [10]: 0 < np.arange(4)
Out[10]: array([False,  True,  True,  True], dtype=bool)

Right now these comparisons raise an error. With the proposed fix, they would be expected to work.

In [3]: pd.Series(np.arange(1)) < pd.Series(np.arange(4))
In [4]: np.arange(1) < pd.Series(np.arange(4))
In [5]: pd.Series(np.arange(4)) > pd.Series(np.arange(1))
In [6]: pd.Series(np.arange(4)) > np.arange(1)
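
Until something like the proposed behavior exists, one way to compare a Series against a one-element array is to expand the array to the Series length explicitly, so the lengths match before pandas sees it. This is only a sketch of a workaround, not the proposed fix. The names one, expanded and result are illustrative.

```python
import numpy as np
import pandas as pd

s = pd.Series(np.arange(4))
one = np.arange(1)  # one-element array, shape (1,)

# Expand the one-element array to the Series' shape (a read-only view),
# so the length check in the comparison is satisfied.
expanded = np.broadcast_to(one, s.shape)

result = s > expanded
print(result.tolist())
```

np.broadcast_to returns a view rather than a copy, so this stays cheap for long Series.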

@wadawson

I can confirm @dmsul's diagnosis of this bug. Afraid I don't have time currently to correct this, but +1 for a fix.

@jreback jreback modified the milestones: 0.16.0, Next Major Release Mar 5, 2015
@jreback jreback modified the milestones: 0.17.0, Next Major Release Jun 15, 2015
@jreback jreback modified the milestones: Next Major Release, 0.17.0 Aug 19, 2015
@gliptak
Contributor

gliptak commented May 1, 2016

The basic form of this is:

import numpy as np
import pandas as pd
s = pd.Series([1])
b = np.int32(1)
b < s

type(b) before the call is <class 'numpy.int32'>, while it is <class 'numpy.ndarray'> within ops.wrapper (other).
How can the type of this variable change?

@jreback
Contributor

jreback commented May 1, 2016

both are correct

np.int32 is a 0 dim scalar that's also an ndarray

@gliptak
Contributor

gliptak commented May 1, 2016

This is what I get:

In [12]: isinstance(np.int32(1), np.ndarray)
Out[12]: False

but somehow it becomes True within ops.wrapper ...
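
The distinction being discussed can be checked with numpy alone. The sketch below shows that a numpy scalar is not an ndarray, that np.asarray turns it into a 0-dim array, and that calling len() on such an array raises the TypeError from the original traceback. It does not show where in the numpy/pandas call chain the conversion happened in this bug. The names as_array and len_error are illustrative.

```python
import numpy as np

b = np.int32(1)

# A numpy scalar is an instance of np.generic, not np.ndarray.
print(isinstance(b, np.ndarray))
print(isinstance(b, np.generic))

# Converting it with np.asarray yields a 0-dimensional ndarray.
as_array = np.asarray(b)
print(type(as_array).__name__, as_array.ndim, as_array.shape)

# A 0-dim array has no length, so len() raises TypeError.
try:
    len(as_array)
    len_error = None
except TypeError as exc:
    len_error = str(exc)
print(len_error)
```

So any code path that wraps the scalar in an array before the len(self) != len(other) check would produce the reported error.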

@gliptak
Contributor

gliptak commented May 1, 2016

Thoughts on how to look into this further? Thanks

@jreback
Contributor

jreback commented May 1, 2016

@gliptak you need to step thru and debug

@gliptak
Contributor

gliptak commented May 1, 2016

I did that already ... Before the call into the function it is int32 1, within the function it shows ndarray 1. Pointers to what translation could have happened in between are welcome.

@jbrockmendel
Member

AFAICT there are two more-or-less separate issues here: Series comparison with numpy scalar (works fine, probably has for a while), and Series comparison with 1-element listlike (not supported).

We recently changed DataFrame broadcasting behavior to match numpy with 2-dimensional arrays with shape either (1, ncols) or (nrows, 1). We could consider doing the same for Series broadcasting against 1-dimensional objects with shape (1,).
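
For reference, the numpy broadcasting behavior mentioned here can be sketched with plain arrays. The example only illustrates numpy semantics for shapes (1,), (1, ncols) and (nrows, 1); it makes no claim about what pandas does in any particular version. All array names are illustrative.

```python
import numpy as np

# 1-D case: shape (1,) broadcasts against shape (4,).
values = np.arange(4)
one = np.arange(1)
cmp_1d = one < values

# 2-D case: a (2, 3) grid against a (1, 3) row and a (2, 1) column.
grid = np.arange(6).reshape(2, 3)   # [[0, 1, 2], [3, 4, 5]]
row = np.array([[1, 2, 3]])         # shape (1, ncols)
col = np.array([[0], [4]])          # shape (nrows, 1)

cmp_row = grid < row   # the row is compared against every row of grid
cmp_col = grid < col   # the column is compared against every column of grid

print(cmp_1d.tolist())
print(cmp_row.tolist())
print(cmp_col.tolist())
```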

@dmsul does this synopsis appear accurate?

@dmsul
Author

dmsul commented Oct 24, 2018

@jbrockmendel No idea, it's been almost 4 years since I looked at this bug.

As of numpy 1.12.1 and pandas 0.20.2 (just what I had in the nearest env at hand) there is no error, and I can't get an error when doing s < [0] or any number of permutations. Haven't tried it with ops.wrapper, but from a practitioner's standpoint I can't recreate it.

@jbrockmendel
Member

@jreback closeable?

@jreback
Contributor

jreback commented Oct 25, 2018

yeah i think this was the __array_priority__ issue, which was fixed a long time ago

@jreback jreback closed this as completed Oct 25, 2018
@jorisvandenbossche jorisvandenbossche modified the milestones: Contributions Welcome, No action Oct 26, 2018