BUG: binary comparison of numpy.int/float and Series #9369
It is indeed being converted at some point to a 0-dim array (ipython %debug). But I'm fairly certain it's something on the numpy side. (As long as numpy<=1.8.2, everything is fine in pandas>=0.14.0. As soon as you bump numpy to 1.9.0, the example code raises the error on every version of pandas.) I think this one is going to be way over my head. But if I happen to learn C before someone else takes care of this, I'll give it a try. |
Thinking about this from a slightly different angle: Series comparison seems to work fine if the LHS or RHS is a one-element list.
In [1]: import pandas as pd
In [2]: import numpy as np
In [3]: [0] < pd.Series(np.arange(4))
Out[3]:
0 False
1 True
2 True
3 True
dtype: bool
In [8]: pd.Series(np.arange(4)) > [0]
Out[8]:
0 False
1 True
2 True
3 True
dtype: bool
The question is whether the behavior should be modified so that a one-element pd.Series or np.ndarray gives the same answer as a one-element list or a scalar. numpy works with both a scalar and a one-element np.ndarray.
In [9]: np.arange(1) < np.arange(4)
Out[9]: array([False, True, True, True], dtype=bool)
In [10]: 0 < np.arange(4)
Out[10]: array([False, True, True, True], dtype=bool)
Right now the following comparisons throw an error. With the proposed fix, they can be expected to work.
In [3]: pd.Series(np.arange(1)) < pd.Series(np.arange(4))
In [4]: np.arange(1) < pd.Series(np.arange(4))
In [5]: pd.Series(np.arange(4)) > pd.Series(np.arange(1))
In [6]: pd.Series(np.arange(4)) > np.arange(1) |
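Until such a change exists, one way to get the list/scalar behavior for one-element operands is to reduce them to a scalar before comparing. The following is only a minimal sketch: `as_scalar_if_single` is a hypothetical helper written for illustration, not part of pandas or numpy, and it assumes that broadcasting the single element is what the caller wants.

```python
import numpy as np
import pandas as pd


def as_scalar_if_single(value):
    """Return a plain scalar for a one-element 1-D list/array/Series.

    Hypothetical helper for illustration only; anything else is
    returned unchanged.
    """
    arr = np.asarray(value)
    if arr.ndim == 1 and arr.size == 1:
        return arr.item()
    return value


series = pd.Series(np.arange(4))

# The shape-(1,) array is reduced to the scalar 0, so this behaves
# like `0 < series`.
result = as_scalar_if_single(np.arange(1)) < series
print(result.tolist())  # [False, True, True, True]
```

The same helper can be applied to a one-element Series on either side of the comparison; operands with more than one element pass through untouched, so the usual length checks still apply to them.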
I can confirm @dmsul's diagnosis of this bug. I'm afraid I don't currently have time to correct this, but +1 for a fix. |
The basic form of this is:
|
Both are correct; np.int32 is a 0-dim scalar that's also an ndarray |
This is what I get:
but somehow it becomes |
Thoughts on how to look into this further? Thanks |
@gliptak you need to step thru and debug |
I did that already ... Before the call into the function it is |
AFAICT there are two more-or-less separate issues here: Series comparison with numpy scalar (works fine, probably has for a while), and Series comparison with 1-element listlike (not supported). We recently changed DataFrame broadcasting behavior to match numpy with 2-dimensional arrays with shape either (1, ncols) or (nrows, 1). We could consider doing the same for Series broadcasting against 1-dimensional objects with shape (1,). @dmsul does this synopsis appear accurate? |
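For reference, the numpy broadcasting rules mentioned above can be sketched with plain arrays. This only shows numpy's own behavior for the shapes under discussion; the array names and values are made up for illustration, and it says nothing about what any particular pandas version does.

```python
import numpy as np

grid = np.arange(6).reshape(2, 3)  # shape (nrows=2, ncols=3): [[0, 1, 2], [3, 4, 5]]
row = np.array([[1, 1, 4]])        # shape (1, ncols): stretched down the rows
col = np.array([[1], [4]])         # shape (nrows, 1): stretched across the columns

print((grid > row).tolist())  # [[False, False, False], [True, True, True]]
print((grid > col).tolist())  # [[False, False, True], [False, False, True]]

# The 1-dimensional analogue: shape (1,) against shape (4,).
vec = np.arange(4)
one = np.arange(1)
print((one < vec).tolist())   # [False, True, True, True]
```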
@jbrockmendel No idea, it's been almost 4 years since I looked at this bug. As of numpy 1.12.1 and pandas 0.20.2 (just what I had in the nearest env at hand) there is no error, and I can't get an error when doing |
@jreback closeable? |
yeah i think this was the array_priority which was fixed a long time ago |
This only happens when the numpy object is on the left. It doesn't matter if it's an int or a float. This error does not get raised with DataFrames.
After more poking around, it looks like this actually comes from a change in numpy between versions 1.8.2 and 1.9.0.
Running this yields