Comparing Strings to Numbers: No ValueError #11565

Closed
mattayes opened this issue Nov 10, 2015 · 3 comments · Fixed by #29535
Labels: good first issue, Needs Tests, Numeric Operations

Comments

@mattayes
Contributor

I noticed this quirk today: when you compare a whole DataFrame of strings to a number (with any operator other than ==), it doesn't raise a TypeError, which is what you'd expect in Python 3; instead, every element comes back True.

>>> from pandas import DataFrame
>>> df = DataFrame({x: {'x': 'foo', 'y': 'bar', 'z': 'baz'} for x in ['a', 'b', 'c']})
>>> df
     a    b    c
x  foo  foo  foo
y  bar  bar  bar
z  baz  baz  baz
>>> df < 0
      a     b     c
x  True  True  True
y  True  True  True
z  True  True  True
>>> df > 0
      a     b     c
x  True  True  True
y  True  True  True
z  True  True  True

However, when you compare a Series of strings to a number, you get the expected TypeError:

>>> df.a < 0
TypeError: unorderable types: str() < int()
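
For reference, the Series result matches plain Python 3, where ordering a str against an int raises TypeError rather than ValueError. A minimal sketch that checks this without pandas (the helper name `raises_type_error` is made up for illustration):

```python
def raises_type_error(left, right):
    """Return True if evaluating `left < right` raises TypeError."""
    try:
        left < right
    except TypeError:
        return True
    return False


# Ordering a str against an int is not defined in Python 3.
str_vs_int = raises_type_error("foo", 0)

# Ordering two values of the same type is fine.
str_vs_str = raises_type_error("foo", "bar")

print(str_vs_int, str_vs_str)
```

The wording of the exception message differs between Python versions, so the sketch only checks the exception type.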

Is this a bug or a feature?

Python: 3.4.3
Pandas: 0.17.0
OS: Mac OSX 10.11

@jreback added the Compat label Nov 18, 2015
@jreback
Contributor

jreback commented Nov 18, 2015

Yeah, I think this is an API inconsistency. I think we should make df.a < 0 work as well, since the comparison works for everything else, and raising an error is not useful here.

Note that we do have code to do this comparison even in py3 (see core/algorithms.py/factorize), so I think that could be incorporated into the comparison routines (in ops.py).

@jreback added the Strings and Difficulty Intermediate labels Nov 18, 2015
@jreback added this to the Next Major Release milestone Nov 18, 2015
@mattayes
Contributor Author

Oh, really? I think the error is helpful for Series. What I find problematic is that strings always return True on a DataFrame-wide comparison. I'd expect None or at the very least False.

What I originally wanted to do was replace all negative values in the DataFrame with np.nan and then call df.dropna(). With the current behavior I can't, because the strings will be replaced as well, leaving me with a rather empty DataFrame. Of course, if I wanted to replace all the positive values, I still couldn't!

Thoughts?
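
One possible workaround for the use case described above, sketched under assumptions: the mixed DataFrame below is invented for illustration, and the approach (coerce a numeric view with `pd.to_numeric`, then mask the original) is just one option, not something proposed in this thread. Strings become NaN in the numeric view, and NaN < 0 is False, so they are left alone.

```python
import numpy as np
import pandas as pd

# Hypothetical mixed-type frame: strings alongside positive and negative numbers.
df = pd.DataFrame({"a": ["foo", -1, 2], "b": [3, -4, "bar"]})

# Numeric view of the frame: anything that is not a number becomes NaN.
numeric = df.apply(pd.to_numeric, errors="coerce")

# Replace only the entries whose numeric value is negative.
# Comparisons against NaN are False, so the strings are not touched.
result = df.mask(numeric < 0, np.nan)

# Drop the rows that now contain a missing value.
cleaned = result.dropna()

print(result)
print(cleaned)
```

The same pattern works for positive values by changing the condition to `numeric > 0`.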

@mroeschke
Member

This looks to raise correctly on master. Could use a test.

In [68]: df < 0
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
TypeError: '>' not supported between instances of 'str' and 'int'

In [71]: pd.__version__
Out[71]: '0.26.0.dev0+533.gd8f9be7e3'
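
A regression test along these lines might look like the sketch below. The test name is hypothetical and this is not the test that was eventually merged; it assumes pytest is available and only checks that ordering comparisons between a DataFrame of strings and a number raise TypeError.

```python
import pandas as pd
import pytest


def test_frame_of_strings_vs_number_raises():
    # Same frame as in the original report: every cell is a string.
    df = pd.DataFrame(
        {col: {"x": "foo", "y": "bar", "z": "baz"} for col in ["a", "b", "c"]}
    )

    # Ordering comparisons against a number should raise, not return all True.
    with pytest.raises(TypeError):
        df < 0
    with pytest.raises(TypeError):
        df > 0


if __name__ == "__main__":
    test_frame_of_strings_vs_number_raises()
```

The exact error message is deliberately not matched here, since it differs between Python and pandas versions.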

@mroeschke added the good first issue and Needs Tests labels and removed the Compat, Difficulty Intermediate, Numeric Operations, and Strings labels Oct 11, 2019
@jbrockmendel added the Numeric Operations label Oct 16, 2019
@jreback jreback modified the milestones: Contributions Welcome, 1.0 Nov 12, 2019