Pandas any() returning false with true values present #23070

maanukuttan · 2018-10-10T05:44:37Z

Code Sample

In [1]: from io import StringIO

In [2]: import pandas as pd

In [3]: data = StringIO("""issue_date,issue_date_dt
   ...: ,
   ...: ,
   ...: 19600215.0,1960-02-15
   ...: ,
   ...: ,""")

In [4]: df = pd.read_csv(data, parse_dates=[1])

In [5]: df
Out[5]:
   issue_date issue_date_dt
0         NaN           NaT
1         NaN           NaT
2  19600215.0    1960-02-15
3         NaN           NaT
4         NaN           NaT

In [6]: df.any(axis=0)
Out[6]:
issue_date       True
issue_date_dt    True
dtype: bool

In [7]: df.any(axis=1)
Out[7]:
0    False
1    False
2    False
3    False
4    False
dtype: bool

Problem description

df.any(axis=0) behaves as expected: it returns True for both columns. df.any(axis=1), however, returns False for every row, including row 2, which holds a non-null value in both columns.
Note: A question about a similar issue can be found here

Note: If you apply notnull first, you get the required output:

In [9]: df.notnull().any(1)
Out[9]:
0    False
1    False
2     True
3    False
4    False
dtype: bool
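The workaround above can also be written as a standalone script. This is a minimal sketch, assuming a pandas version that provides `DataFrame.notna` (the newer alias of `notnull`); the frame is built directly instead of through `read_csv`, and the name `row_has_value` is only illustrative.

```python
import pandas as pd

# Same shape as the frame in the report: one float column and one
# datetime column, with a single populated row (index 2).
nan = float("nan")
df = pd.DataFrame({
    "issue_date": [nan, nan, 19600215.0, nan, nan],
    "issue_date_dt": pd.to_datetime([None, None, "1960-02-15", None, None]),
})

# notna() turns the mixed-dtype frame into an all-bool frame, so the
# row-wise reduction no longer goes through the mixed datetime path.
row_has_value = df.notna().any(axis=1)
print(row_has_value)
```

Note that this tests for missing values rather than truthiness, so a row holding only `0.0` would still be reported as True. It is a workaround for this report's data, not a drop-in replacement for `any`.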

Expected Output

df.any(axis=1) should return True for every row that contains a truthy value (row 2 in this example).

Output of `pd.show_versions()`

INSTALLED VERSIONS

commit: None
python: 3.6.5.final.0
python-bits: 64
OS: Windows
OS-release: 10
machine: AMD64
processor: Intel64 Family 6 Model 158 Stepping 9, GenuineIntel
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: None.None

pandas: 0.23.4
pytest: None
pip: 10.0.1
setuptools: 39.2.0
Cython: None
numpy: 1.15.0
scipy: 1.1.0
pyarrow: None
xarray: None
IPython: 6.5.0
sphinx: None
patsy: None
dateutil: 2.7.3
pytz: 2018.5
blosc: None
bottleneck: None
tables: None
numexpr: None
feather: None
matplotlib: 2.2.3
openpyxl: None
xlrd: None
xlwt: None
xlsxwriter: None
lxml: None
bs4: None
html5lib: 1.0.1
sqlalchemy: None
pymysql: None
psycopg2: None
jinja2: 2.10
s3fs: None
fastparquet: None
pandas_gbq: None
pandas_datareader: None


TomAugspurger · 2018-10-10T13:10:01Z

Simpler repro:

In [21]: df = pd.DataFrame({"A": [1.0], "B": [pd.Timestamp('1960-02-15')]})

In [22]: df.any(1)
Out[22]:
0    False
dtype: bool

TomAugspurger · 2018-10-10T13:14:30Z

Hmm, this seems wrong. We have mixed dtypes in

pandas/pandas/core/frame.py

Lines 7142 to 7143 in 362f2e2

    
        if axis == 1 and self._is_mixed_type and self._is_datelike_mixed_type:
            numeric_only = True

so we set numeric_only=True.

Then we go down to

pandas/pandas/core/frame.py

Lines 7196 to 7199 in 362f2e2

    
                if filter_type is None or filter_type == 'numeric':
                    data = self._get_numeric_data()
                elif filter_type == 'bool':
                    data = self._get_bool_data()

and our filter_type is bool, so we select boolean data only.

(Pdb) l
7202            else:
7203                if numeric_only:
7204                    if filter_type is None or filter_type == 'numeric':
7205                        data = self._get_numeric_data()
7206                    elif filter_type == 'bool':
7207 ->                     data = self._get_bool_data()
7208                    else:  # pragma: no cover
7209                        msg = ("Generating numeric_only data with filter_type {f}"
7210                               "not supported.".format(f=filter_type))
7211                        raise NotImplementedError(msg)
7212                    values = data.values
(Pdb) n
> /Users/taugspurger/sandbox/pandas/pandas/core/frame.py(7212)_reduce()
-> values = data.values
(Pdb) data
Empty DataFrame
Columns: []
Index: [0]

And since that's empty, the .any(1) will be false. Off the top of my head, I'm not sure what the fix is right now.
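The failure mode described above can be modeled in plain Python, without pandas. This is only a sketch of the reasoning, not the pandas implementation: `bool_only_columns` is a hypothetical stand-in for `_get_bool_data`, and the dict stands in for the single row of the simpler repro.

```python
from datetime import datetime

# One row with a float column and a datetime column, as in the simpler repro.
row = {"A": 1.0, "B": datetime(1960, 2, 15)}


def bool_only_columns(record):
    """Keep only values that are actual bools (hypothetical helper)."""
    return {name: value for name, value in record.items() if isinstance(value, bool)}


# Neither column is boolean, so the filter leaves nothing to reduce.
filtered = bool_only_columns(row)

# any() over an empty collection is False, which matches the reported result.
result_after_filter = any(filtered.values())

# Without the filter, both values are truthy and the reduction is True.
result_without_filter = any(bool(value) for value in row.values())

print(filtered, result_after_filter, result_without_filter)
```

The point of the sketch is that the False result comes from reducing over an empty selection, not from the values themselves.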

TomAugspurger · 2018-11-06T14:52:43Z

Moving off of 0.24, but would certainly welcome a fix if anyone wants to work on this.


* `bool_only` parameter is supported again
* Commit 36ab8c9 created this regression due to a bug in all/any (pandas-dev#23070)
* Reverted the regression and fixed the bug with a condition
* Added tests for `bool_only` parameter


TomAugspurger added Bug Dtype Conversions labels Oct 10, 2018

TomAugspurger added this to the 0.24.0 milestone Oct 10, 2018

TomAugspurger modified the milestones: 0.24.0, Contributions Welcome Nov 6, 2018

TomAugspurger added Datetime Difficulty Intermediate labels Nov 6, 2018

makbigc mentioned this issue Dec 26, 2018

BUG: Pandas any() returning false with true values present (GH #23070) #24434

Merged

jreback modified the milestones: Contributions Welcome, 0.24.0 Dec 30, 2018

jreback closed this as completed in #24434 Dec 30, 2018

devin-petersohn mentioned this issue Feb 3, 2019

BUG: Fixing regression in DataFrame.all and DataFrame.any with bool_only=True #25102

Merged

4 tasks

Pingviinituutti pushed a commit to Pingviinituutti/pandas that referenced this issue Feb 28, 2019

BUG: Pandas any() returning false with true values present (GH pandas…

0614172

…-dev#23070) (pandas-dev#24434)

Pingviinituutti pushed a commit to Pingviinituutti/pandas that referenced this issue Feb 28, 2019

BUG: Pandas any() returning false with true values present (GH pandas…

be8ae2f

…-dev#23070) (pandas-dev#24434)
