BUG: boolean indexing error with .drop() #16877

danparshall · 2017-07-10T21:10:57Z

Code Sample, a copy-pastable example if possible

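import numpy as np
import pandas as pd

df = pd.DataFrame(data={
    'acol': np.arange(4),
    'bcol': 2 * np.arange(4),
})
# Attempt to drop the rows where bcol > 2 by passing a boolean Series as labels
df.drop(df.bcol > 2, axis=0, inplace=True)

print(df)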

Expected Output

	acol	bcol
0	0	0
1	1	2

Observed Output

	acol	bcol
2	2	4
3	3	6

Problem description

The anticipated behavior was that rows with bcol > 2 would be dropped. Instead, the boolean values are converted to 0/1 and then treated as index labels, so the rows labeled 0 and/or 1 are dropped and all other rows are kept.

The documentation did not make it clear what was happening.

Solutions might include documentation clarifying that .drop() cannot be used with boolean indexing, or a warning when receiving the (attempted) boolean index.
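To make the label-based behavior concrete, here is a minimal sketch (an illustration, not part of the original report) showing that drop() removes rows by index label. It assumes the same four-row frame as the code sample above; passing the labels 0 and 1 explicitly removes exactly those two rows.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"acol": np.arange(4), "bcol": 2 * np.arange(4)})

# drop() interprets its first argument as index labels, not as a row mask,
# so this removes the rows whose labels are 0 and 1.
remaining = df.drop([0, 1], axis=0)
print(remaining)
```

The rows labeled 2 and 3 survive, regardless of the values stored in bcol.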

Output of `pd.show_versions()`

INSTALLED VERSIONS
------------------
commit: None
python: 3.6.1.final.0
python-bits: 64
OS: Linux
OS-release: 2.6.32-573.12.1.el6.x86_64
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8

pandas: 0.20.2
pytest: 3.1.2
pip: 9.0.1
setuptools: 33.1.1.post20170320
Cython: 0.25.2
numpy: 1.12.1
scipy: 0.19.1
xarray: 0.9.6
IPython: 6.1.0
sphinx: 1.6.3
patsy: 0.4.1
dateutil: 2.6.0
pytz: 2017.2
blosc: None
bottleneck: 1.2.0
tables: 3.4.2
numexpr: 2.6.2
feather: None
matplotlib: 2.0.2
openpyxl: 2.5.0a1
xlrd: 1.0.0
xlwt: 1.2.0
xlsxwriter: 0.9.6
lxml: 3.8.0
bs4: 4.5.3
html5lib: 0.9999999
sqlalchemy: 1.1.11
pymysql: 0.7.9.None
psycopg2: 2.7.1 (dt dec pq3 ext lo64)
jinja2: 2.9.5
s3fs: 0.1.1
pandas_gbq: None
pandas_datareader: None


gfyoung · 2017-07-11T03:44:52Z

From the current docs:

Return new object with labels in requested axis removed.

When you call df.bcol > 2, your labels are Series([False, False, True, True]), which pandas (and Python) would interpret as the labels 0 and 1 on the index.

I know that I am repeating part of what you said, but the documentation IMO seems to align with what it's supposed to do. Nowhere does it say that it filters by a conditional, which is what you were aiming to do.

To perform the filtering that you want, one recommended way is this:

df = df[df.bcol <= 2]

Note that using inplace=True is generally not considered good practice because it makes code more prone to bugs (we will likely deprecate and remove this option at some point).
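As a small sketch of the filtering approach above (the variable names are chosen here for illustration only), the same rows can be kept either by indexing with a boolean mask or by handing drop() the actual index labels of the rows that match the condition. Neither variant needs inplace=True.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"acol": np.arange(4), "bcol": 2 * np.arange(4)})
mask = df.bcol > 2  # True for the rows we want to remove

# Option 1: keep the rows where the mask is False.
kept_by_mask = df[~mask]

# Option 2: the same selection written with .loc.
kept_by_loc = df.loc[~mask]

# Option 3: if drop() is preferred, pass it index labels rather than booleans.
kept_by_drop = df.drop(df[mask].index)

print(kept_by_mask)
```

All three leave only the rows with bcol <= 2.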

jreback · 2017-07-11T10:34:02Z

This is a duplicate of #6189, but will keep this issue open. This is pretty easy to fix by raising on a boolean indexer. PRs welcome!

gfyoung · 2017-07-11T15:04:55Z

@jreback : I'm not sure what the problem is here. The documentation looks pretty clear on this. #6189 demonstrates that clearly. True and False are the labels 1 and 0 respectively, which is why it works the first time but fails the second time, so I don't think there is anything to fix here. Also, you can't raise on a boolean indexer because you can have booleans as indices!
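The point about True and False acting as the labels 1 and 0 can be checked in plain Python, without pandas. The snippet below is only an illustration of the language-level behavior (bool is a subclass of int, and equal values hash equally); the dictionary is a made-up stand-in for a label lookup.

```python
# bool is a subclass of int, so True/False compare and hash like 1/0.
assert issubclass(bool, int)
assert True == 1 and False == 0
assert hash(True) == hash(1) and hash(False) == hash(0)

# A hypothetical label lookup keyed by the integers 0 and 1.
labels = {0: "row zero", 1: "row one"}

# Boolean keys find the integer entries.
looked_up = [labels[flag] for flag in (False, False, True, True)]
print(looked_up)
```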

jreback · 2017-07-13T15:15:58Z

Well, a boolean indexer doesn't make sense here and should raise an error. Having boolean indices is quite rare, and you can also detect that case.

gfyoung · 2017-07-13T15:20:56Z

@jreback : Is that not special-casing? True and False are interpreted by Python as the labels 1 and 0 respectively, regardless of the type of index you are operating with.

jreback · 2017-07-13T15:25:03Z

It's a fail-fast error check: if a boolean indexer is passed in, it should raise unless the axis is in fact a boolean index (and the shapes match).
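A rough sketch of what such a fail-fast check could look like in user code follows. The helper name drop_fail_fast is hypothetical and is not pandas API, nor is it the change that was eventually merged; it also skips the shape comparison mentioned above and only checks whether the labels and the index are boolean.

```python
import numpy as np
import pandas as pd


def drop_fail_fast(df, labels):
    """Hypothetical wrapper around DataFrame.drop (not part of pandas)."""
    labels_are_bool = np.asarray(labels).dtype == bool
    # Treat the index as boolean only if every label in it is a bool.
    index_is_bool = len(df.index) > 0 and all(
        isinstance(label, (bool, np.bool_)) for label in df.index
    )
    if labels_are_bool and not index_is_bool:
        raise ValueError(
            "boolean labels passed to drop on a non-boolean index; "
            "use df[~mask] to filter rows instead"
        )
    return df.drop(labels)


df = pd.DataFrame({"acol": np.arange(4), "bcol": 2 * np.arange(4)})

# A boolean Series on an integer index is rejected up front.
try:
    drop_fail_fast(df, df.bcol > 2)
    raised = False
except ValueError:
    raised = True

# Ordinary integer labels still pass through to drop().
kept = drop_fail_fast(df, [2, 3])
```

The check runs before any lookup happens, so the caller gets an error instead of a silently wrong result.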

gfyoung · 2017-07-14T07:49:41Z

Fair enough. I feel like this should just be allowed, but given the confusion it's generated amongst users (two independent issues), I concede 😄

andrejonasson · 2017-07-15T15:16:40Z

Hi, I'm working on this issue.


jreback added the Difficulty Novice, Error Reporting, and Indexing labels Jul 11, 2017

jreback added this to the Next Major Release milestone Jul 11, 2017

jreback changed the title from "boolean indexing error with .drop()" to "ERR: boolean indexing error with .drop()" Jul 11, 2017

jreback mentioned this issue Jul 11, 2017

Drop with boolean raises a ValueError #6189

Closed

andrejonasson added a commit to andrejonasson/pandas that referenced this issue Aug 19, 2017

ERR: Warn when Index is numeric and indexer is boolean (pandas-dev#16877)

7e3fb66

andrejonasson added a commit to andrejonasson/pandas that referenced this issue Aug 19, 2017

ERR: Warn when Index is numeric and indexer is boolean (pandas-dev#16877)

1627505

andrejonasson added a commit to andrejonasson/pandas that referenced this issue Aug 26, 2017

ERR: Warn when Index is numeric and indexer is boolean (pandas-dev#16877)

c50cb80

andrejonasson added a commit to andrejonasson/pandas that referenced this issue Aug 26, 2017

ERR: Warn when Index is numeric and indexer is boolean (pandas-dev#16877)

142f8af

andrejonasson added a commit to andrejonasson/pandas that referenced this issue Aug 26, 2017

ERR: Warn when Index is numeric and indexer is boolean (pandas-dev#16877)

47340c3

andrejonasson mentioned this issue Aug 26, 2017

BUG: when Index is numeric and indexer is boolean (#16877) #17343

Merged

andrejonasson added a commit to andrejonasson/pandas that referenced this issue Aug 26, 2017

ERR: Warn when Index is numeric and indexer is boolean (pandas-dev#16877)

44ffaf1

andrejonasson added a commit to andrejonasson/pandas that referenced this issue Aug 26, 2017

ERR: Raise when Index is numeric and indexer is boolean (pandas-dev#16877)

661bfb6

andrejonasson added a commit to andrejonasson/pandas that referenced this issue Aug 26, 2017

ERR: Raise when Index is numeric and indexer is boolean (pandas-dev#16877)

4a10a11

andrejonasson added a commit to andrejonasson/pandas that referenced this issue Aug 27, 2017

ERR: Make get_indexer return the correct indexer when Index is numeric and target is boolean (pandas-dev#16877)

0aaaad8

andrejonasson added a commit to andrejonasson/pandas that referenced this issue Aug 27, 2017

ERR: Make get_indexer return the correct indexer when Index is numeric and target is boolean (pandas-dev#16877)

085b9a0

jreback modified the milestones: 0.21.0, Next Major Release Aug 30, 2017

jreback removed the Error Reporting label Aug 30, 2017

jreback changed the title from "ERR: boolean indexing error with .drop()" to "BUG: boolean indexing error with .drop()" Aug 30, 2017

andrejonasson added a commit to andrejonasson/pandas that referenced this issue Aug 30, 2017

ERR: get_indexer returns the correct indexer when Index is numeric and target is boolean (pandas-dev#16877)

5f877fa

andrejonasson added a commit to andrejonasson/pandas that referenced this issue Sep 7, 2017

ERR: get_indexer returns the correct indexer when Index is numeric and target is boolean (pandas-dev#16877)

c0a2156

andrejonasson added a commit to andrejonasson/pandas that referenced this issue Sep 18, 2017

ERR: get_indexer returns the correct indexer when Index is numeric and target is boolean (pandas-dev#16877)

5b22400

andrejonasson added a commit to andrejonasson/pandas that referenced this issue Sep 18, 2017

ERR: get_indexer returns the correct indexer when Index is numeric and target is boolean (pandas-dev#16877)

9c914ff

jreback modified the milestones: 0.21.0, Next Major Release Sep 23, 2017

andrejonasson added a commit to andrejonasson/pandas that referenced this issue Sep 24, 2017

ERR: get_indexer returns the correct indexer when Index is numeric and target is boolean (pandas-dev#16877)

9abe2a1

andrejonasson added a commit to andrejonasson/pandas that referenced this issue Sep 24, 2017

ERR: get_indexer returns the correct indexer when Index is numeric and target is boolean (pandas-dev#16877)

6e60ee1

jreback modified the milestones: Next Major Release, 0.21.0 Sep 24, 2017

andrejonasson added a commit to andrejonasson/pandas that referenced this issue Sep 24, 2017

ERR: get_indexer returns the correct indexer when Index is numeric and target is boolean (pandas-dev#16877)

0997ade

andrejonasson added a commit to andrejonasson/pandas that referenced this issue Sep 25, 2017

ERR: get_indexer returns the correct indexer when Index is numeric and target is boolean (pandas-dev#16877)

1e6c013

jorisvandenbossche closed this as completed in #17343 Sep 25, 2017

jorisvandenbossche pushed a commit that referenced this issue Sep 25, 2017

ERR: get_indexer returns the correct indexer when Index is numeric and target is boolean (#16877) (#17343)

45a795e

alanbato pushed a commit to alanbato/pandas that referenced this issue Nov 10, 2017

ERR: get_indexer returns the correct indexer when Index is numeric and target is boolean (pandas-dev#16877) (pandas-dev#17343)

644fb32

jorisvandenbossche mentioned this issue Nov 14, 2017

Unexpected behaviour while dropping #18287

Closed

No-Stream pushed a commit to No-Stream/pandas that referenced this issue Nov 28, 2017

ERR: get_indexer returns the correct indexer when Index is numeric and target is boolean (pandas-dev#16877) (pandas-dev#17343)

f5672d9

toobaz added a commit to toobaz/pandas that referenced this issue Jun 29, 2019

TST: actually test pandas-dev#16877 on numeric index (not just RangeIndex)

53a36a7

toobaz added a commit to toobaz/pandas that referenced this issue Jun 29, 2019

TST: actually test pandas-dev#16877 on numeric index (not just RangeIndex)

2175f3e
