Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

BUG: when Index is numeric and indexer is boolean (#16877) #17343

Merged

Conversation

andrejonasson
Copy link
Contributor

This warns for both tickets:
#6189
#16877

There's one particular flaw: non-unique indices never run get_indexer and deals with dropping in a way where it is difficult to inspect types. I think it should be OK to leave that one for now though.

One of the reasons I chose to warn instead of Error is that running some_numeric_index.difference([True, False]) should possibly still be allowed, while still warning the user that the difference they are getting may just be due to conversion from True to 1 (or 1.0) and False to 0 (or 0.0). This probably is rare enough that an Error might still be the best approach.

The questions remaining to me are:
Is this severe enough that an Error should be raised? What type in this case?
Is the message sufficient?

Thanks.

@codecov
Copy link

codecov bot commented Aug 26, 2017

Codecov Report

Merging #17343 into master will decrease coverage by 0.01%.
The diff coverage is 100%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master   #17343      +/-   ##
==========================================
- Coverage   91.03%   91.02%   -0.02%     
==========================================
  Files         162      162              
  Lines       49567    49569       +2     
==========================================
- Hits        45125    45118       -7     
- Misses       4442     4451       +9
Flag Coverage Δ
#multiple 88.8% <100%> (ø) ⬆️
#single 40.24% <50%> (-0.06%) ⬇️
Impacted Files Coverage Δ
pandas/core/indexes/base.py 95.88% <100%> (ø) ⬆️
pandas/io/gbq.py 25% <0%> (-58.34%) ⬇️
pandas/core/frame.py 97.72% <0%> (-0.1%) ⬇️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 473a7f3...44ffaf1. Read the comment docs.

@codecov
Copy link

codecov bot commented Aug 26, 2017

Codecov Report

Merging #17343 into master will decrease coverage by 0.01%.
The diff coverage is 100%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master   #17343      +/-   ##
==========================================
- Coverage   91.23%   91.21%   -0.02%     
==========================================
  Files         163      163              
  Lines       49808    49810       +2     
==========================================
- Hits        45442    45435       -7     
- Misses       4366     4375       +9
Flag Coverage Δ
#multiple 89.01% <100%> (ø) ⬆️
#single 40.32% <50%> (-0.06%) ⬇️
Impacted Files Coverage Δ
pandas/core/indexes/base.py 96.29% <100%> (ø) ⬆️
pandas/io/gbq.py 25% <0%> (-58.34%) ⬇️
pandas/core/frame.py 97.73% <0%> (-0.1%) ⬇️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 83c9205...1e6c013. Read the comment docs.

@toobaz
Copy link
Member

toobaz commented Aug 26, 2017

Is this severe enough that an Error should be raised? What type in this case?

I'm a bit in favour of raising an error (KeyError). Actually, I think the error shouldn't necessarily illustrate the dtype problem... just reporting "keys not found" would be enough.

But more in general, I wonder whether a BooleanIndex would be worth introducing, even just as an internal class. I at least didn't expect

In [2]: pd.MultiIndex.from_product([[1,2], [True, False]]).levels[1].dtype
Out[2]: dtype('O')

but that's probably just coherent with BooleanIndex not existing (MultiIndex levels of Booleans make sense, while admittedly flat indexes of Booleans don't).

Anyway, I'm afraid the current proposal is missing the following:

In [2]: pd.Series([1,2,3], index=[0, 1, False]).drop([False])
Out[2]: 
1    2
dtype: int64

@gfyoung gfyoung added Dtype Conversions Unexpected or buggy dtype conversions Error Reporting Incorrect or improved errors from pandas Indexing Related to indexing on series/frames, not to indexes themselves and removed Dtype Conversions Unexpected or buggy dtype conversions labels Aug 26, 2017
@gfyoung
Copy link
Member

gfyoung commented Aug 26, 2017

We generally try to avoid going from no error message / warning to direct error message because for downstream users, that could be too drastic of a change. A warning should suffice for now I would think.

@@ -358,6 +358,7 @@ Indexing
- Allow unicode empty strings as placeholders in multilevel columns in Python 2 (:issue:`17099`)
- Bug in ``.iloc`` when used with inplace addition or assignment and an int indexer on a ``MultiIndex`` causing the wrong indexes to be read from and written to (:issue:`17148`)
- Bug in ``.isin()`` in which checking membership in empty ``Series`` objects raised an error (:issue:`16991`)
- Warns when a numeric ``Index``'s ``get_indexer`` is passed a boolean indexer (:issue:`16877`)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you might be able to use func here ? @jorisvandenbossche @jreback

tm.assert_numpy_array_equal(actual, expected, check_dtype=False)

numeric_idx = pd.Index(range(4))
with pytest.warns(UserWarning):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Use our own tm.assert_produces_warning instead from pandas.util.testing.

if target.is_boolean() and self.is_numeric():
warnings.warn('Index is numeric and indexer is boolean, '
'possible unintended conversion from boolean '
'to numeric', UserWarning, stacklevel=1)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we can rephrase the prose here. Proper grammar and slightly more formality in the writing would be good here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

'Passing a boolean indexer to an Index containing numeric indices may cause the indexer to behave as being of numeric type'

@gfyoung Thanks for the feedback. Would the above work? Any pointers would be welcome.

@toobaz
Copy link
Member

toobaz commented Aug 26, 2017

We generally try to avoid going from no error message / warning to direct error message because for downstream users, that could be too drastic of a change.

OK: let me just emphasize that in this specific case it's very likely that anyone indexing a numeric index with boolean labels is doing something wrong/assuming a behaviour (e.g. in .drop()) which is actually not supported.

@jreback
Copy link
Contributor

jreback commented Aug 26, 2017

i am ok in raising in this case
this is a clear error

@andrejonasson
Copy link
Contributor Author

andrejonasson commented Aug 26, 2017

@toobaz The example index [0, 1, False] is a duplicate index (0 and False are equivalent), which I described as not being covered by this PR. However, you are right that the PR does not cover the case with unique mixed-integer indices (e.g. [0, True, 2]). I will add a "self.holds_integer()" to the condition to cover this, since the function is already implemented.

This would not cover "mixed-float" (e.g. [1.5, True]), which is 'mixed' inferred type in pandas but 'mixed' could also be ['ab', True] in which case it would be an incorrect warning. So I'll leave that one out.

Edit: I will only add a self.holds_integer() condition in case it's a warning. Since a user with [0, True, 2] index should be able to drop the True without an Error. Even if that user may drop the 0 by passing [False].

@toobaz
Copy link
Member

toobaz commented Aug 26, 2017

I will add a "self.holds_integer()" to the condition to cover this, since the function is already implemented.

Actually, thinking to my example again I realize that you are right... but that this PR might not be the fix we need. My example should not raise an error (neither a warning); however, it should give a different result (not dropping the 0). So I'm afraid the fix is a bit deeper (wherever booleans are automatically casted to integers).

That said, waiting for a better fix, your PR certainly won't harm.

@toobaz
Copy link
Member

toobaz commented Aug 26, 2017

Just for context:

In [2]: pd.Series([1,2,3], index=[0, 1, 's']).loc[True]
Out[2]: 2

In [3]: pd.Series([1,2,3], index=[0, 1, True]).loc[True]
Out[3]: 
1       2
True    3
dtype: int64

In [4]: pd.Series([1,2,3], index=[0, 1, 2]).loc[True]
---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
/home/nobackup/repo/pandas/pandas/core/indexes/base.py in get_loc(self, key, method, tolerance)
   2457             try:
-> 2458                 return self._engine.get_loc(key)
   2459             except KeyError:

/home/nobackup/repo/pandas/pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_loc (pandas/_libs/index.c:5272)()
    128 
--> 129     cpdef get_loc(self, object val):
    130         if is_definitely_invalid_key(val):

/home/nobackup/repo/pandas/pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_loc (pandas/_libs/index.c:5077)()
    147 
--> 148         self._check_type(val)
    149 

/home/nobackup/repo/pandas/pandas/_libs/index_class_helper.pxi in pandas._libs.index.Int64Engine._check_type (pandas/_libs/index.c:16899)()
    135         if util.is_bool_object(val):
--> 136             raise KeyError(val)
    137         elif util.is_float_object(val):

KeyError: True

During handling of the above exception, another exception occurred:

KeyError                                  Traceback (most recent call last)
/home/nobackup/repo/pandas/pandas/core/indexing.py in _get_label(self, label, axis)
    129             try:
--> 130                 return self.obj._xs(label, axis=axis)
    131             except:

/home/nobackup/repo/pandas/pandas/core/generic.py in xs(self, key, axis, level, drop_level)
   2242         else:
-> 2243             loc = self.index.get_loc(key)
   2244 

/home/nobackup/repo/pandas/pandas/core/indexes/base.py in get_loc(self, key, method, tolerance)
   2459             except KeyError:
-> 2460                 return self._engine.get_loc(self._maybe_cast_indexer(key))
   2461 

/home/nobackup/repo/pandas/pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_loc (pandas/_libs/index.c:5272)()
    128 
--> 129     cpdef get_loc(self, object val):
    130         if is_definitely_invalid_key(val):

/home/nobackup/repo/pandas/pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_loc (pandas/_libs/index.c:5077)()
    147 
--> 148         self._check_type(val)
    149 

/home/nobackup/repo/pandas/pandas/_libs/index_class_helper.pxi in pandas._libs.index.Int64Engine._check_type (pandas/_libs/index.c:16899)()
    135         if util.is_bool_object(val):
--> 136             raise KeyError(val)
    137         elif util.is_float_object(val):

KeyError: True

During handling of the above exception, another exception occurred:

KeyError                                  Traceback (most recent call last)
<ipython-input-4-b1a86c8195a6> in <module>()
----> 1 pd.Series([1,2,3], index=[0, 1, 2]).loc[True]

/home/nobackup/repo/pandas/pandas/core/indexing.py in __getitem__(self, key)
   1333         else:
   1334             key = com._apply_if_callable(key, self.obj)
-> 1335             return self._getitem_axis(key, axis=0)
   1336 
   1337     def _is_scalar_access(self, key):

/home/nobackup/repo/pandas/pandas/core/indexing.py in _getitem_axis(self, key, axis)
   1557         # fall thru to straight lookup
   1558         self._has_valid_type(key, axis)
-> 1559         return self._get_label(key, axis=axis)
   1560 
   1561 

/home/nobackup/repo/pandas/pandas/core/indexing.py in _get_label(self, label, axis)
    130                 return self.obj._xs(label, axis=axis)
    131             except:
--> 132                 return self.obj[label]
    133         elif isinstance(label, tuple) and isinstance(label[axis], slice):
    134             raise IndexingError('no slices here, handle elsewhere')

/home/nobackup/repo/pandas/pandas/core/series.py in __getitem__(self, key)
    617         key = com._apply_if_callable(key, self)
    618         try:
--> 619             result = self.index.get_value(self, key)
    620 
    621             if not is_scalar(result):

/home/nobackup/repo/pandas/pandas/core/indexes/base.py in get_value(self, series, key)
   2491         try:
   2492             return self._engine.get_value(s, k,
-> 2493                                           tz=getattr(series.dtype, 'tz', None))
   2494         except KeyError as e1:
   2495             if len(self) > 0 and self.inferred_type in ['integer', 'boolean']:

/home/nobackup/repo/pandas/pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_value (pandas/_libs/index.c:4396)()
     93         return val in self.mapping
     94 
---> 95     cpdef get_value(self, ndarray arr, object key, object tz=None):
     96         """
     97         arr : 1-dimensional ndarray

/home/nobackup/repo/pandas/pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_value (pandas/_libs/index.c:4079)()
    101             void* data_ptr
    102 
--> 103         loc = self.get_loc(key)
    104         if PySlice_Check(loc) or cnp.PyArray_Check(loc):
    105             return arr[loc]

/home/nobackup/repo/pandas/pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_loc (pandas/_libs/index.c:5077)()
    146             return self._get_loc_duplicates(val)
    147 
--> 148         self._check_type(val)
    149 
    150         try:

/home/nobackup/repo/pandas/pandas/_libs/index_class_helper.pxi in pandas._libs.index.Int64Engine._check_type (pandas/_libs/index.c:16899)()
    134         hash(val)
    135         if util.is_bool_object(val):
--> 136             raise KeyError(val)
    137         elif util.is_float_object(val):
    138             raise KeyError(val)

KeyError: True

@pep8speaks
Copy link

pep8speaks commented Aug 26, 2017

Hello @andrejonasson! Thanks for updating the PR.

Cheers ! There are no PEP8 issues in this Pull Request. 🍻

Comment last updated on September 25, 2017 at 14:07 Hours UTC

@andrejonasson
Copy link
Contributor Author

andrejonasson commented Aug 26, 2017

@toobaz I agree. This PR can prevent cases like the issues referenced until the core of the problem is solved.

I have changed it to correctly return an array of repeated -1 when the Index is numeric and the indexer is boolean.

@andrejonasson andrejonasson force-pushed the 16877-err-boolean branch 2 times, most recently from 0aaaad8 to 085b9a0 Compare August 27, 2017 05:50
@andrejonasson
Copy link
Contributor Author

>>> a = pd.Index([0,1,2,3])
>>> a.drop([True,False])
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/andredotj/Projects/pandas/pandas/pandas/core/indexes/base.py", line 3641, in drop
    labels[mask])
ValueError: labels [ True False] not contained in axis
>>> a.difference([True, False])
Int64Index([0, 1, 2, 3], dtype='int64')

New update let's the client functions handle the missing values as they see fit e.g. difference correctly does not remove [0, 1].

For the example in #16877

>>> df = pd.DataFrame( data = {
...                          'acol'  : np.arange(4),
...                          'bcol' :  2*np.arange(4)
...                         })
>>> df.drop(df.bcol > 2, axis=0, inplace=True)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/andredotj/Projects/pandas/pandas/pandas/core/generic.py", line 2402, in drop
    new_axis = axis.drop(labels, errors=errors)
  File "/home/andredotj/Projects/pandas/pandas/pandas/core/indexes/base.py", line 3641, in drop
    labels[mask])
ValueError: labels [False False  True  True] not contained in axis

It correctly raises when passed the boolean target.

@jorisvandenbossche
Copy link
Member

Just wanted to say "shouldn't this rather return -1 than raising an error", but I see you already did that!
This looks good to me.

In [3]: pd.Series([1,2,3], index=[0, 1, True]).loc[True]
Out[3]:
1 2
True 3
dtype: int64

@toobaz I am not sure we will ever be able to solve this. When you have an 'object' index, I think we can only rely on the python object's equality / hashing, and True and 1 are equal for Python (only their identity is different, but we don't rely on that)

@andrejonasson
Copy link
Contributor Author

Is there anything that should be amended in the updated PR? Also, do I just push again to get AppVeyor to pass or will it automatically rebuild at some point?

Thanks!

@@ -358,6 +358,7 @@ Indexing
- Allow unicode empty strings as placeholders in multilevel columns in Python 2 (:issue:`17099`)
- Bug in ``.iloc`` when used with inplace addition or assignment and an int indexer on a ``MultiIndex`` causing the wrong indexes to be read from and written to (:issue:`17148`)
- Bug in ``.isin()`` in which checking membership in empty ``Series`` objects raised an error (:issue:`16991`)
- When a numeric ``Index``'s :func:`get_indexer` is passed a boolean target it now returns the correct indexer (:issue:`16877`)
Copy link
Member

@gfyoung gfyoung Aug 29, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jreback @jorisvandenbossche : I don't quite remember the formatting for :func: and stuff, but somehow the beginning of the whatsnew looks incorrectly formatted. However, correct me if I'm wrong.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would actually re-write this message so it is more user-oriented (the typical user does not know Index.get_indexer). In the end it is about boolean values that now raise when indexing a numeric index or dropping from such an index?

@gfyoung for :func:, to have it working, in this case you would need to write :func:`Index.get_indexer` (but as I said, I wouldn't add that, as typical user does not need to use that function)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jorisvandenbossche : Ah, gotcha, thanks for clarifying!

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

agree, this issue is all about passing a boolean indexer to DataFrame.drop(), so say that.

expected = np.array([0, 0, 1])
tm.assert_numpy_array_equal(actual, expected, check_dtype=False)

def test_get_indexer_numeric_index_boolean_target(self):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reference issue number for both of your tests.

boolean_idx = pd.Index([True, False])
actual = boolean_idx.get_indexer([True, True, False])
expected = np.array([0, 0, 1])
tm.assert_numpy_array_equal(actual, expected, check_dtype=False)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why don't we check dtype here (we generally prefer to)? I think if you set the dtype of expected to be np.intp, you should be okay. Same for your test below.

@gfyoung
Copy link
Member

gfyoung commented Aug 29, 2017

@andrejonasson : Don't worry about AppVeyor. We'll take care of that for you. I left a couple of comments on your PR though.

@jreback jreback changed the title ERR: Warn when Index is numeric and indexer is boolean (#16877) BUG: when Index is numeric and indexer is boolean (#16877) Aug 30, 2017
@jreback jreback added Bug and removed Error Reporting Incorrect or improved errors from pandas labels Aug 30, 2017
@@ -1131,6 +1131,18 @@ def test_get_indexer_strings(self):
with pytest.raises(TypeError):
idx.get_indexer(['a', 'b', 'c', 'd'], method='pad', tolerance=2)

def test_get_indexer_boolean_index_boolean_target(self):
boolean_idx = pd.Index([True, False])
actual = boolean_idx.get_indexer([True, True, False])
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

use result = rather than actual.

@@ -358,6 +358,7 @@ Indexing
- Allow unicode empty strings as placeholders in multilevel columns in Python 2 (:issue:`17099`)
- Bug in ``.iloc`` when used with inplace addition or assignment and an int indexer on a ``MultiIndex`` causing the wrong indexes to be read from and written to (:issue:`17148`)
- Bug in ``.isin()`` in which checking membership in empty ``Series`` objects raised an error (:issue:`16991`)
- When a numeric ``Index``'s :func:`get_indexer` is passed a boolean target it now returns the correct indexer (:issue:`16877`)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

agree, this issue is all about passing a boolean indexer to DataFrame.drop(), so say that.

@@ -1131,6 +1131,18 @@ def test_get_indexer_strings(self):
with pytest.raises(TypeError):
idx.get_indexer(['a', 'b', 'c', 'd'], method='pad', tolerance=2)

def test_get_indexer_boolean_index_boolean_target(self):
boolean_idx = pd.Index([True, False])
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

add the issue number here

numeric_idx = pd.Index(range(4))
actual = numeric_idx.get_indexer([True, False, True])
expected = np.repeat(-1, 3)
tm.assert_numpy_array_equal(actual, expected, check_dtype=False)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

pls add a test that directly exercises .drop() as in the original issue

@andrejonasson
Copy link
Contributor Author

Hi, thanks for the feedback everyone.

I removed test_get_indexer_boolean_index_boolean_target - since when I tried to make it fail (by returning wrong indexer for instance) it was already covered by other tests. If redundancy is desirable I'll add it back in.

I added a direct test to pd.Series .drop() since it uses the same underlying function as pd.DataFrame.

Copy link
Contributor

@jreback jreback left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

some comments and pls rebase

@@ -2589,6 +2589,9 @@ def get_indexer(self, target, method=None, limit=None, tolerance=None):
if tolerance is not None:
tolerance = self._convert_tolerance(tolerance)

if target.is_boolean() and self.is_numeric():
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you add a comment here why

# GH 16877
numeric_idx = pd.Index(range(4))
result = numeric_idx.get_indexer([True, False, True])
expected = np.repeat(-1, 3)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this needs to be intp_ dtype

@@ -360,6 +360,7 @@ Indexing
- Bug in ``.iloc`` when used with inplace addition or assignment and an int indexer on a ``MultiIndex`` causing the wrong indexes to be read from and written to (:issue:`17148`)
- Bug in ``.isin()`` in which checking membership in empty ``Series`` objects raised an error (:issue:`16991`)
- Bug in ``CategoricalIndex`` reindexing in which specified indices containing duplicates were not being respected (:issue:`17323`)
- Bug in :func:`DataFrame.drop` caused boolean labels ``False`` and ``True`` to be treated as labels 0 and 1 respectively when dropping indices from a numeric index (:issue:`16877`)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

add this will now raise.and move to api other changes section (even though its actually a bug fix, its highlited a bit more there).

@andrejonasson
Copy link
Contributor Author

@jreback Is there anything else that needs editing?

Also is there some way to make sure I don't get merge conflicts due to the whatsnew line I've added (keeps on being the reason for my conflicts)?

@gfyoung
Copy link
Member

gfyoung commented Sep 18, 2017

Also is there some way to make sure I don't get merge conflicts due to the whatsnew line I've added

Really not much you can do here. People will modify the whatsnew in other PR's, causing the merge conflicts to occur. Just remain diligent to resolve them when they pop up.

@@ -433,7 +433,7 @@ Other API Changes
- :class:`Period` is now immutable, and will now raise an ``AttributeError`` when a user tries to assign a new value to the ``ordinal`` or ``freq`` attributes (:issue:`17116`).
- :func:`to_datetime` when passed a tz-aware ``origin=`` kwarg will now raise a more informative ``ValueError`` rather than a ``TypeError`` (:issue:`16842`)
- Renamed non-functional ``index`` to ``index_col`` in :func:`read_stata` to improve API consistency (:issue:`16342`)

- Bug in :func:`DataFrame.drop` caused boolean labels ``False`` and ``True`` to be treated as labels 0 and 1 respectively when dropping indices from a numeric index, this will now raise (:issue:`16877`)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is grammatically incorrect. "This will now raise" should be its own sentence.

@jreback
Copy link
Contributor

jreback commented Sep 23, 2017

can you rebase / update

@andrejonasson andrejonasson force-pushed the 16877-err-boolean branch 2 times, most recently from 9abe2a1 to 6e60ee1 Compare September 24, 2017 15:33
Copy link
Contributor

@jreback jreback left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm. minor doc-comments

@@ -481,7 +481,7 @@ Other API Changes
- :class:`Period` is now immutable, and will now raise an ``AttributeError`` when a user tries to assign a new value to the ``ordinal`` or ``freq`` attributes (:issue:`17116`).
- :func:`to_datetime` when passed a tz-aware ``origin=`` kwarg will now raise a more informative ``ValueError`` rather than a ``TypeError`` (:issue:`16842`)
- Renamed non-functional ``index`` to ``index_col`` in :func:`read_stata` to improve API consistency (:issue:`16342`)

- Bug in :func:`DataFrame.drop` caused boolean labels ``False`` and ``True`` to be treated as labels 0 and 1 respectively when dropping indices from a numeric index. This will now raise (:issue:`16877`)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

add what its raising (ValueError)

@@ -2609,6 +2609,11 @@ def get_indexer(self, target, method=None, limit=None, tolerance=None):
if tolerance is not None:
tolerance = self._convert_tolerance(tolerance)

# Treat boolean labels passed to a numeric index as not found. Without
# this fix False and True would be treated as 0 and 1 respectively.
if target.is_boolean() and self.is_numeric():
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

add the issue number here as a comment

@jreback jreback added this to the 0.21.0 milestone Sep 24, 2017
@jorisvandenbossche
Copy link
Member

There is a windows failure (int32 vs int64)

# this fix False and True would be treated as 0 and 1 respectively.
# (GH #16877)
if target.is_boolean() and self.is_numeric():
return np.repeat(-1, target.size)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You need to put _ensure_patform_int around this before returning (similar to the final return in this function)

@TomAugspurger
Copy link
Contributor

I restarted the failed build (compile error, that I think has been fixed on master). Ping on green.

@jorisvandenbossche jorisvandenbossche merged commit 45a795e into pandas-dev:master Sep 25, 2017
@jorisvandenbossche
Copy link
Member

@andrejonasson Thanks!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Bug Indexing Related to indexing on series/frames, not to indexes themselves
Projects
None yet
Development

Successfully merging this pull request may close these issues.

BUG: boolean indexing error with .drop()
7 participants