BUG: x in MultiIndex.drop(x) #19054

databasedav · 2018-01-03T12:10:06Z

closes x in pd.MultiIndex.drop(x) #19027
tests added / passed
passes git diff upstream/master -u -- "*.py" | flake8 --diff
whatsnew entry

jreback · 2018-01-03T12:26:18Z

tests?

jreback · 2018-01-04T00:12:14Z

pandas/core/indexes/multi.py

@@ -2120,6 +2120,12 @@ def _maybe_to_slice(loc):
            mask[loc] = True
            return mask

+        if isinstance(key, int):


use is_integer rather than isinstance

add a comment here on what you are doing

jreback · 2018-01-04T00:12:45Z

pandas/tests/indexing/test_multiindex.py

+        idx = MultiIndex.from_product([[1, 2], [3, 4]])
+        assert 2 in idx
+        idx = idx.drop(2)
+        # drop implementation keeps 2 in the levels


blank line before the comment

jreback · 2018-01-04T00:13:03Z

pandas/tests/indexing/test_multiindex.py

+        idx = idx.drop(2)
+        # drop implementation keeps 2 in the levels
+        assert 2 in idx.levels[0]
+        # but it should no longer be in the index itself


can you test with a non-integer MI as well

This comment made me realize that I had focused on integers specifically when this should apply to all hashable keys (sorry). But that would mean tuples would then be treated differently from how they currently are, e.g.:

In [88]: idx = pd.MultiIndex.from_product([[(1, 2), (2, 3)], [(3, 4), (4, 5)]])

In [89]: ((1, 2), (3, 4)) in idx
Out[89]: True

In [90]: (1, 2) in idx
Out[90]: False

In [91]: ((1, 2),) in idx
Out[91]: True

So, should it assume that a nested tuple like in In [89] should be treated like tuples are currently treated but a non-nested tuple like in In [90] should be treated like strings and ints?

In the same vein, idx.drop((1, 2)) raises a KeyError and might warrant a separate issue, since one would expect behavior similar to MultiIndex.from_product([[1, 2], [3, 4]]).drop(2), right?

This comment made me realize that I had focused on integers specifically when this should apply to all hashable keys (sorry).

I think the idea should just be: any key k which is not already a tuple must become one ((k,)). The result will be, I think, consistent with current behavior (Out[89]:, Out[90]:, Out[91]:), and you don't even need to check if key can be hashed - if it can't, it will just fail later on.
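The wrapping rule described above can be sketched in plain Python. This is only a toy model, not pandas code: the index is represented as a list of tuples, and `normalize_key` and `toy_contains` are hypothetical names introduced for illustration. It reproduces the three results shown in Out[89], Out[90] and Out[91].

```python
from itertools import product


def normalize_key(key):
    """Wrap any non-tuple key in a 1-tuple; leave tuples unchanged."""
    return key if isinstance(key, tuple) else (key,)


def toy_contains(entries, key):
    """True if the normalized key is a prefix of at least one entry.

    Toy stand-in for `key in MultiIndex`; not how pandas implements it.
    """
    key = normalize_key(key)
    return any(entry[:len(key)] == key for entry in entries)


# Same values as the MultiIndex built in In [88].
entries = list(product([(1, 2), (2, 3)], [(3, 4), (4, 5)]))

print(toy_contains(entries, ((1, 2), (3, 4))))  # True: full key across both levels
print(toy_contains(entries, (1, 2)))            # False: read as a two-level key
print(toy_contains(entries, ((1, 2),)))         # True: one-level key (1, 2)
```

Under this rule a bare tuple is always read as a key spanning levels, so a tuple-valued label has to be wrapped explicitly by the caller.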

In the same vein, idx.drop((1, 2)) raises a KeyError and might warrant a separate issue

Sure, in principle we could try with ((1,2),) if (1,2) is not found. However, I personally don't think it's a good idea, as it introduces ambiguity (we should be clear to users that a tuple in a MultiIndex is first and foremost a key spanning across levels) and increases complexity.

Got it, thanks!
Doing it by simply turning non-tuples into tuples like so:

if not isinstance(key, tuple):
    try:
        return self.get_loc((key,))
    except (LookupError, TypeError):
        raise KeyError(key)

produces a performance warning for slightly larger multi-indexes:

In [31]: idx = pd.MultiIndex.from_product([[1, 2], [3, 4]])

In [32]: 2 in idx
Out[32]: True

In [33]: idx = pd.MultiIndex.from_product([[2, 1, 2], [3, 4]])

In [34]: 2 in idx
/home/avi/anaconda3/envs/pandas-dev/lib/python3.6/site-packages/IPython/terminal/ipapp.py:356: PerformanceWarning: indexing past lexsort depth may impact performance.
  self.shell.mainloop()
Out[34]: True

Or we could use the fact that a key that has been dropped from a multi-index will return an empty slice when fed into the index's _get_level_indexer like so:

if not isinstance(key, tuple):
    loc = self._get_level_indexer(key, level=0)
    if isinstance(loc, slice) and loc.start == loc.stop:
        raise KeyError(key)
    return _maybe_to_slice(loc)

which does not produce the performance warnings and is ~3 times faster and simply adds to what it is right now:

pandas/pandas/core/indexes/multi.py

Lines 2123 to 2125 in 4a8496b

if not isinstance(key, tuple):
    loc = self._get_level_indexer(key, level=0)
    return _maybe_to_slice(loc)

but is potentially less clear (and more hacky?). Which one should I use?

I ended up using the faster one and added a comment to explain what's going on.

Yes, I think it makes sense
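To illustrate why an empty slice signals a dropped key, here is a minimal sketch in plain Python. It is a toy model under stated assumptions, not the pandas implementation: level 0 is modelled as a list of level values plus a sorted list of integer codes, and `toy_level_indexer` and `toy_get_loc` are hypothetical names. The real `_get_level_indexer` handles many more cases.

```python
from bisect import bisect_left, bisect_right


def toy_level_indexer(levels, codes, key):
    """Slice of positions whose level-0 code equals the code of `key`.

    Assumes `codes` is sorted, as it would be in a lexsorted index.
    """
    code = levels.index(key)  # ValueError if key was never a level value
    return slice(bisect_left(codes, code), bisect_right(codes, code))


def toy_get_loc(levels, codes, key):
    """Raise KeyError for unknown keys and for keys with no remaining rows."""
    try:
        loc = toy_level_indexer(levels, codes, key)
    except ValueError:
        raise KeyError(key) from None
    if loc.start == loc.stop:
        # The key is still listed in the levels but no row uses its code.
        raise KeyError(key)
    return loc


# State after dropping 2: the level values keep 2, the codes no longer use it.
levels = [1, 2]
codes = [0, 0]

print(toy_get_loc(levels, codes, 1))  # slice(0, 2, None)
print(2 in levels)                    # True
try:
    toy_get_loc(levels, codes, 2)
except KeyError as err:
    print("KeyError:", err)           # KeyError: 2
```

The point of the sketch is that membership in the level values and membership in the index are separate questions, which is what the new test asserts.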

codecov · 2018-01-05T03:25:27Z

Codecov Report

❗ No coverage uploaded for pull request base (master@e6ea00c).
The diff coverage is 100%.

@@            Coverage Diff            @@
##             master   #19054   +/-   ##
=========================================
  Coverage          ?   91.51%           
=========================================
  Files             ?      148           
  Lines             ?    48783           
  Branches          ?        0           
=========================================
  Hits              ?    44642           
  Misses            ?     4141           
  Partials          ?        0

Flag	Coverage Δ
#multiple	`89.88% <100%> (?)`
#single	`41.59% <0%> (?)`

Impacted Files	Coverage Δ
pandas/core/reshape/reshape.py	`100% <100%> (ø)`
pandas/core/indexes/multi.py	`96.22% <100%> (ø)`

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update e6ea00c...a516fcc.

jreback · 2018-01-05T03:50:22Z

pandas/core/reshape/reshape.py

@@ -628,7 +628,11 @@ def _convert_level_number(level_num, columns):
    levsize = len(level_labels)
    drop_cols = []
    for key in unique_groups:
-        loc = this.columns.get_loc(key)
+        try:


what test exercises this?

pandas/pandas/tests/test_multilevel.py

Line 1195 in 35b2aba

def test_stack_order_with_unsorted_levels(self):

pandas/pandas/tests/frame/test_reshape.py

Line 136 in 35b2aba

def test_stack_mixed_level(self):

pandas/pandas/tests/frame/test_reshape.py

Line 730 in 35b2aba

def test_stack_partial_multiIndex(self):

These were the 3. And it looks like Travis doesn't have permalinks to specific lines like GitHub?

Also just took a look at the test that caused most of the builds to fail:

pandas/pandas/tests/frame/test_mutate_columns.py

Lines 186 to 198 in 35b2aba

del df['A']

assert len(df.columns) == 2

# A still in the levels, BUT get a KeyError if trying
# to delete
assert ('A', ) not in df.columns
with pytest.raises(KeyError):
    del df[('A',)]

# xref: https://github.com/pandas-dev/pandas/issues/2770
# the 'A' is STILL in the columns!
assert 'A' in df.columns

and we just changed this so I'll go ahead and negate the assert.

jreback · 2018-01-05T14:19:16Z

pandas/tests/frame/test_mutate_columns.py

@@ -195,7 +195,9 @@ def test_delitem_multiindex(self):

        # xref: https://github.com/pandas-dev/pandas/issues/2770
        # the 'A' is STILL in the columns!
-        assert 'A' in df.columns
+        # the above was changed, 'A' is no longer in columns


your comment doesn't make sense here once the removed line is gone; can you clarify (and you can reference 2770), but make a nicer comment overall

jreback · 2018-01-05T14:19:26Z

pandas/tests/indexing/test_multiindex.py

@@ -705,6 +705,24 @@ def test_multiindex_symmetric_difference(self):
        result = idx ^ idx2
        assert result.names == [None, None]

+    def test_multiindex_contains_dropped(self):
+        # GH 19027


can you add a comment

jreback · 2018-01-06T17:30:35Z

cc @toobaz if any comments.

toobaz · 2018-01-07T16:48:33Z

Seems fine, just one question @gitavi : do you know if this line is still needed?

databasedav · 2018-01-07T21:36:07Z

It should no longer be needed, since .get_loc now raises a KeyError when an empty slice is returned and the new try/except handles the slice_len == 0 case; fixed above.

Additionally, if a boolean mask is returned, it can't possibly be of length 0, because that would require the MultiIndex to be of length 0, in which case any key would raise a KeyError. Finally, I don't think an int would ever be returned, as it would have errored at len(loc) in the original code anyway.
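The three cases in this argument can be made concrete with a small sketch. The removed check in reshape.py is not shown in this thread, so `toy_slice_len` below is a hypothetical reconstruction of a length check of that kind, with a plain list standing in for a boolean mask.

```python
def toy_slice_len(loc):
    """Length of a locator: span for a slice, len() for anything else.

    Hypothetical stand-in for the removed check; not the actual pandas code.
    """
    if isinstance(loc, slice):
        return loc.stop - loc.start
    return len(loc)


# Empty slice: the only case that yields 0, now rejected earlier with KeyError.
print(toy_slice_len(slice(2, 2)))           # 0

# Boolean mask: its length is the length of the index, however few entries match.
print(toy_slice_len([False, False, True]))  # 3

# Integer position: len() fails, so this path could never have worked.
try:
    toy_slice_len(1)
except TypeError:
    print("an int has no len()")
```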

databasedav · 2018-01-07T23:44:59Z

How do I figure out which test the Travis build timed out on? It just says
The job exceeded the maximum time limit for jobs, and has been terminated.

EDIT: Never mind, this is the traceback of the highest exception, but there's a bunch of sub-exceptions as well:

pytest -s -m single --strict --cov=pandas --cov-report xml:/tmp/cov-single.xml --junitxml=/tmp/single.xml --skip-slow --skip-network pandas
Traceback (most recent call last):
  File "/home/travis/miniconda3/envs/pandas/lib/python3.6/site-packages/_pytest/config.py", line 328, in _getconftestmodules
    return self._path2confmods[path]
KeyError: local('/home/travis/build/pandas-dev/pandas/pandas')

Not sure what's going on here...

toobaz · 2018-01-08T06:27:18Z

How do I figure out which test the Travis build timed out on?

I think there's nothing you can do, I asked in Gitter to restart it.

toobaz · 2018-01-08T06:27:57Z

It should no longer be needed

👍

jreback · 2018-01-10T00:28:03Z

thanks @gitavi

toobaz · 2018-03-19T15:06:35Z

@gitavi in case you have some idea for #20410 ...

BUG: x in MultiIndex.drop(x)

129b286

jreback added the Bug, Indexing (related to indexing on series/frames, not to indexes themselves), and MultiIndex labels Jan 3, 2018

Avi Sen added 2 commits January 3, 2018 15:38

fixing stack and typeerror checks

9aec9de

added test

35ee8f0

jreback requested changes Jan 4, 2018


made fixes, added additional test

698853f

jreback requested changes Jan 5, 2018


updated failing test based on new behavior

5da9830

jreback requested changes Jan 5, 2018


Avi Sen added 2 commits January 5, 2018 13:42

Merge updated 'master' into x-in-MultiIndex-drop-x

a0495a4

add/fix comments and fix merge conflict

552392f

jreback added this to the 0.23.0 milestone Jan 6, 2018

Merge branch 'master' into PR_TOOL_MERGE_PR_19054

22d3f69

jreback approved these changes Jan 6, 2018


Avi Sen added 2 commits January 7, 2018 13:34

remove redundant dropped columns conditional

9b79e96

Merge origin branch.

a516fcc

toobaz mentioned this pull request Jan 8, 2018

REF: codes-based MultiIndex engine #19074

Merged

3 tasks

jreback merged commit 9d8dbef into pandas-dev:master Jan 10, 2018

maximveksler pushed a commit to maximveksler/pandas that referenced this pull request Jan 11, 2018

BUG: x in MultiIndex.drop(x) (pandas-dev#19054)

ac54370
