Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Mi indexing #15425

Closed
wants to merge 7 commits into from
Closed

Mi indexing #15425

wants to merge 7 commits into from

Conversation

toobaz
Copy link
Member

@toobaz toobaz commented Feb 16, 2017

This also fixes indexing with a DataFrame, which was actually not mentioned in #15424 (but it's a twoliner)

@jreback jreback added Indexing Related to indexing on series/frames, not to indexes themselves MultiIndex labels Feb 16, 2017
# passing a dataframe as a key with a MultiIndex
index = MultiIndex.from_product([[1, 2, 3], ['A', 'B', 'C']])
x = Series(index=index, data=range(9), dtype=np.float64)
idx_keys = [(1, 'A'), (2, 'C'), (3, 'B')]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you also add tests for empty Series/DataFrame indexers (Series in test above)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Lost again... aren't you referring to something like this and this?!

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

use an empty series / dataframe as an indexer (no data)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Series(data=[], dtype=np.float64) isn't empty?!

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ahh it's in the above be test ok then

@jreback
Copy link
Contributor

jreback commented Feb 16, 2017

lgtm. just a small test addition. ping when pushed and green.

@codecov-io
Copy link

codecov-io commented Feb 16, 2017

Codecov Report

Merging #15425 into master will increase coverage by <.01%.
The diff coverage is 100%.

@@            Coverage Diff             @@
##           master   #15425      +/-   ##
==========================================
+ Coverage   90.37%   90.37%   +<.01%     
==========================================
  Files         135      135              
  Lines       49463    49467       +4     
==========================================
+ Hits        44702    44707       +5     
+ Misses       4761     4760       -1
Impacted Files Coverage Δ
pandas/core/indexing.py 94.22% <100%> (+0.02%)
pandas/core/common.py 91.36% <ø> (+0.33%)

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update f65a641...2ba2d5d. Read the comment docs.

@jorisvandenbossche
Copy link
Member

The ability to index with a DataFrame is actually a new feature, not a bug fix. And thus it should also documented like that if we include it.
But, personally, I would not add this. I don't see a compelling reason to have this, and it will only add further complexity to indexing (eg, should the column names align on the level names?)

@toobaz
Copy link
Member Author

toobaz commented Feb 17, 2017

Ouch, right. I caught #15424 while trying to index with a DataFrame and getting a KeyError: 0, this led me to think it was a bug and not a missing feature, sorry.

That said, I would love that feature. I understand the "complexity" argument, but I propose to reason backwards: if we don't want the column names to align on the level names, then there's no complexity added - it's already done (and really, DataFrames become tuples so early that you don't have to special-case anything else). If instead we want this to happen, I will see what I can do.

Concerning this PR: in the first case, I would just add a whatsnew section, while in the second, I would drop the two incriminated lines (and the related test).

@toobaz
Copy link
Member Author

toobaz commented Feb 17, 2017

By the way: indexing with a Series is also not documented, right?

(only now I realize that not even .loc[np.array] is supported... which instead we might want to I guess) moved to #15434

@jreback
Copy link
Contributor

jreback commented Feb 17, 2017

yeah I agree with @jorisvandenbossche how is this actually useful? it is just plain confusing, e.g. what dimensions are you actually indexing on? what happens if the levels don't match, or partially match.

I think this is too much of a rabbit hole. Take this our for now (and just leave the series fixing).

Ok with actually raising NotImplemented Error though if a df is passed like this.

You can raise a separate issue with some well-defined use cases for consideration though.

raise NotImplementedError("Indexing a MultiIndex with a "
"DataFrame key is not "
"implemented")
elif hasattr(key, 'ndim') and key.ndim > 1:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

test using a numpy scalar as well (which is 0-dim), e.g. np.int64(5) (I think this works anyhow)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

@@ -1525,11 +1525,22 @@ def _getitem_axis(self, key, axis=0):
# possibly convert a list-like into a nested tuple
# but don't convert a list-like of tuples
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you update this a bit (explain the goal of what is happening)

@@ -1521,15 +1521,24 @@ def _getitem_axis(self, key, axis=0):
return self._getbool_axis(key, axis=axis)
elif is_list_like_indexer(key):

# GH 7349
# possibly convert a list-like into a nested tuple
# but don't convert a list-like of tuples
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Related to (the code I currently don't understand in) #15448

@jreback
Copy link
Contributor

jreback commented Feb 18, 2017

lgtm. ping on green.

@jreback jreback added this to the 0.20.0 milestone Feb 18, 2017
@jreback
Copy link
Contributor

jreback commented Feb 20, 2017

thanks @toobaz

@jreback jreback closed this in 821be39 Feb 20, 2017
@toobaz toobaz deleted the mi_indexing branch February 21, 2017 01:00
AnkurDedania pushed a commit to AnkurDedania/pandas that referenced this pull request Mar 21, 2017
…xers

closes pandas-dev#15424
closes pandas-dev#15434

Author: Pietro Battiston <[email protected]>

Closes pandas-dev#15425 from toobaz/mi_indexing and squashes the following commits:

2ba2d5d [Pietro Battiston] Updated comment
900e3ce [Pietro Battiston] whatsnew
8467b57 [Pietro Battiston] Tests for previous commit
17209f3 [Pietro Battiston] BUG: support indexing MultiIndex with 1-D array
7606114 [Pietro Battiston] Whatsnew
0b719f5 [Pietro Battiston] Test for previous commit
1f2f385 [Pietro Battiston] BUG: Fix indexing MultiIndex with Series with 0 not index
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Indexing Related to indexing on series/frames, not to indexes themselves MultiIndex
Projects
None yet
4 participants