Indexing a MultiIndex with a (Multi)Index #15472
Comments
I guess. This is going down a rabbit hole (but one that maybe needs some attention). I am not sure what
I guess an example would be like:
In [2]: population = pd.DataFrame([['Europe', 'Italy', 'Rome', 2870336],
   ...:                            ['Europe', 'Italy', 'Naples', 975260],
   ...:                            ['Europe', 'France', 'Paris', 2229621],
   ...:                            ['North America', 'USA', 'New York', 19795791]],
   ...:                           columns=['continent', 'country', 'city', 'pop']).set_index(['continent',
   ...:                                                                                      'country',
   ...:                                                                                      'city'])
In [3]: good_pizza = pd.DataFrame([['Europe', 'Italy', True],
   ...:                            ['Europe', 'France', False],
   ...:                            ['North America', 'USA', False]],
   ...:                           columns=['continent', 'country', 'actually']).set_index(['continent', 'country'])['actually']
In [4]: # Worldwide access to good pizza:
   ...: population.loc[good_pizza[good_pizza].index]#.sum()
Out[4]:
                   pop
continent country
Europe    Italy    NaN
In [5]: # ... which should return instead the equivalent of...
   ...: population.loc[population.index.droplevel('city').isin(good_pizza[good_pizza].index)]#.sum()
Out[5]:
                              pop
continent country city
Europe    Italy   Rome    2870336
                  Naples   975260
Admittedly, nothing you couldn't do with some
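For anyone who wants to reproduce this, here is a self-contained sketch of the workaround from In [5], wrapped in a small helper. The name `select_by_partial_index` is hypothetical (it is not a pandas function), and matching the indexer to the frame by level name is an assumption made for the illustration, not settled pandas behaviour.

```python
import pandas as pd

population = pd.DataFrame(
    [["Europe", "Italy", "Rome", 2870336],
     ["Europe", "Italy", "Naples", 975260],
     ["Europe", "France", "Paris", 2229621],
     ["North America", "USA", "New York", 19795791]],
    columns=["continent", "country", "city", "pop"],
).set_index(["continent", "country", "city"])

good_pizza = pd.DataFrame(
    [["Europe", "Italy", True],
     ["Europe", "France", False],
     ["North America", "USA", False]],
    columns=["continent", "country", "actually"],
).set_index(["continent", "country"])["actually"]


def select_by_partial_index(df, indexer):
    """Hypothetical helper: keep the rows of ``df`` whose values on the
    levels named in ``indexer`` appear in ``indexer``.

    Assumes every level of ``indexer`` is named and also exists in ``df.index``.
    """
    # Levels of df.index that the indexer does not know about (here: 'city').
    extra = [name for name in df.index.names if name not in indexer.names]
    reduced = df.index.droplevel(extra) if extra else df.index
    # Put the indexer's levels in the same order as the reduced index.
    if isinstance(indexer, pd.MultiIndex):
        indexer = indexer.reorder_levels(list(reduced.names))
    # Boolean mask over the rows of df, so all original levels are kept.
    return df[reduced.isin(indexer)]


result = select_by_partial_index(population, good_pizza[good_pizza].index)
print(result)
```

With the data above this keeps the Rome and Naples rows with all three index levels intact, which is the result In [5] asks for.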
(4. indexing with a flat index is broken too, and this is really unexpected, so something should be done anyway)
Code Sample, a copy-pastable example if possible
Problem description
This clearly needs a unified approach (and I can try).
Expected Output
I guess most expected outputs above are obvious, except for In [10]: (and maybe In [5]:, which however is already discussed elsewhere). That is: it is not obvious whether level names in the indexer should be matched to level names in the indexed, when both are set (see this comment). It would probably be more pandas-ish if they were.
In other terms, while there is no doubt that
is wrong, we must decide whether we want
or
Output of pd.show_versions()
pandas: 0.19.0+478.g12f2c6a
pytest: 3.0.6
pip: 8.1.2
setuptools: 28.0.0
Cython: 0.23.4
numpy: 1.12.0
scipy: 0.18.1
xarray: None
IPython: 5.1.0.dev
sphinx: 1.4.8
patsy: 0.3.0-dev
dateutil: 2.5.3
pytz: 2015.7
blosc: None
bottleneck: 1.2.0
tables: 3.2.2
numexpr: 2.6.0
feather: None
matplotlib: 2.0.0rc2
openpyxl: 2.3.0
xlrd: 1.0.0
xlwt: 1.1.2
xlsxwriter: 0.9.3
lxml: 3.6.4
bs4: 4.5.1
html5lib: 0.999
httplib2: 0.9.1
apiclient: 1.5.2
sqlalchemy: 1.0.15
pymysql: None
psycopg2: None
jinja2: 2.8
s3fs: None
pandas_datareader: 0.2.1