Inconsistent output when using integer labels in multiindex on both column and index #14969
Comments
First, for consistency, I'd recommend using an IndexSlice:
In [22]: df.loc[pd.IndexSlice[0, :], 0]
Out[22]:
            A         B         C         D         E
0 A  0.874435  0.673136  0.681053  0.352759  0.829466
  B  0.325829  0.646701  0.739708  0.914715  0.297058
  C  0.239715  0.955735  0.503433  0.270841  0.346910
  D  0.389404  0.322453  0.934790  0.889230  0.563052
  E  0.562889  0.764895  0.459072  0.351296  0.054497
In [23]: df2.loc[pd.IndexSlice[0, :], 0]
Out[23]:
            0.00      0.25      0.50      0.75      1.00
0 0.00  0.187160  0.124317  0.139404  0.014958  0.297874
  0.25  0.688006  0.396273  0.032172  0.285215  0.054483
  0.50  0.053673  0.777064  0.504307  0.698933  0.814912
  0.75  0.873036  0.694500  0.305774  0.550135  0.281881
  1.00  0.472582  0.803392  0.162467  0.299709  0.605152
Using that removes the ambiguity, since you specify all the levels of the index:
In [26]: pd.IndexSlice[0, :], 0
Out[26]: ((0, slice(None, None, None)), 0)
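A minimal, self-contained sketch of the explicit form, for anyone who wants to try it. The frames here are hypothetical stand-ins shaped like the output above: the construction with `MultiIndex.from_product` and seeded random data is an assumption, not the code from the original report.

```python
import numpy as np
import pandas as pd

# Hypothetical frames shaped like the output above; the construction
# (from_product, seeded random values) is an assumption.
rng = np.random.default_rng(0)
str_idx = pd.MultiIndex.from_product([[0, 1], list("ABCDE")])
float_idx = pd.MultiIndex.from_product([[0, 1], [0.0, 0.25, 0.5, 0.75, 1.0]])
df = pd.DataFrame(rng.random((10, 10)), index=str_idx, columns=str_idx)
df2 = pd.DataFrame(rng.random((10, 10)), index=float_idx, columns=float_idx)

# IndexSlice only builds a plain tuple; the slice covers the inner row level.
row_key = pd.IndexSlice[0, :]

# Rows: outer label 0 with every inner label. Columns: outer label 0.
sub = df.loc[row_key, 0]
sub2 = df2.loc[row_key, 0]

print(sub.shape, sub2.shape)
```

Because the row key names every level of the row index, the same call selects the same block whether the inner level holds strings or floats.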
We do have the warning here: http://pandas-docs.github.io/pandas-docs-travis/advanced.html#using-slicers, which touches on it. The example you provided would make for a fantastic addition to the docs (again, assuming that pandas is doing the correct thing here).
@TomAugspurger is right, this is correct, passing Here's more clear indexing.
The point is you have to be very explicit and specify all dimensions. There is a very large warning on purpose. So I will repurpose this as a doc issue if you would like to add something.
Okay, thanks. I think I see what is happening then. It can't tell the difference between df.loc[(0,0)] and df.loc[(0),(0)], I'm guessing due to the syntax limitations of Python itself. I assume I got my expected behavior when I used strings for the second level because then pandas could tell which interpretation to use from context. Maybe more of a question for Stack Exchange, but while we're on the topic: @TomAugspurger's suggested syntax removes the first level of the columns from the output, but not the first level of the index. I guess there is no way of doing this query while removing the first level from both?
@relativistic You can also use Personally, I think that
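On the question of dropping the outer level from both axes, here is a sketch of one possible approach using `DataFrame.xs` and `droplevel`. The frame construction is an assumption, and this is not necessarily what the comment above had in mind.

```python
import numpy as np
import pandas as pd

# Hypothetical frame with an integer outer level and a float inner level
# on both axes; the construction is an assumption.
idx = pd.MultiIndex.from_product([[0, 1], [0.0, 0.25, 0.5, 0.75, 1.0]])
df2 = pd.DataFrame(np.random.default_rng(0).random((10, 10)),
                   index=idx, columns=idx)

# Select outer label 0 on each axis; xs drops the selected level by default.
block = df2.xs(0, level=0, axis=0).xs(0, level=0, axis=1)

# Alternative: explicit slicing, then drop the leftover outer row level.
same = df2.loc[pd.IndexSlice[0, :], 0].droplevel(0)

print(block.index.nlevels, block.columns.nlevels)
```

Both routes name the level being selected, so neither depends on how pandas interprets a bare (0, 0) key.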
Description of problem
Forgive me if I'm missing a subtlety about using integers as MultiIndex labels, but I seem to be getting inconsistent behavior. Using loc to index both column and index simultaneously doesn't always give the same result. This seems to depend on the datatype of the innermost index.
Example of the expected behavior
The following example works as I'd expect, giving me a dataframe representing the (0,0) label for the outermost index level:
Example of the unexpected behavior
However, if I change the second index level datatype to, for example, floats or ints, loc uses positional indexing rather than label-based indexing for the second label. Thus, the same syntax returns a Series of a single column, rather than a DataFrame.
Problem description
The problem is that the output is inconsistent. My code breaks depending upon the datatypes used for the indices in a non-obvious way. I would expect things to work as in my first example, with the str dtype used for the second index level. At a minimum, I'd prefer it if the behavior was consistent, regardless of the datatype of the second index level.
Output of pd.show_versions()
pandas: 0.19.1
nose: 1.3.7
pip: 9.0.1
setuptools: 23.1.0
Cython: 0.24
numpy: 1.10.4
scipy: 0.18.1
statsmodels: 0.6.1
xarray: 0.7.2
IPython: 4.1.2
sphinx: 1.4
patsy: 0.4.1
dateutil: 2.6.0
pytz: 2016.7
blosc: None
bottleneck: None
tables: 3.2.2
numexpr: 2.6.0
matplotlib: 1.5.1
openpyxl: None
xlrd: 0.9.4
xlwt: 1.0.0
xlsxwriter: 0.8.4
lxml: None
bs4: 4.4.1
html5lib: 0.999
httplib2: None
apiclient: None
sqlalchemy: 1.0.12
pymysql: None
psycopg2: None
jinja2: 2.8
boto: None
pandas_datareader: None