.loc on Hierarchical Index with single-valued index level can drop that index level in place #13842

Closed
mborysow opened this issue Jul 29, 2016 · 32 comments
Labels
Bug Indexing Related to indexing on series/frames, not to indexes themselves MultiIndex

Comments

@mborysow

mborysow commented Jul 29, 2016

Small Example

In [13]: import pandas as pd
    ...:
    ...: df1 = pd.DataFrame(data=dict(A=[1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1],
    ...:                              B=[1, 1, 2, 2, 2, 3, 1, 1, 1, 2, 3, 4],
    ...:                              C=[1, 2, 1, 2, 3, 2, 1, 2, 3, 2, 1, 4],
    ...:                              X=[1, 5, 2, 3, 8, 3, 3, 3, 1, 2, 1, 4],
    ...:                              Y=[7, 3, 4, 1, 3, 9, 9, 3, 1, 9, 3, 7]))
    ...: df1 = df1.set_index(['A', 'B', 'C'])
    ...:

In [14]: df1.loc[pd.IndexSlice[1, :, :]]
Out[14]:
     X  Y
B C
1 1  1  7
  2  5  3
2 1  2  4
  2  3  1
  3  8  3
..  .. ..
1 2  3  3
  3  1  1
2 2  2  9
3 1  1  3
4 4  4  7

[12 rows x 2 columns]

In [15]: df1
Out[15]:
     X  Y
B C
1 1  1  7
  2  5  3
2 1  2  4
  2  3  1
  3  8  3
..  .. ..
1 2  3  3
  3  1  1
2 2  2  9
3 1  1  3
4 4  4  7

[12 rows x 2 columns]

Expected Output

The output of the slice Out[14] is correct, but df1 should not be modified inplace. So the expected Out[15] is the original df1:

In [17]: df1
Out[17]:
       X  Y
A B C
1 1 1  1  7
    2  5  3
  2 1  2  4
    2  3  1
    3  8  3
...   .. ..
  1 2  3  3
    3  1  1
  2 2  2  9
  3 1  1  3
  4 4  4  7

[12 rows x 2 columns]

I'm still not good at submitting issues here with code and print out, so I appreciate your patience. Also, thank you guys for making pandas as amazing as it is!!

Anyhow...

I have dataframes that sometimes have up to 5 levels on their multiindex. It's not uncommon for me to want to just grab a subset containing only one value on a certain level. If one level of that index has only one value, then .loc can drop that level inplace. I'd say this is highly undesirable.

First the normal behavior. Here's my input:

       X  Y
A B C      
1 1 1  1  7
    2  5  3
  2 1  2  4
    2  3  1
    3  8  3
  3 2  3  9
2 1 1  3  9
    2  3  3
    3  1  1
  2 2  2  9
  3 1  1  3
  4 4  4  7

When I have a multi-indexed dataframe, and I do:
df.loc[1]
I get:

     X  Y
B C      
1 1  1  7
  2  5  3
2 1  2  4
  2  3  1
  3  8  3
3 2  3  9

I personally expect it to return the original multi-index where the first level has only that value. Sadly, it drops that level entirely (I think this is terrible: if you plan on resetting the index or concatenating later, you've just unwittingly lost information).

Anyhow, I recognize now that you need to provide an index for all levels, e.g., the way I expected it to work can actually be achieved by (for a three level index):
df.loc[pd.IndexSlice[1, :, :]]

       X  Y
A B C      
1 1 1  1  7
    2  5  3
  2 1  2  4
    2  3  1
    3  8  3
  3 2  3  9

Here's the rub... If the level that I indexed above has more than one unique value, this works fine. If it has only one, then once again that level gets dropped, but worse, the index is modified in place during the .loc operation.
Here's the dataframe showing the bad behavior:

       X  Y
A B C      
1 1 1  1  7
    1  3  9
    2  5  3
    2  3  3
    3  1  1
  2 1  2  4
    2  3  1
    2  2  9
    3  8  3
  3 1  1  3
    2  3  9
  4 4  4  7

df.loc[pd.IndexSlice[1, :, :]] gives:

     X  Y
B C      
1 1  1  7
  1  3  9
  2  5  3
  2  3  3
  3  1  1
2 1  2  4
  2  3  1
  2  2  9
  3  8  3
3 1  1  3
  2  3  9
4 4  4  7

Same syntax as the other case, but it dropped index A. Worse is that this is now df.
print(df)

     X  Y
B C      
1 1  1  7
  1  3  9
  2  5  3
  2  3  3
  3  1  1
2 1  2  4
  2  3  1
  2  2  9
  3  8  3
3 1  1  3
  2  3  9
4 4  4  7

If I modify the syntax slightly, i.e., df.loc[pd.IndexSlice[1, :, :], :] (with the original, unmodified frame), I get the expected result:

      X  Y
A B C      
1 1 1  1  7
    1  3  9
    2  5  3
    2  3  3
    3  1  1
  2 1  2  4
    2  3  1
    2  2  9
    3  8  3
  3 1  1  3
    2  3  9
  4 4  4  7

I've tried to provide a code sample with comments that demonstrates the problem.

Code Sample, a copy-pastable example if possible

import pandas as pd

df1 = pd.DataFrame(data=dict(A=[1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1],
                             B=[1, 1, 2, 2, 2, 3, 1, 1, 1, 2, 3, 4],
                             C=[1, 2, 1, 2, 3, 2, 1, 2, 3, 2, 1, 4],
                             X=[1, 5, 2, 3, 8, 3, 3, 3, 1, 2, 1, 4],
                             Y=[7, 3, 4, 1, 3, 9, 9, 3, 1, 9, 3, 7]))
df2 = df1.copy(deep=True)
df2['A'] = [1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2]

df1 = df1.set_index(['A', 'B', 'C']).sortlevel()
df2 = df2.set_index(['A', 'B', 'C']).sortlevel()
df1_copy = df1.copy()

print("Here's df2, with more than 1 unique value for the index A:")
print(df2)

# already annoyed by this, I don't think this is how it should work, but I understand it
print("\nHere's what df2.loc[1] returns")
print(df2.loc[1])

# understand how to get around it at least
print("\nCan get around this annoyance by df2.loc[pd.IndexSlice[1, :, :]]")
print(df2.loc[pd.IndexSlice[1, :, :]])

# BUT!  If it's the only one...
print("\nHere's df1, with only a single value for the index A")
print(df1)

print("\nNow let's do the same thing we did for df2, namely display df1.loc[pidx[1, :, :]]")
print(df1.loc[pd.IndexSlice[1, :, :]])

# and holy crap it's an inplace operation!
print("\nDamnit.. it dropped by index again! And... ruh roh!  It has a side effect!  Here's df1 again:")
print(df1)

print("\nDoing df1.loc[pidx[1, :, :], :] (using the original df1) works as expected.")
print(df1_copy.loc[pd.IndexSlice[1, :, :], :])

Here's what I get from running the code

Here's df2, with more than 1 unique value for the index A:
       X  Y
A B C      
1 1 1  1  7
    2  5  3
  2 1  2  4
    2  3  1
    3  8  3
  3 2  3  9
2 1 1  3  9
    2  3  3
    3  1  1
  2 2  2  9
  3 1  1  3
  4 4  4  7

Here's what df2.loc[1] returns
     X  Y
B C      
1 1  1  7
  2  5  3
2 1  2  4
  2  3  1
  3  8  3
3 2  3  9

Can get around this annoyance by df2.loc[pd.IndexSlice[1, :, :]]
       X  Y
A B C      
1 1 1  1  7
    2  5  3
  2 1  2  4
    2  3  1
    3  8  3
  3 2  3  9

Here's df1, with only a single value for the index A
       X  Y
A B C      
1 1 1  1  7
    1  3  9
    2  5  3
    2  3  3
    3  1  1
  2 1  2  4
    2  3  1
    2  2  9
    3  8  3
  3 1  1  3
    2  3  9
  4 4  4  7

Now let's do the same thing we did for df2, namely display df1.loc[pidx[1, :, :]]
     X  Y
B C      
1 1  1  7
  1  3  9
  2  5  3
  2  3  3
  3  1  1
2 1  2  4
  2  3  1
  2  2  9
  3  8  3
3 1  1  3
  2  3  9
4 4  4  7

Damnit.. it dropped by index again! And... ruh roh!  It has a side effect!  Here's df1 again:
     X  Y
B C      
1 1  1  7
  1  3  9
  2  5  3
  2  3  3
  3  1  1
2 1  2  4
  2  3  1
  2  2  9
  3  8  3
3 1  1  3
  2  3  9
4 4  4  7

Doing df1.loc[pidx[1, :, :], :] (using the original df1) works as expected.
       X  Y
A B C      
1 1 1  1  7
    1  3  9
    2  5  3
    2  3  3
    3  1  1
  2 1  2  4
    2  3  1
    2  2  9
    3  8  3
  3 1  1  3
    2  3  9
  4 4  4  7

Expected Output

What I expect from all of the examples above, is:

       X  Y
A B C      
1 1 1  1  7
    2  5  3
  2 1  2  4
    2  3  1
    3  8  3
  3 2  3  9

output of pd.show_versions()

INSTALLED VERSIONS

commit: None
python: 2.7.11.final.0
python-bits: 64
OS: Linux
OS-release: 4.6.3-300.fc24.x86_64
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8

pandas: 0.18.1
nose: None
pip: 8.0.3
setuptools: 20.1.1
Cython: 0.23.4
numpy: 1.10.4
scipy: 0.17.0
statsmodels: None
xarray: None
IPython: 4.1.2
sphinx: None
patsy: None
dateutil: 2.4.2
pytz: 2015.7
blosc: None
bottleneck: None
tables: 3.2.2
numexpr: 2.5.1
matplotlib: 1.5.0
openpyxl: None
xlrd: None
xlwt: None
xlsxwriter: None
lxml: None
bs4: None
html5lib: None
httplib2: None
apiclient: None
sqlalchemy: 1.0.10
pymysql: None
psycopg2: None
jinja2: 2.8
boto: None
pandas_datareader: None

@mborysow mborysow changed the title .loc on Hierarchical Index with single-valued index can drop an index level in place .loc on Hierarchical Index with single-valued index level can drop that index level in place Jul 29, 2016
@jreback
Contributor

jreback commented Jul 29, 2016

pls read the documentation: http://pandas.pydata.org/pandas-docs/stable/advanced.html#using-slicers

a scalar will always drop and a list will never drop

since you are showing scalars this is as expected -
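
A minimal sketch of the scalar-versus-list rule, assuming a recent pandas release and a small made-up frame shaped like df2 from this thread (two distinct values on level A). It uses `sort_index()` where the thread's code uses `sortlevel()`, on the assumption that the latter is no longer available in current versions.

```python
import pandas as pd

# Illustrative frame: level A has two distinct values, like df2 in this thread.
df2 = pd.DataFrame(
    {"A": [1, 1, 2, 2], "B": [1, 1, 2, 2], "C": [1, 2, 1, 2],
     "X": [1, 2, 3, 4], "Y": [4, 3, 2, 1]}
).set_index(["A", "B", "C"]).sort_index()

scalar_result = df2.loc[1]    # scalar label: level A is dropped
list_result = df2.loc[[1]]    # list label: level A is kept

print(list(scalar_result.index.names))  # ['B', 'C']
print(list(list_result.index.names))    # ['A', 'B', 'C']
```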

@jreback jreback closed this as completed Jul 29, 2016
@mborysow
Author

I think this issue should be re-opened.

@jreback I know you're probably busy, but I think you missed the part where an existing DataFrame's index is modified in place by the use of that .loc. That is a serious issue: whether I used the correct syntax or not, .loc should never ever modify something without reassignment, right? From a user's perspective that is horrifying.

Thank you, however, regarding the syntax. I've read through that documentation several times (possibly older versions) and I thought I'd finally understood the MultiIndex slicing. It's not exactly the most straightforward thing.

@mborysow
Author

mborysow commented Jul 29, 2016

@jreback Actually, further, "a scalar will always drop and a list will never drop" is simply not true. Look more closely at the example I gave you. I have two different dataframes with the same number of levels. In both cases I provided a scalar. In one case, the index was dropped. In the other case, it was not. If level 0 has more than one unique value, it did not drop. If it had only one unique value, it did. The scary part in particular is simply that in one of the cases the dataframe was also modified in place.

You are correct however, that df1.loc[pd.IndexSlice[[1], :, :]] gives the expected behavior.
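
For reference, a small sketch of the list form on a frame whose level A holds a single value. The frame is made up for illustration, a recent pandas is assumed, and the column axis is spelled out explicitly (`, :`), which is the variant reported above as working even on 0.18.1.

```python
import pandas as pd

idx = pd.IndexSlice

# Illustrative frame: every row has A == 1, like df1 in this thread.
df1 = pd.DataFrame(
    {"A": [1, 1, 1, 1], "B": [1, 1, 2, 2], "C": [1, 2, 1, 2],
     "X": [1, 2, 3, 4], "Y": [4, 3, 2, 1]}
).set_index(["A", "B", "C"]).sort_index()

# A list on level A plus an explicit column slice keeps all three levels.
kept = df1.loc[idx[[1], :, :], :]

print(list(kept.index.names))  # ['A', 'B', 'C']
print(list(df1.index.names))   # ['A', 'B', 'C'] (df1 itself is untouched)
```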

@TomAugspurger
Contributor

@mborysow I believe you're correct about the bug. I'll reopen.

I've also edited your original post to be a bit more succinct 😄

@TomAugspurger TomAugspurger reopened this Jul 29, 2016
@mborysow
Author

@TomAugspurger Thanks. I'll try to cut clearer to the point next time. =)

@TomAugspurger
Contributor

In one case, the index was dropped. In the other case, it was not.

I believe that's the difference between unique vs. dupes (though I could be wrong). Let's keep this issue focused on df1.loc[pd.IndexSlice[1, :, :]] modifying df1.
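
One way to pin down the side effect as a regression check is sketched below. The helper name `assert_index_untouched` and the small frame are hypothetical, and a recent pandas is assumed. `xs` is used as the indexing operation because the debugging later in this thread traces the overwrite to that method; the `.loc` expression could be passed in the same way.

```python
import pandas as pd

def assert_index_untouched(frame, indexing_op):
    """Run indexing_op(frame); fail if frame's own index changed as a side effect."""
    before = frame.index.copy()
    result = indexing_op(frame)
    assert frame.index.nlevels == before.nlevels, "index level was dropped in place"
    assert frame.index.equals(before), "index values changed in place"
    return result

# Illustrative frame: level A holds a single value, the case that triggered the bug.
df1 = pd.DataFrame(
    {"A": [1, 1, 1, 1], "B": [1, 1, 2, 2], "C": [1, 2, 1, 2],
     "X": [1, 2, 3, 4], "Y": [4, 3, 2, 1]}
).set_index(["A", "B", "C"]).sort_index()

# Selecting the only value on level A must leave df1 with all three levels.
result = assert_index_untouched(df1, lambda f: f.xs(1, level="A"))
print(result.index.nlevels, df1.index.nlevels)
```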

@TomAugspurger TomAugspurger added Indexing Related to indexing on series/frames, not to indexes themselves MultiIndex labels Jul 29, 2016
@jreback
Contributor

jreback commented Jul 29, 2016

@mborysow the problem is that you are addressing 'things you don't like' rather than giving a focused example

e.g

In [42]: df1 = pd.DataFrame(data=dict(A=[1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1],
    ...:                              B=[1, 1, 2, 2, 2, 3, 1, 1, 1, 2, 3, 4],
    ...:                              C=[1, 2, 1, 2, 3, 2, 1, 2, 3, 2, 1, 4],
    ...:                              X=[1, 5, 2, 3, 8, 3, 3, 3, 1, 2, 1, 4],
    ...:                              Y=[7, 3, 4, 1, 3, 9, 9, 3, 1, 9, 3, 7]))
    ...:
    ...: df1 = df1.set_index(['A', 'B', 'C']).sortlevel()
    ...:

In [43]: df1.loc[pd.IndexSlice[1, :, :]]
Out[43]: 
     X  Y
B C      
1 1  1  7
  1  3  9
  2  5  3
  2  3  3
  3  1  1
2 1  2  4
  2  3  1
  2  2  9
  3  8  3
3 1  1  3
  2  3  9
4 4  4  7

In [44]: df1
Out[44]: 
     X  Y
B C      
1 1  1  7
  1  3  9
  2  5  3
  2  3  3
  3  1  1
2 1  2  4
  2  3  1
  2  2  9
  3  8  3
3 1  1  3
  2  3  9
4 4  4  7

@jreback
Contributor

jreback commented Jul 29, 2016

which does look buggy. please have a look and see if you can come up with the reason why (in the code).

@jreback jreback added this to the Next Major Release milestone Jul 29, 2016
@mborysow
Author

mborysow commented Jul 29, 2016

@TomAugspurger Oops. I didn't notice the duplicate indices in there... I swapped in a dataframe that didn't have any, see below. Result is exactly the same, FYI.

df1 = pd.DataFrame(data=dict(A=[1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1],
                             B=[1, 1, 2, 2, 2, 3, 1, 1, 1, 2, 3, 4],
                             C=[1, 2, 5, 2, 3, 2, 8, 4, 3, 9, 1, 4],
                             X=[1, 5, 2, 3, 8, 3, 3, 3, 5, 2, 1, 4],
                             Y=[7, 3, 4, 1, 3, 9, 9, 3, 1, 9, 3, 7]))

@mborysow
Author

@jreback Sorry. I appreciate the feedback on issue submission. I was trying to point out the difference in behavior in the two cases. I'll try to focus it down next time. Maybe I should have opened two issues. One pointing out the difference in result and the other pointing out the side effect. Would that have been better?

@jreback
Contributor

jreback commented Jul 29, 2016

@mborysow yes that would have been better. The first is a user question, the 2nd a bug. Ideally an issue is a simple repro that gets right to the point. The longer it is, the more likely it won't be read / acted on / understood immediately and will just cause confusion.

@mborysow
Author

@jreback Should I go ahead and create a new issue for that now? I suppose there's a reasonable chance they stem from the same root cause.

@jreback
Contributor

jreback commented Jul 29, 2016

for what exactly?

@mborysow
Author

So there were two issues.

  1. The clear bug where .loc exhibits a side effect, which you've already acknowledged this issue should focus on.

  2. Where indexing df1 and df2 the same way (pd.IndexSlice[1, :, :]) drops index 'A' for df1 but not for df2. The only difference between the two is that all the values of A in df1 are the same, whereas df2 has two unique values.

Example output next and code to copy and paste to reproduce below...

df2, where 'A' is not dropped (A has more than one unique value)
df2:
       X  Y
A B C      
1 1 1  1  4
    2  2  3
2 2 1  3  2
    2  4  1

df2.loc[pd.IndexSlice[1, :, :]]:
       X  Y
A B C      
1 1 1  1  4
    2  2  3
df1, where 'A' is dropped (all rows have A = 1)
df1:
       X  Y
A B C      
1 1 1  1  4
    2  2  3
  2 1  3  2
    2  4  1

df1.loc[pd.IndexSlice[1, :, :]]:
     X  Y
B C      
1 1  1  4
  2  2  3
2 1  3  2
  2  4  1
Code to reproduce:
df1 = pd.DataFrame(data=dict(A=[1, 1, 1, 1],
                             B=[1, 1, 2, 2],
                             C=[1, 2, 1, 2],
                             X=[1, 2, 3, 4],
                             Y=[4, 3, 2, 1]))
df2 = df1.copy(deep=True)
df2['A'] = [1, 1, 2, 2]

df1 = df1.set_index(['A', 'B', 'C']).sortlevel()
df2 = df2.set_index(['A', 'B', 'C']).sortlevel()

print(df1.loc[pd.IndexSlice[1, :, :]])
print(df2.loc[pd.IndexSlice[1, :, :]])

@mborysow
Author

Personally, it wouldn't surprise me to find out that the root cause of both those things is the same.

@jreback
Contributor

jreback commented Jul 29, 2016

as I said before this is as expected

use a list to have no drops whether unique or not

further, using a non-unique mi is generally not supported that well
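
Besides the list form, `DataFrame.xs` accepts a `drop_level` argument that may be worth knowing about here. The sketch below assumes a recent pandas and uses two made-up frames shaped like df1 (single value on A) and df2 (two values on A); it is an illustration, not a statement about how 0.18.1 behaves.

```python
import pandas as pd

base = pd.DataFrame(
    {"A": [1, 1, 1, 1], "B": [1, 1, 2, 2], "C": [1, 2, 1, 2],
     "X": [1, 2, 3, 4], "Y": [4, 3, 2, 1]}
)
df1 = base.set_index(["A", "B", "C"]).sort_index()                       # A is always 1
df2 = base.assign(A=[1, 1, 2, 2]).set_index(["A", "B", "C"]).sort_index()  # A is 1 or 2

# drop_level=False asks xs to keep the selected level in the result.
kept1 = df1.xs(1, level="A", drop_level=False)
kept2 = df2.xs(1, level="A", drop_level=False)

# The default (drop_level=True) removes the selected level.
dropped2 = df2.xs(1, level="A")

print(list(kept1.index.names), list(kept2.index.names), list(dropped2.index.names))
```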

@mborysow
Author

Yeah, I will use lists from now on for sure. But just to clarify, maybe I misunderstand what you are calling unique...

I assumed non-unique in this context meant that two rows shared an exact index, e.g.,
Row(1): A=1, B=1
Row(2): A=1, B=1

When you say non-unique mi, are you also referring to the following as a non-unique mi?
Row(1): A=1, B=1
Row(2): A=1, B=2

Is it the former, or the latter?

@shoyer
Member

shoyer commented Jul 29, 2016

I agree with @mborysow that this behavior isn't very intuitive. It feels like an implementation detail that has leaked into the API. For operations that select out a single value along a level, I don't see why we couldn't always drop that level from the index.

@mborysow What @jreback means by "non-unique" is an index in which two or more rows share the same full index tuple; a unique index is one where each row's tuple is distinct. So your second example would be unique:
Row(1): A=1, B=1
Row(2): A=1, B=2
You can test this by calling the index.is_unique property (also note index.is_monotonic).
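
A short sketch of that check on the two cases from the question. It assumes a recent pandas, where the monotonicity property is spelled `is_monotonic_increasing`.

```python
import pandas as pd

# Two rows sharing the exact same tuple: a non-unique MultiIndex.
duplicated = pd.MultiIndex.from_tuples([(1, 1), (1, 1)], names=["A", "B"])

# Two rows sharing A but differing in B: still a unique MultiIndex.
distinct = pd.MultiIndex.from_tuples([(1, 1), (1, 2)], names=["A", "B"])

print(duplicated.is_unique)              # False
print(distinct.is_unique)                # True
print(distinct.is_monotonic_increasing)  # True
```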

@mborysow
Author

mborysow commented Aug 1, 2016

@shoyer I actually would rather argue the opposite. I would prefer that it never dropped the level. Worse though is that whether you select one value or multiple gives you a different number of levels. Some programmatic code may just choose items that pass some threshold, if sometimes it's just one, then everywhere I do this I need code to check what the new shape of the index is, and that's not fun. For the same reason, if you select none (e.g., via an empty list in the slicer) I think it should just return an empty DataFrame with the index intact. Otherwise, any time I choose a variable number of items from the DataFrame I have to check for two separate outliers (0 values or 1 value).

It makes much more sense to me that .loc and similar indexing methods should just return a consistent number of levels regardless of what is selected.
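
For code that selects a variable number of values, one possible workaround is a boolean mask over the level values, which keeps every level whether zero, one, or several values match. This is only a sketch: the helper name `select_on_level` and the frame are hypothetical, and a recent pandas is assumed.

```python
import pandas as pd

def select_on_level(frame, level, values):
    """Keep rows whose label on `level` is in `values`; never drops a level."""
    mask = frame.index.get_level_values(level).isin(list(values))
    return frame.loc[mask]

df = pd.DataFrame(
    {"A": [1, 1, 2, 2], "B": [1, 1, 2, 2], "C": [1, 2, 1, 2],
     "X": [1, 2, 3, 4], "Y": [4, 3, 2, 1]}
).set_index(["A", "B", "C"]).sort_index()

# Zero, one, and two selected values all return a three-level index.
for wanted in ([], [1], [1, 2]):
    subset = select_on_level(df, "A", wanted)
    print(wanted, len(subset), subset.index.nlevels)
```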

@shoyer
Member

shoyer commented Aug 1, 2016

Worse though is that whether you select one value or multiple gives you a different number of levels. Some programmatic code may just choose items that pass some threshold, if sometimes it's just one, then everywhere I do this I need code to check what the new shape of the index is, and that's not fun.

I think I did a poor job of explaining the alternative, which is closer to the existing behavior.

I agree that behavior absolutely should not depend on data values or their length. However, it's OK to make distinctions based on types. The current behavior (for unique MultiIndexes) is:

  • For scalar values, drop the level.
  • For list values (even of length 0 or 1), keep the level.

This mirrors the rule for dropping axis with normal indexing, which in turn mirrors similar behavior from numpy. In fact, this is where the different behavior depending on uniqueness arises -- indexing a non-unique index with a scalar returns an object that still has that axis (by necessity), whereas indexing a unique-index with a scalar drops the axis.
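
The analogy with numpy and with ordinary (non-hierarchical) indexing can be sketched as follows. The arrays and Series are made up for illustration.

```python
import numpy as np
import pandas as pd

arr = np.arange(6).reshape(2, 3)
print(arr[0].shape)    # (3,)   scalar index: the axis is dropped
print(arr[[0]].shape)  # (1, 3) list index: the axis is kept

unique = pd.Series([10, 20], index=["a", "b"])
scalar_hit = unique.loc["a"]    # scalar on a unique index: a single value
list_hit = unique.loc[["a"]]    # list: a Series of length 1

# On a non-unique index, a scalar label matches several rows,
# so the result has to keep the axis.
dupes = pd.Series([10, 20], index=["a", "a"])
dupe_hit = dupes.loc["a"]

print(scalar_hit, len(list_hit), len(dupe_hit))
```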

Changing this behavior (to never drop levels/axes) might be desirable, but it would be a major API change, so it would be best discussed in a separate issue.

@mborysow
Author

mborysow commented Aug 1, 2016

@shoyer Ahh. Then I agree completely. =)

@mborysow
Author

mborysow commented Aug 1, 2016

@shoyer I agree completely with the following that you said:


I agree that behavior absolutely should not depend on data values or their length. However, it's OK to make distinctions based on types. The current behavior (for unique MultiIndexes) is:

For scalar values, drop the level.
For list values (even of length 0 or 1), keep the level.


Are we in agreement though that this is not what is currently happening? @jreback called what I described above the expected behavior.

In this comment I made above there are two dataframes:
#13842 (comment)

In the first case the indices are 111, 112, 221, and 222 and in the other it's 111, 112, 121, and 122. Clearly unique indices based on the description above, and hence why I clarified. These two have different behaviors when indexing on the scalar value. They do behave the same when indexing on the list value. I'm perfectly happy with the scalar vs. list indexing working as you've described it if it's consistent, but that's the problem, it's not currently consistent.

Anyhow, the thing I'm sure has been communicated and acknowledged is the side effect (the in-place index modification). That's a clear bug. The thing I'm not sure has been communicated is the difference in behavior based on the dataframe. I suspect strongly that the two are related, but I can't say for certain. This is the thing I was trying to clarify whether or not I should create a new issue for.

Sorry if I'm beating a dead horse. Just paranoid that I'm not communicating the issue well.

@shoyer
Member

shoyer commented Aug 1, 2016

In the first case the indices are 111, 112, 221, and 222 and in the other it's 111, 112, 121, and 122. Clearly unique indices based on the description above, and hence why I clarified. These two have different behaviors when indexing on the scalar value. They do behave the same when indexing on the list value. I'm perfectly happy with the scalar vs. list indexing working as you've described it if it's consistent, but that's the problem, it's not currently consistent.

Yes, this looks like a bug to me. Both of these indexes are unique and lex-sorted (monotonic), so they should work the same way when indexed with a scalar value.

@mborysow
Author

mborysow commented Aug 1, 2016

Hopped into a debugger and found where it is happening. This is for version 0.18.1.

I can look a little deeper if I can be directed. Let me know if this is helpful or not. Trying to help point in the right direction..

df1 (all A = 1)...
called .loc[pd.IndexSlice[1, :, :]]...
pandas.core.indexing._LocationIndexer.__getitem__ is called with (1, slice(None, None, None), slice(None, None, None))
_getitem_tuple(key) seems to call __getitem__ from an _iLocIndexer instance with this key:
slice(None, None, None)

df2 (multiple values for A)
Same as above, except the _iLocIndexer.__getitem__ is called with this key:
(array([0, 1]), slice(None, None, None))

The actual culprit for the overwrite is in pandas.core.generic.NDFrame.xs line 1778. This code block is NOT reached by df2, only df1. At this point these two lines are executed:

result = self.iloc[loc]
result.index = new_index

loc is slice(None, None, None) for the df1 case. I'm guessing that self.iloc[:] returns the initial dataframe and not a new dataframe object pointing to the same data. Right here the index is overwritten (here the index has had level 0 dropped).
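
The suspected mechanism can be modeled without pandas. The toy `Frame` class below is purely hypothetical and is not how pandas is implemented; it only shows why assigning `result.index` is harmless when the full slice returns a new object and destructive when it returns the original one.

```python
class Frame:
    """Toy stand-in for a DataFrame: an index plus some row data."""

    def __init__(self, index, rows):
        self.index = index
        self.rows = rows

    def take_all_aliasing(self):
        # Models the suspected behavior: a full slice hands back the same object.
        return self

    def take_all_copying(self):
        # Models the safe behavior: a new wrapper around the same row data.
        return Frame(self.index, self.rows)


def xs_like(frame, take_all):
    """Mimic the two lines quoted above: take everything, then replace the index."""
    new_index = [key[1:] for key in frame.index]  # drop level 0
    result = take_all(frame)
    result.index = new_index
    return result


aliased = Frame([(1, 1, 1), (1, 1, 2)], rows=[10, 20])
xs_like(aliased, Frame.take_all_aliasing)
print(aliased.index)  # [(1, 1), (1, 2)]  the original lost its first level

copied = Frame([(1, 1, 1), (1, 1, 2)], rows=[10, 20])
xs_like(copied, Frame.take_all_copying)
print(copied.index)   # [(1, 1, 1), (1, 1, 2)]  the original is intact
```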

@mborysow
Author

mborysow commented Aug 1, 2016

... A little more...

In pandas.indexes.multi.MultiIndex.get_loc_level, line 1710:
(For both df1 and df2, key is still (1, slice(None, None, None), slice(None, None, None)))

                for i, k in enumerate(key):
                    if not isinstance(k, slice):
                        k = self._get_level_indexer(k, level=i)
                        if isinstance(k, slice):
                            # everything
                            if k.start == 0 and k.stop == len(self):
                                k = slice(None, None)
                        else:
                            k_index = k

                    if isinstance(k, slice):
                        if k == slice(None, None):
                            continue
                        else:
                            raise TypeError(key)

At "k.start == 0 and k.stop == len(self)": in df1, indexing with the scalar value of 1 selects everything, so k.start == 0 and k.stop == len(self) evaluates to true and k is set to slice(None, None). This doesn't really do anything, but it prevents the TypeError from being raised (I haven't followed what happens when that propagates up).

For df2, selecting A=1 does not select all the values in the index, so the TypeError is raised in the last line of the code pasted above (line 1719 in pandas.indexes.multi.MultiIndex.get_loc_level).
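
The data dependence described here can be reproduced with a simplified, pandas-free model. The function `covers_everything` is hypothetical and only imitates the "everything" check in the snippet above, using `bisect` on a sorted list of level-0 labels in place of `_get_level_indexer`.

```python
from bisect import bisect_left, bisect_right

def covers_everything(sorted_level0, label):
    """Return True when selecting `label` spans the whole (sorted) level.

    True corresponds to the `continue` branch in the quoted code;
    False corresponds to the branch that raises TypeError(key).
    """
    k = slice(bisect_left(sorted_level0, label), bisect_right(sorted_level0, label))
    if k.start == 0 and k.stop == len(sorted_level0):
        k = slice(None, None)
    return k == slice(None, None)

# df1-like level: every row has A == 1, so the scalar selects everything.
print(covers_everything([1, 1, 1, 1], 1))  # True

# df2-like level: A == 1 covers only the first half.
print(covers_everything([1, 1, 2, 2], 1))  # False
```

The same scalar key therefore takes different branches depending only on the values stored in the level, which matches the difference observed between df1 and df2.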

@mborysow
Author

mborysow commented Aug 1, 2016

@jreback, @shoyer Is the above helpful in finding a solution?
I think just fixing the side effect requires checking that loc is not slice(None, None, None) for this code, right? That's if I'm correct that iloc[slice(None, None, None)] returns a reference to the original object? If we're returning an identical object, don't muck with the index in .xs()?

result = self.iloc[loc]
result.index = new_index

could become like this:

result = self.iloc[loc]
if isinstance(loc, slice) and loc == slice(None, None, None):
    pass
else:
    result.index = new_index

@mborysow
Author

mborysow commented Aug 1, 2016

Oh, or as @shoyer just linked to another issue, don't return the original object for .iloc[:] and .loc[:]. =)

That makes more sense.

Ok, so the inconsistency is a separate issue. I will create a new issue about it later.

@shoyer
Member

shoyer commented Aug 1, 2016

I raised a new issue for the behavior of .iloc[:] (#13873).

@mborysow It looks like you're well on your way to a fix here -- a pull request would be very welcome! Even a temporary workaround would be better than the current behavior.

@shoyer
Member

shoyer commented Aug 1, 2016

@mborysow I would suggest something like this instead (if you don't fix the underlying issue):

if isinstance(loc, slice) and loc == slice(None, None, None):
    result = self.copy(deep=False)
else:
    result = self.iloc[loc]
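
Why a shallow copy is enough can be checked from the outside, as in this sketch. It assumes a recent pandas and a made-up frame; it shows only that reassigning the index of `copy(deep=False)` leaves the original's index alone, not how `xs` is actually implemented.

```python
import pandas as pd

df1 = pd.DataFrame(
    {"A": [1, 1, 1, 1], "B": [1, 1, 2, 2], "C": [1, 2, 1, 2],
     "X": [1, 2, 3, 4], "Y": [4, 3, 2, 1]}
).set_index(["A", "B", "C"]).sort_index()

# A shallow copy is a distinct object, so it can carry its own index.
shallow = df1.copy(deep=False)
shallow.index = df1.index.droplevel("A")

print(shallow is df1)                # False
print(list(shallow.index.names))     # ['B', 'C']
print(list(df1.index.names))         # ['A', 'B', 'C']
```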

@mborysow
Author

mborysow commented Aug 1, 2016

@shoyer I'm in an airgapped environment and have actually never made a pull request (also, most of my experience is with mercurial). I'll take a stab at it tonight when I get home.

@simonjayhawkins
Member

@jreback I think this issue was closed by #16443.

@jreback
Contributor

jreback commented Nov 17, 2018

ok thanks. if u think we need additional tests pls PR

@jreback jreback closed this as completed Nov 17, 2018
5 participants