Overview of [] (__getitem__) API #9595
Comments
@jorisvandenbossche this is a really nice summary. I think in general we can move I suppose we should prepare any changes for 0.17.0 as these will technically be API changes. |
xref #9213, CC @hugadams @dandavison @jorisvandenbossche Indeed, this is a nice summary of current behavior. Thanks! I think we should consider radical API changes for My two cents on indexing is that "fallback indexing" is a really bad idea. It starts with the best of intentions, but leads to things like special cases like distinctions between integer and float indexes (e.g., see #9213). In the face of ambiguity, refuse the temptation to guess. So if I were reinventing indexing rules from scratch, I would consider something like this (for
That's it. Two simple rules that probably cover 90% of existing uses of This sort of change would require a serious deprecation cycle or perhaps need to wait until pandas 1.0 (likely both), but something needs to change. The fact that even pandas developers need to run extensive experiments to figure out how |
@jorisvandenbossche Did you ever figure out how |
@shoyer nope :-) I would suspect it is largely the same, but you never know ... Will try to look at it next week |
I wanted to add this here since it is somewhat related to "String parsing for a datetime index does not seem to work" mentioned above and I have not seen it come up anywhere else. For a MultiIndex, string parsing for a datetime index with a scalar does not result in dropping the MultiIndex level.
this seems like somewhat unintuitive behaviour (to me at least) |
@matthewgilbert this is just how partial string indexing works, see the docs here. The first is treated as a slice, while the second is an exact match. |
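To illustrate the slice versus exact-match distinction on a plain DatetimeIndex (not the MultiIndex case above), here is a minimal sketch. The daily data is made up for illustration, and `.loc` is used so the example does not depend on how `[]` resolves strings.

```python
import pandas as pd

# Hypothetical daily data, made up for illustration.
idx = pd.date_range("2015-01-01", periods=60, freq="D")
s = pd.Series(range(60), index=idx)

# The string is less precise than the index resolution (month vs. day),
# so it is treated as a slice and returns a Series.
january = s.loc["2015-01"]

# The string has the same resolution as the index (day),
# so it is treated as an exact match and returns a scalar.
first_day = s.loc["2015-01-01"]

print(type(january).__name__, len(january))
print(first_day)
```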
I came across this; it seems related, but it could also be a bug in how the above interacts with CategoricalIndex. Using the same example as #15470: pandas 0.20.3
|
@aavanian that looks like a bug. Could you open a separate issue for it? |
Sure, done in #17569 |
If I were to rebuild pandas, I would make indexing as simple as possible and only use |
I just came here just for @jorisvandenbossche:
Thanks for the rest of the analysis! Agree it's a mess. @shoyer:
I think I disagree:
Not intuitive, and it is even more confusing to a beginner when you cross-reference this against the behavior of In general a "
@sam-at-github in my suggested model, indexing like |
Oh OK, wasn't sure what you meant. I still don't think I like that much. For the second point I would prefer "every thing else fails" over switching the behavior of Update: Aside, to only allow positional slicing and not "label" based is probably even more confusing since your labels can be numerical anyway:
|
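A minimal sketch of why numerical labels make "positional versus label" slicing ambiguous. The Series is made up for illustration, and `.iloc` / `.loc` are used to make each interpretation explicit rather than relying on what `[]` picks.

```python
import pandas as pd

# Hypothetical Series whose labels are integers that do not match positions.
s = pd.Series(["a", "b", "c", "d"], index=[1, 2, 3, 4])

# Positional slice: positions 1 and 2, end point excluded.
by_position = s.iloc[1:3]

# Label slice: labels 1 through 3, end point included.
by_label = s.loc[1:3]

print(list(by_position))
print(list(by_label))
```

The same `1:3` selects different rows depending on the interpretation, which is the ambiguity being discussed.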
I'd also like to know the answer to this question. The behavior that surprised me today was the few cases where |
I believe we have an issue for this; would be +1 on deprecation |
@jbrockmendel can you first open (or search) an issue for this to have a discussion about it? |
@jorisvandenbossche I'm putting together an overview of the state of the indexing code. Is the description of the API here still accurate/complete? |
Hey, I'm working on a join-like API
Thanks |
@FluorineDog that doesn't really seem related to this issue. Can you please open a new issue about it? |
some examples (on Series only) in #12890
I started making an overview of the indexing semantics with http://nbviewer.ipython.org/gist/jorisvandenbossche/7889b389a21b41bc1063 (only for series/frame, not for panel)
Conclusion: it is a mess :-)
Summary for slicing
So, you can say that the behaviour is equivalent to .ix, except that the behaviour for integer labels is different for integer indexers (swapped). (For .ix, when having an integer axis, it is always label based and no fallback to integer location based).
Summary for single label
Summary for indexing with list of labels
This mainly follows ix, apart from points 2 and 3.
Summary for boolean indexing
Summary for DataFrames
This is as documented (only the boolean case is not explicitly documented I think).
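As a rough illustration of the documented behaviour of [] on a DataFrame, as I understand it: labels select columns, while slices and boolean masks select rows. The frame below is made up for illustration.

```python
import pandas as pd

# Hypothetical frame, made up for illustration.
df = pd.DataFrame({"a": [1, 2, 3], "b": [4, 5, 6]}, index=["x", "y", "z"])

col = df["a"]              # single label: selects a column, returns a Series
cols = df[["a", "b"]]      # list of labels: selects columns, returns a DataFrame
rows = df[0:2]             # slice: selects rows (positionally here)
masked = df[df["a"] > 1]   # boolean Series: selects rows where the mask is True

print(list(rows.index))
print(list(masked.index))
```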
For the rest (on the chosen axis), it follows the same semantics as [] on a series, but:
Questions are here: