Index Constructors inferring output from data #17246
Comments
I like how the generic I'm not a big fan of separate keyword arguments like
|
I think we should simply remove outright the OK, since both you and @jreback are in favor of that over a keyword, I'll amend #17236 to just remove that behavior (without a deprecation cycle I suppose?). And just to be clear @shoyer, you're in favor of keeping |
Sounds good to me. This should be mentioned as a breaking change, of course. I can't think of any real use cases for this behavior, but I'm sure this will still come up for someone somehow!
Yes, that seems consistent to me. I would suggest |
See #17236 (comment), I am also +1 on making the MultiIndex constructors consistently return MultiIndex (so remove the MultiIndex -> Index way) |
So it seems like the consensus is to put all the inference into The second idea is to have I'll update the top post. |
this already does what you expect
but eliminating the special case is important as well. |
@TomAugspurger yep, agree with your summary The |
ahh I would be ok with fixing this. This is not respecting the dtype.
|
Ah, good catch, that is yet another one to fix! Because the one I meant was
|
Two proposals:

1. Consolidate all inference to the `Index` constructor
   - Keep `Index(...)` inferring the best container for the data passed
   - Remove `MultiIndex(data)` returning an `Index` when data is a list of length-1 tuples (xref API: Have MultiIndex constructors always return a MI #17236)
2. Passing `dtype=object` disables inference
   - `Index(..., dtype=object)` disables all inference. So `Index([1, 2], dtype=object)` will give you an `Index` instead of `Int64Index`, and `Index([(1, 'a'), (2, 'b')], dtype=object)` an `Index` instead of `MultiIndex`, etc.

(original post follows)
Or how much magic should we have in the Index constructors? Currently we infer the index type from the data, which is often convenient, but sometimes makes the behavior difficult to reason about. E.g. `hash_tuples` currently doesn't work if your tuples all happen to be length 1, since it uses a MultiIndex internally.

Do we want to make our `Index` constructors more predictable? For reference, here are some examples:

Of these, I think the first (`Index -> MultiIndex` if you have tuples) and the last (`MultiIndex -> Index` if your tuples are all length 1) are undesirable. The `Index -> MultiIndex` one has the `tupleize_cols` keyword to control this behavior. In #17236 I add an analogous keyword to the MI constructor. The rest are probably fine, but I don't have any real reason for saying that `[1, 2, 3]` magically returning an Int64Index is ok, but `[(1, 2), (3, 4)]` returning a `MI` isn't (maybe the difference between a MI and Index is larger than the difference between an Int64Index and Index?). I believe that in either the `RangeIndex` or `IntervalIndex` discussion someone (@shoyer?) had objections to overloading the `Index` constructor to return the specialized type.

So, what should we do about these? Leave them as is? Deprecate the type inference? My vote is for merging #17236 and leaving everything else as is. To me, it's not worth breaking API over.
cc @jreback, @jorisvandenbossche, @shoyer
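
As a rough illustration of the two proposals at the top of this post, here is a minimal pure-Python sketch of the decision rule they describe. It is not pandas code: `index_kind` and `multiindex_kind` are hypothetical helpers that return only the name of the container that would be chosen, and the integer check is a simplified stand-in for real dtype inference.

```python
def index_kind(data, dtype=None):
    """Name the container an Index(...) call would pick under the proposals.

    Hypothetical helper for illustration; it returns a label, not an index.
    """
    values = list(data)
    # Proposal 2: dtype=object switches off every inference step.
    if dtype is object or dtype == "object":
        return "Index"
    # Proposal 1: all inference lives here, e.g. tuples become a MultiIndex.
    if values and all(isinstance(v, tuple) for v in values):
        return "MultiIndex"
    # Simplified integer check; bool is a subclass of int, so exclude it.
    if values and all(
        isinstance(v, int) and not isinstance(v, bool) for v in values
    ):
        return "Int64Index"
    return "Index"


def multiindex_kind(tuples):
    """Under proposal 1 the MultiIndex constructors never downgrade to Index."""
    return "MultiIndex"


print(index_kind([1, 2]))                                 # Int64Index
print(index_kind([1, 2], dtype=object))                   # Index
print(index_kind([(1, 'a'), (2, 'b')]))                   # MultiIndex
print(index_kind([(1, 'a'), (2, 'b')], dtype=object))     # Index
print(multiindex_kind([(1,), (2,)]))                      # MultiIndex
```

The point of the sketch is that every special case sits in one function, and `dtype=object` is the single opt-out, so a caller that needs a plain `Index` of tuples does not depend on the shape of the data.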