-
Notifications
You must be signed in to change notification settings - Fork 920
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Remove internal usage of core.index.as_index in favor of cudf.Index #15851
Remove internal usage of core.index.as_index in favor of cudf.Index #15851
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a time when all Index classes implemented __new__
instead of __init__
as a way to allow the parent Index class to construct indexes of the appropriate type. A lot of the class hierarchy was reworked to simplify the constructor logic and ensure that you could still return the right types. pandas 2.0 simplified this pretty substantially by getting rid of all the subclasses of Index corresponding to numerical and string types. Is it still possible for pd.Index(...)
to return an object of a type other than Index
itself? IIRC we now have to explicitly construct CategoricalIndex, MultiIndex, and RangeIndex, right? The only caveat is the tupleize_cols
argument, which throws a bit of a wrench in things; is that planned to remain long-term in pandas?
Basically what I'm wondering is whether we can go even further than this PR currently does (probably in a follow up) to where we remove as_index
altogether (or at least make it an internal only helper function used in specific cases; right now I'm sure it gets used externally by projects like cuspatial and cuml) and also reduce the complexity of the index hierarchy without needing to implement __new__
, which adds confusion I'd like to avoid if possible.
Yeah the In [1]: import pandas as pd
In [2]: import datetime
In [3]: pd.Index([datetime.datetime.now()])
Out[3]: DatetimeIndex(['2024-06-03 15:29:10.717813'], dtype='datetime64[us]', freq=None)
Yeah I imagine so in order to give users a way to construct a
I think once we get other RAPIDS libaries to stop using |
It would be nice if we could move towards a world where the Index constructor only returned instances of itself. In the meantime, maybe the best that we can do is subclass index with an intermediate class that overrides |
/merge |
Description
cudf.Index.__init__
essentially callsas_index
immediately internally. To avoid both from potentially diverging, the publiccudf.Index
should be preferred to ensure the public behaviors are used internallyChecklist