Make Column.to_pandas return Index instead of Series #15833
Does it really make sense for column objects to convert to pandas objects at all? I think we still need some sort of method at the column level so that columns of different dtypes can overload it to produce different results (e.g. CategoricalColumn needs to convert the categories as well as the labels), but is there a lower-level representation that we could change these methods to return instead so that we could delegate the actual pandas object construction to the Frame level? I'm imagining, in cudf.Series for example, something like return pd.Series(self._column.to_pandas_data()) instead of return self._column.to_pandas(). Does that make sense?
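To make the suggestion concrete, here is a minimal sketch of that delegation pattern. It uses only pandas and NumPy. Every class in it is a hypothetical stand-in rather than the actual cudf implementation, and to_pandas_data is the hypothetical method name from the comment above, not an existing API.

```python
import numpy as np
import pandas as pd


class NumericalColumn:
    """Hypothetical stand-in for a numeric column (not the cudf class)."""

    def __init__(self, values):
        self.values = np.asarray(values)

    def to_pandas_data(self):
        # Hypothetical method: return plain array data, no pandas container.
        return self.values


class CategoricalColumn:
    """Hypothetical stand-in for a categorical column (not the cudf class)."""

    def __init__(self, codes, categories):
        self.codes = np.asarray(codes)
        self.categories = list(categories)

    def to_pandas_data(self):
        # The dtype-specific override: categories are converted along with
        # the codes, but no Series or Index is built at this level.
        return pd.Categorical.from_codes(self.codes, categories=self.categories)


class Series:
    """Hypothetical frame-level wrapper that owns the index and the name."""

    def __init__(self, column, index=None, name=None):
        self._column = column
        self._index = index
        self._name = name

    def to_pandas(self):
        # The pandas container is constructed here, at the Frame level.
        return pd.Series(
            self._column.to_pandas_data(), index=self._index, name=self._name
        )


class Index:
    """Hypothetical frame-level wrapper for an index."""

    def __init__(self, column, name=None):
        self._column = column
        self._name = name

    def to_pandas(self):
        return pd.Index(self._column.to_pandas_data(), name=self._name)
```

Note that the two columns in this sketch return different types (a NumPy array and a pd.Categorical), which is one way to read the concern that a single common lower-level representation may not exist.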
I'm fairly skeptical that there is a common lower-level representation. Generally I agree it would be nice to have the pandas construction happen at the Frame level, but I think Frame would have to inherit the column dtype specific logic depending on what |
Fair enough. I'd file this in the back of your head in the list of "things we should consider when rewriting cudf internals to use pylibcudf". I suspect that we'll want to think hard about problems like this when reenvisioning what the column layer of cudf ought to look like.
Thanks. Yeah I think for this use case converting a pylibcudf column to something arrow-like is probably the correct "common data" for the |
/merge
Description
Column.to_pandas backs Index.to_pandas/Series.to_pandas/DataFrame.to_pandas and returned a pandas.Series; however, the index of this pandas.Series was not strictly necessary for Index.to_pandas and DataFrame.to_pandas. Additionally, pandas.Index is 1D-like, as Column is, and provides a better mental model for the to_pandas conversion.
Checklist