Make Column.to_pandas return Index instead of Series #15833

mroeschke · 2024-05-23T01:47:51Z

Description

Column.to_pandas backs Index.to_pandas, Series.to_pandas, and DataFrame.to_pandas, and previously returned a pandas.Series; however, the index of that pandas.Series was not strictly necessary for Index.to_pandas and DataFrame.to_pandas.

Additionally, pandas.Index is one-dimensional, like Column, and provides a better mental model for the to_pandas conversion.
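A minimal sketch of the idea, not the actual cudf code: ToyColumn, series_to_pandas, and index_to_pandas are hypothetical names invented for illustration, and the column holds plain Python values rather than device memory. It shows a column-level to_pandas returning a label-free pandas.Index, with the callers attaching their own labels or name afterwards.

```python
import pandas as pd


class ToyColumn:
    """Hypothetical stand-in for a cudf column; holds plain Python values."""

    def __init__(self, values):
        self._values = list(values)

    def to_pandas(self) -> pd.Index:
        # Return a 1D pandas.Index: the column has no row labels of its own.
        return pd.Index(self._values)


def series_to_pandas(column, index, name=None) -> pd.Series:
    # Series-level caller (hypothetical): wraps the values and supplies
    # the real row labels itself.
    return pd.Series(column.to_pandas(), index=index, name=name)


def index_to_pandas(column, name=None) -> pd.Index:
    # Index-level caller (hypothetical): the result is already an Index,
    # so only the name needs to be attached.
    return column.to_pandas().rename(name)


col = ToyColumn([10, 20, 30])
ser = series_to_pandas(col, index=["a", "b", "c"], name="x")
idx = index_to_pandas(col, name="x")
```

In this sketch the Index-level caller never has to discard an index it did not need, and the Series-level caller is the only place where row labels enter.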

Checklist

I am familiar with the Contributing Guidelines.
New or existing tests cover these changes.
The documentation is up to date with these changes.

vyasr

Does it really make sense for column objects to convert to pandas objects at all? I think we still need some sort of method at the column level so that columns of different dtypes can overload it to produce different results (e.g. CategoricalColumn needs to convert the categories as well as the labels), but is there a lower-level representation that we could change these methods to return instead so that we could delegate the actual pandas object construction to the Frame level? I'm imagining, in cudf.Series for example, something like return pd.Series(self._column.to_pandas_data()) instead of return self._column.to_pandas(). Does that make sense?

mroeschke · 2024-06-03T21:23:51Z

> is there a lower-level representation that we could change these methods to return instead so that we could delegate the actual pandas object construction to the Frame level?

I'm fairly skeptical that there is a common lower-level representation. Generally Column.to_arrow is called first, then each column dtype manipulates that as needed (from just calling to_pandas to needing a Python list of objects).

I agree it would be nice to have the pandas construction happen at the Frame level, but I think Frame would have to inherit the column-dtype-specific logic, depending on what self._column.to_pandas_data() returns, which isn't great.
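To make that concern concrete, here is a toy, pure-Python sketch. Everything in it is hypothetical: the classes are not cudf classes, to_pandas_data is only the name proposed above, and plain lists stand in for pandas objects. It assumes a categorical column has to hand back both codes and categories, so the Frame-level constructor ends up branching on the column type.

```python
class NumericColumn:
    """Hypothetical column whose raw values are enough to rebuild it."""

    def __init__(self, values):
        self.values = list(values)

    def to_pandas_data(self):
        # Hypothetical hook: a flat list of values is a sufficient payload.
        return self.values


class CategoricalColumn:
    """Hypothetical column that needs codes *and* categories."""

    def __init__(self, codes, categories):
        self.codes = list(codes)
        self.categories = list(categories)

    def to_pandas_data(self):
        # The payload has a different shape from the numeric case.
        return {"codes": self.codes, "categories": self.categories}


def frame_level_construct(column):
    """Stand-in for Frame-level construction of the final object."""
    data = column.to_pandas_data()
    if isinstance(column, CategoricalColumn):
        # dtype-specific knowledge now lives outside the column layer.
        return [data["categories"][code] for code in data["codes"]]
    return list(data)


numeric_result = frame_level_construct(NumericColumn([1, 2, 3]))
categorical_result = frame_level_construct(
    CategoricalColumn(codes=[0, 1, 0], categories=["a", "b"])
)
```

Each new payload shape would add another branch to frame_level_construct, which is the dtype-specific logic moving up a layer rather than going away.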

vyasr

Fair enough. I'd file this in the back of your head in the list of "things we should consider when rewriting cudf internals to use pylibcudf". I suspect that we'll want to think hard about problems like this when reenvisioning what the column layer of cudf ought to look like.

mroeschke · 2024-06-03T21:53:36Z

Thanks. Yeah, I think for this use case converting a pylibcudf column to something arrow-like is probably the correct "common data" for the to_pandas API in the future, especially if pandas gains pyarrow as a required dependency.

mroeschke · 2024-06-04T18:56:21Z

/merge

Make Column.to_pandas return Index instead of Series

ab4d298

mroeschke added the labels Python (Affects Python cuDF API), improvement (Improvement / enhancement to an existing function), and non-breaking (Non-breaking change) May 23, 2024

mroeschke requested a review from a team as a code owner May 23, 2024 01:47

mroeschke requested review from bdice and brandon-b-miller May 23, 2024 01:47

Merge branch 'branch-24.08' into ref/column_to_pandas/index

64dd8fa

vyasr reviewed May 28, 2024

Merge branch 'branch-24.08' into ref/column_to_pandas/index

a6b0d74

vyasr approved these changes Jun 3, 2024

rapids-bot bot merged commit fe74129 into rapidsai:branch-24.08 Jun 4, 2024
73 checks passed

mroeschke deleted the ref/column_to_pandas/index branch June 4, 2024 18:56

vyasr mentioned this pull request Jun 7, 2024

[BUG]: test_intersections tests failing on 24.08 rapidsai/cuspatial#1395

Closed
