Avoid factorization in MultiIndex.to_pandas #15150

Merged

mroeschke
Contributor

Description

This also uncovered a bug in DataFrame.rename, where the underlying MultiIndex ColumnAccessor was not being replaced.
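For reference, the behaviour cuDF aims to match is pandas': renaming labels in MultiIndex columns returns a frame whose columns reflect the new labels, while the original frame is left untouched. The sketch below uses pandas only; it does not exercise cuDF's ColumnAccessor, and the column labels are made up for illustration.

```python
import pandas as pd

# Two columns sharing the top-level label "a".
columns = pd.MultiIndex.from_tuples([("a", "x"), ("a", "y")])
df = pd.DataFrame([[1, 2], [3, 4]], columns=columns)

# With no `level` argument, the mapper is applied to labels in every level.
renamed = df.rename(columns={"x": "w"})

print(list(renamed.columns))  # [('a', 'w'), ('a', 'y')]
print(list(df.columns))       # [('a', 'x'), ('a', 'y')]  (original unchanged)
```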

Checklist

  • I am familiar with the Contributing Guidelines.
  • New or existing tests cover these changes.
  • The documentation is up to date with these changes.

mroeschke added the bug (Something isn't working), Python (Affects Python cuDF API.), and non-breaking (Non-breaking change) labels Feb 27, 2024
mroeschke requested a review from a team as a code owner February 27, 2024 00:53
mroeschke requested review from shwina and bdice February 27, 2024 00:53
```python
levels=[
    level.to_pandas(nullable=nullable) for level in self.levels
],
# np.iinfo.min used as missing code, but pandas uses -1
```
Contributor

I'm not sure I follow what this comment means.

Is the np.iinfo.min coming from libcudf? (More generally, where is that value coming from?)

Should this be a "replace value" rather than a "clip" call?

Should we prefer to call something from libcudf instead of cupy?
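To make the "replace value" vs "clip" question concrete, here is a host-side NumPy sketch. It assumes the sentinel is the minimum of an int32 size type (an assumption; the thread only names `np.iinfo(size_type_dtype).min`), and cuDF's real code works on device arrays rather than NumPy.

```python
import numpy as np

# Assumption: size_type_dtype is int32, so the missing-code sentinel is -2**31.
MISSING = np.iinfo(np.int32).min

codes = np.array([0, 2, MISSING, 1], dtype=np.int32)

# "Clip": raise everything below -1 up to -1. Correct only because valid codes are >= 0.
clipped = np.clip(codes, -1, None)

# "Replace value": rewrite exactly the sentinel and leave every other value alone.
replaced = np.where(codes == MISSING, -1, codes)

print(clipped.tolist())   # [0, 2, -1, 1]
print(replaced.tolist())  # [0, 2, -1, 1]

# The two only differ on values that should never occur, e.g. a corrupt negative code.
bad = np.array([-5], dtype=np.int32)
print(np.clip(bad, -1, None).tolist())             # [-1]  (silently masked)
print(np.where(bad == MISSING, -1, bad).tolist())  # [-5]  (left visible)
```

Replacing states the intent (map one sentinel to another) directly, whereas clipping silently relies on an invariant about all the other codes.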

Contributor Author

I think it comes from libcudf indirectly, because it's used in MultiIndex.__init__:

```python
code[code == -1] = np.iinfo(size_type_dtype).min
```
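A minimal NumPy sketch of what that masking assignment does to one level's codes, assuming `size_type_dtype` is int32 (libcudf's `size_type` is a 32-bit integer; treat the exact Python-side dtype as an assumption here):

```python
import numpy as np

# Assumption: size_type_dtype is int32.
size_type_dtype = np.dtype("int32")

# pandas-style codes for one level, with -1 marking a missing entry.
code = np.array([2, -1, 0, -1], dtype=size_type_dtype)

# The boolean-mask assignment quoted above: every -1 becomes the dtype's minimum.
code[code == -1] = np.iinfo(size_type_dtype).min

print(code.tolist())  # [2, -2147483648, 0, -2147483648]
```

Valid codes are non-negative positions into a level, so the sentinel cannot collide with a real code; converting back to pandas therefore only has to touch that one value.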

> Should this be a "replace value" rather than a "clip" call? Should we prefer to call something from libcudf instead of cupy?

Good point. A "replace value" would be more suitable here. I was more or less blindly following how MultiIndex.codes generates cupy arrays and working from that.
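Putting the thread together, the idea in the PR title (build the pandas MultiIndex without factorizing) can be sketched on the host as: pass the existing levels and codes straight to the `pd.MultiIndex` constructor, replacing the sentinel with -1 along the way. This is a sketch under stated assumptions, not cuDF's implementation: `to_pandas_multiindex` is a hypothetical name, the codes are NumPy arrays rather than device arrays, and the sentinel is assumed to be the int32 minimum.

```python
import numpy as np
import pandas as pd

# Assumption: cuDF's size_type is int32, so its missing-code sentinel is the int32 minimum.
MISSING = np.iinfo(np.int32).min


def to_pandas_multiindex(levels, codes, names=None):
    """Hypothetical helper: build a pandas MultiIndex directly from levels and codes.

    `levels` holds the unique labels of each level; `codes` holds, per level, an
    integer array of positions into that level, with MISSING marking a null.
    The codes are reused as-is, so no factorization is needed.
    """
    pandas_codes = [
        # Replace (not clip) the sentinel with pandas' missing code, -1.
        np.where(np.asarray(code) == MISSING, -1, code)
        for code in codes
    ]
    return pd.MultiIndex(levels=levels, codes=pandas_codes, names=names)


levels = [["a", "b"], [10, 20]]
codes = [
    np.array([0, 1, 1], dtype=np.int32),
    np.array([1, MISSING, 0], dtype=np.int32),
]
mi = to_pandas_multiindex(levels, codes, names=["k1", "k2"])
print(mi.codes[1].tolist())  # [1, -1, 0]
```

By contrast, `pd.MultiIndex.from_arrays` takes the full per-row values and has to factorize each array to rediscover levels and codes that the cuDF MultiIndex already holds.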

@mroeschke
Contributor Author

Mind re-reviewing when you have a chance @bdice?

@mroeschke
Contributor Author

/merge

rapids-bot merged commit 9678c90 into rapidsai:branch-24.04 Mar 6, 2024
74 checks passed
mroeschke deleted the ref/to_pandas_multiindex branch March 6, 2024 22:54