-
Notifications
You must be signed in to change notification settings - Fork 915
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Avoid factorization in MultiIndex.to_pandas #15150
Avoid factorization in MultiIndex.to_pandas #15150
Conversation
python/cudf/cudf/core/multiindex.py
Outdated
levels=[ | ||
level.to_pandas(nullable=nullable) for level in self.levels | ||
], | ||
# np.iinfo.min used as missing code, but pandas uses -1 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure if I follow what this comment means.
Is the np.iinfo.min
coming from libcudf? (More generally, where is that value coming from?)
Should this be a "replace value" rather than a "clip" call?
Should we prefer to call something from libcudf instead of cupy?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it's coming from libcudf indirectly because it's being used in MultiIndex.__init__
:
cudf/python/cudf/cudf/core/multiindex.py
Line 192 in e03623a
code[code == -1] = np.iinfo(size_type_dtype).min |
Should this be a "replace value" rather than a "clip" call? Should we prefer to call something from libcudf instead of cupy?
Good point. A "replace value" would be more suitable here. I was kinda blindly following how MultiIndex.codes
generates cupy arrays and working off that
…ke/cudf into ref/to_pandas_multiindex
Mind re-reviewing when you have a chance @bdice? |
/merge |
Description
This also uncovered a bug in
DataFrame.rename
where the underlyingMultiIndex
ColumnAccessor
was not being replacedChecklist