Remove cudf._lib.interop in favor of inlining pylibcudf #17555
result = cls.from_pylibcudf(plc.interop.from_arrow(array))
# TODO: cudf_dtype_from_pa_type may be less necessary for some types
return result._with_type_metadata(
    cudf_dtype_from_pa_type(array.type)
)
This is the kind of thing I was thinking of in the other PR. Having a standardized entrypoint of some sort (maybe per-class?) into pylibcudf from cudf Python would help us collect common functionality like _with_type_metadata that we otherwise add piecemeal as we find bugs and incompatibilities with pandas.
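To make the suggestion concrete, here is a rough sketch of what a per-class entrypoint could look like. Everything in it is hypothetical: FakeColumn and _from_low_level are stand-ins invented for illustration, not cudf or pylibcudf APIs, and the stub _with_type_metadata only mimics the role of the real method. The point is only that the conversion and the shared fix-up live in one place.

```python
# Hypothetical sketch: none of these names are real cudf or pylibcudf APIs.
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class FakeColumn:
    """Stand-in for a cudf column: values plus a dtype label."""

    values: tuple
    dtype: str = "raw"

    def _with_type_metadata(self, dtype: str) -> "FakeColumn":
        # Re-attach dtype information that the low-level conversion dropped.
        return replace(self, dtype=dtype)

    @classmethod
    def _from_low_level(cls, values, dtype: str) -> "FakeColumn":
        """Single entry point: convert first, then apply the shared fix-ups."""
        raw = cls(values=tuple(values))  # stands in for from_pylibcudf(...)
        return raw._with_type_metadata(dtype)  # applied in exactly one place


col = FakeColumn._from_low_level([1, 2, 3], dtype="category")
```

With this shape, a fix found while chasing a pandas incompatibility would be added once inside the entrypoint rather than at every call site.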
if isinstance(array, pa.ChunkedArray):
    array = array.combine_chunks()
Do we always have to combine chunks? IIRC the existing implementation works without combining in most cases, and I don't think combining is free performance-wise so we should avoid it if we can. I could be wrong though, or misremembering an earlier state of the code.
Ah right. Yeah this will make a copy on the CPU side.
I see now that on the libcudf side we only support returning tables (and not columns) from an arrow stream. I was hoping to avoid the dance of putting the chunked array in a pyarrow table, but I think the dance is worth it to avoid a CPU copy.
Thanks, yeah, I think this is the right call for now. We could generalize the libcudf APIs in the future if that helps.
/merge
Description
Contributes to #17317
Checklist