Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Remove cudf._lib.interop in favor of inlining pylibcudf #17555

Merged
merged 8 commits into from
Dec 17, 2024

Conversation

mroeschke
Copy link
Contributor

Description

Contributes to #17317

Checklist

  • I am familiar with the Contributing Guidelines.
  • New or existing tests cover these changes.
  • The documentation is up to date with these changes.

@mroeschke mroeschke added Python Affects Python cuDF API. improvement Improvement / enhancement to an existing function non-breaking Non-breaking change labels Dec 9, 2024
@mroeschke mroeschke self-assigned this Dec 9, 2024
@mroeschke mroeschke requested a review from a team as a code owner December 9, 2024 22:20
@mroeschke mroeschke requested review from wence- and vyasr December 9, 2024 22:20
@github-actions github-actions bot added the CMake CMake build issue label Dec 9, 2024
Comment on lines 360 to 364
result = cls.from_pylibcudf(plc.interop.from_arrow(array))
# TODO: cudf_dtype_from_pa_type may be less necessary for some types
return result._with_type_metadata(
cudf_dtype_from_pa_type(array.type)
)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the kind of thing I was thinking of in the other PR. Having a standardized entrypoint of some sort (maybe per-class?) into pylibcudf from cudf Python would help us collect common functionality like _with_type_metadata that we otherwise add piecemeal as we find bugs and incompatibilities with pandas.

Comment on lines 335 to 336
if isinstance(array, pa.ChunkedArray):
array = array.combine_chunks()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we always have to combine chunks? IIRC the existing implementation works without combining in most cases, and I don't think combining is free performance-wise so we should avoid it if we can. I could be wrong though, or misremembering an earlier state of the code.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah right. Yeah this will make a copy on the CPU side.

I see now in libcudf side we only support returning tables (and not columns) from an arrow stream. I was hoping to avoid the dance of putting the chunked array in a pyarrow table but I think the dance is worth avoiding a CPU copy

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks yeah I think this is the right call for now. We could generalize the libcudf APIs in the future if that helps.

@mroeschke
Copy link
Contributor Author

/merge

@rapids-bot rapids-bot bot merged commit b9760ac into rapidsai:branch-25.02 Dec 17, 2024
105 checks passed
@mroeschke mroeschke deleted the cudf/_lib/interop branch December 17, 2024 23:02
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
CMake CMake build issue improvement Improvement / enhancement to an existing function non-breaking Non-breaking change Python Affects Python cuDF API.
Projects
Status: Done
Development

Successfully merging this pull request may close these issues.

2 participants