Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use fused types for overloaded function signatures #14969

Merged
merged 3 commits into from
Feb 5, 2024

Conversation

vyasr
Copy link
Contributor

@vyasr vyasr commented Feb 5, 2024

Description

This change makes the pylibcudf API more convenient and a more faithful reproduction of the underlying libcudf APIs that offer overloaded signatures. In cases like binary ops where we were previously using runtime instance checks, this change also removes unnecessary runtime overhead if the calling code is Cython since in those cases the types at the call site are known at compile time.

Checklist

  • I am familiar with the Contributing Guidelines.
  • New or existing tests cover these changes.
  • The documentation is up to date with these changes.

@vyasr vyasr added improvement Improvement / enhancement to an existing function non-breaking Non-breaking change labels Feb 5, 2024
@vyasr vyasr self-assigned this Feb 5, 2024
@vyasr vyasr requested a review from a team as a code owner February 5, 2024 18:00
@vyasr vyasr requested review from bdice and isVoid February 5, 2024 18:00
@github-actions github-actions bot added the Python Affects Python cuDF API. label Feb 5, 2024
Copy link
Contributor

@shwina shwina left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice! I believe this is the first use of fused types in our Cython

python/cudf/cudf/_lib/pylibcudf/copying.pxd Show resolved Hide resolved
@vyasr
Copy link
Contributor Author

vyasr commented Feb 5, 2024

/merge

@rapids-bot rapids-bot bot merged commit dfc7f25 into rapidsai:branch-24.04 Feb 5, 2024
68 checks passed
@vyasr vyasr deleted the feat/pylibcudf_fused_types branch February 5, 2024 19:46
pylibcudf.Table([col.to_pylibcudf(mode="read") for col in target_columns]),
)
tbl = pylibcudf.copying.scatter(
pylibcudf.Table([col.to_pylibcudf(mode="read") for col in sources])
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we pull out the line that defines the table/scalar here and declare a separate variable for it? It is hard to read a three-line expression that just supplies one argument.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah that's fair. TBH I'm not too concerned about how legacy Cython code looks right now because once we've completed pylibcudf we'll need to rewrite cudf internals substantially to leverage pylibcudf properly, so all of this will go away.

Copy link
Contributor

@bdice bdice Feb 5, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would pulling that out into a variable be able to avoid the conditional cdef-ing problem? Why/why not?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm confused, was this question meant for the other thread?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, I'm trying to ask a related question here. I realize now it's because this is def and the other is cpdef.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah right that's why I was confused, now I see. You're on the right track; it's less about this function being a def or a cpdef, and more about what it's calling. We're calling pylibcudf as a Python API here, so it accepts arbitrary objects as input and we're not cdefing them. In the other places we have to cdef the inputs because we are calling cdef functions (C++ cdef extern functions) that require strongly typed input.

(<Column> lhs).view(),
dereference((<Scalar> rhs).c_obj),
lhs.view(),
dereference(rhs.c_obj),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it possible to evaluate the lhs/rhs objects ahead of time and reduce the dispatching logic here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm assuming that what you're envisioning is something like (pseudocode):

if lhs is column and rhs is column:
    c_lhs = lhs.view()
    c_rhs = rhs.view()
elif lhs is column and rhs is scalar:
    c_lhs = lhs.view()
    c_rhs = dereference(rhs.c_obj)
...
result = move(cpp_binaryop(c_lhs, c_rhs, ...))
...

Is that what you're thinking?

If so, unfortunately no that doesn't work. There are two reasons:

  1. cdefs can't go inside conditionals. That includes conditionals that are compile-time branches based on fused types. Therefore, you'd still need separate cdefs for c_lhs_scalar and c_lhs_column, for example, then you'd assign them inside the conditional block. That's at least as verbose, but one could perhaps argue that it's more DRY. However:
  2. The C++ function call needs to resolve to the correct overload, so you have to pass the appropriately-typed (known at compile-time) objects to calling cpp_binaryop. So no matter what you need to invoke the function separately for each possible overload.

Ultimately the translation from Cython types (fused or not) to C++ types remains a point of friction.

Let me know if that answered your question or if you meant something else.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cool. That was my expectation but I didn't want to miss out on anything special if fused types would allow this.

cpp_copying.empty_like(
(<Column> input).view(),
else:
source_scalars = _as_vector(source)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As above, is it possible to reduce this dispatching logic when using fused types?

@vyasr vyasr added the pylibcudf Issues specific to the pylibcudf package label May 28, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
improvement Improvement / enhancement to an existing function non-breaking Non-breaking change pylibcudf Issues specific to the pylibcudf package Python Affects Python cuDF API.
Projects
Status: Done
Development

Successfully merging this pull request may close these issues.

3 participants