Support more numeric types in `Groupby.apply` with `engine='jit'` #13729

brandon-b-miller · 2023-07-21T16:16:23Z

draft

This PR adds support for additional numeric dtypes in GroupBy.apply with engine='jit'.

brandon-b-miller · 2023-07-24T14:01:01Z

I ran into some issues compiling the shim library with int16 and int8 dtypes here. It looks like our implementations of the underlying block-level reduction functions rely on libcu++ atomic_ref, which, per the documentation, may only be instantiated with a type T that is either 4 or 8 bytes wide. This is consistent with the error I'm getting:

atomic_cuda.h(111): error: static assertion failed with "atomic_ref does not support 1 or 2 byte types

This means that supporting shorter integer types will require somewhat more involved development on the C++ side, which I think should be separated out into another PR. I'm updating the PR title to reflect that this PR only goes as far as rounding us out with int32 and float32.
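The size restriction described above can be illustrated with a small check. This is a minimal sketch, not code from the PR: `atomic_ref_compatible` and the `C_TYPES` table are hypothetical names, and `ctypes` is used only as a standard-library way to get each dtype's width in bytes.

```python
import ctypes

# Per the libcu++ restriction quoted above, cuda::atomic_ref<T> can only be
# instantiated when sizeof(T) is 4 or 8 bytes.
ATOMIC_REF_SIZES = {4, 8}

# Hypothetical mapping from dtype name to a C type of the same width,
# used here only to obtain sizeof(T) from the standard library.
C_TYPES = {
    "int8": ctypes.c_int8,
    "int16": ctypes.c_int16,
    "int32": ctypes.c_int32,
    "int64": ctypes.c_int64,
    "float32": ctypes.c_float,
    "float64": ctypes.c_double,
}


def atomic_ref_compatible(dtype_name: str) -> bool:
    """Return True if a block reduction over this dtype could use atomic_ref."""
    return ctypes.sizeof(C_TYPES[dtype_name]) in ATOMIC_REF_SIZES


supported = [name for name in C_TYPES if atomic_ref_compatible(name)]
print(supported)  # ['int32', 'int64', 'float32', 'float64']
```

The 1- and 2-byte types fall out of the supported list, which matches why int8 and int16 were deferred to a follow-up.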

brandon-b-miller · 2023-07-24T14:06:27Z

raised #13736

bdice

I'm not a huge fan of repeating the typecasting logic twice in Python and once in C++... we'll need to find a better architecture for this before we get too deep into adding more types.

bdice · 2023-07-24T15:02:49Z

python/cudf/cudf/core/udf/groupby_typing.py

    _register_cuda_idx_reduction_caller("IdxMax", ty)
    _register_cuda_idx_reduction_caller("IdxMin", ty)

+_register_cuda_reduction_caller("Sum", types.int32, types.int64)


It feels like the typecasting rules were hard-coded twice. Once here, and once in GroupSum. Is there a way to reuse a single mapping / type promotion function?

Yeah, this is a good point. There should only be one type mapping. Let me see what I can do here.

In the latest commit, I've refactored toward an attribute-generating function that checks the existing registry of CUDA functions for a match in order to return a signature. I think that makes this the only place in Python where the signatures are written by hand. Let me know what you think.
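The "single mapping plus lookup" pattern described above can be sketched in plain Python. This is an illustration under stated assumptions, not cudf's actual code: `register_reduction`, `resolve_reduction`, and `_REDUCTION_REGISTRY` are hypothetical names, types are plain strings instead of numba types, and only the `Sum` int32 to int64 promotion comes from the diff; the other entries are illustrative.

```python
from typing import Dict, Optional, Tuple

# Hypothetical single source of truth: (reduction name, input type) -> return type.
_REDUCTION_REGISTRY: Dict[Tuple[str, str], str] = {}


def register_reduction(name: str, in_type: str, out_type: str) -> None:
    """Record that the shim library provides `name` for `in_type`."""
    _REDUCTION_REGISTRY[(name, in_type)] = out_type


def resolve_reduction(name: str, in_type: str) -> Optional[Tuple[str, Tuple[str, ...]]]:
    """Return (return_type, arg_types) if a matching function is registered."""
    out_type = _REDUCTION_REGISTRY.get((name, in_type))
    if out_type is None:
        # No matching shim function: explicitly return None so typing fails cleanly.
        return None
    return out_type, (in_type,)


# Signatures are written by hand exactly once, at registration time.
register_reduction("Sum", "int32", "int64")  # from the diff: int32 sums promote to int64
register_reduction("Sum", "int64", "int64")  # illustrative
register_reduction("Max", "int32", "int32")  # illustrative

print(resolve_reduction("Sum", "int32"))  # ('int64', ('int32',))
print(resolve_reduction("Sum", "int8"))   # None
```

Because the attribute typing only consults the registry, adding a dtype means adding one registration rather than editing a second set of casting rules.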

brandon-b-miller · 2023-07-24T18:57:54Z

@bdice thanks for your review. I've thought a bit about keeping all of the function signatures in one centralized place that could be read directly from C++ or Python. Since the shim function library is built before the Python code is imported, I suppose it should be the source of truth as well. This leads me toward some kind of solution that inspects the shim file at import time and generates signatures dynamically from it. @gmarkall can cuobjdump or nvdisasm help us here?
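One way to picture the dynamic approach floated above is to scan the shim's PTX text for exported device functions. This is a speculative sketch, not something the PR implemented: it assumes the shim is available as PTX text and that exported functions follow a hypothetical `Block<Op>_<dtype>` naming scheme. The sample listing is a made-up toy, not real shim output.

```python
import re
from typing import List, Tuple

# Hypothetical naming convention: Block<Op>_<dtype>, e.g. BlockSum_int32.
# The real shim's symbol names may differ; this only illustrates the idea.
_FUNC_RE = re.compile(
    r"^\.visible\s+\.func\b.*?\b(Block[A-Za-z]+)_([a-z]+\d+)\s*\(",
    re.MULTILINE,
)


def discover_signatures(ptx_text: str) -> List[Tuple[str, str]]:
    """Return (operation, dtype) pairs for each exported function found."""
    return [(op, dtype) for op, dtype in _FUNC_RE.findall(ptx_text)]


# Toy PTX-like listing (invented for illustration only).
SAMPLE_PTX = """
.visible .func  (.param .b64 func_retval0) BlockSum_int64(
.visible .func  (.param .b32 func_retval0) BlockMax_int32(
.func helper_not_exported(
"""

print(discover_signatures(SAMPLE_PTX))
# [('BlockSum', 'int64'), ('BlockMax', 'int32')]
```

Names alone recover which (operation, dtype) pairs exist but not the promoted return types (e.g. int32 sums returning int64), which would have to come from the parameter declarations; this is part of why generating signatures dynamically is harder than keeping one hand-written Python mapping.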

bdice

Much better to only define these once in Python. I'm okay with redefining in C++ and Python if that is needed. It is probably difficult to do this dynamically. I have one question, otherwise approving.

python/cudf/cudf/core/udf/groupby_typing.py

brandon-b-miller · 2023-07-25T18:28:38Z

/merge

add int32 tests and add edge cases in typing

2b95722

github-actions bot added the Python Affects Python cuDF API. label Jul 21, 2023

brandon-b-miller added feature request New feature or request non-breaking Non-breaking change labels Jul 21, 2023

add float32 and refactor

741a836

brandon-b-miller mentioned this pull request Jul 24, 2023

[FEA] Support <4 byte types in GroupBy.apply with engine='jit' #13736

Open

brandon-b-miller marked this pull request as ready for review July 24, 2023 14:06

brandon-b-miller requested a review from a team as a code owner July 24, 2023 14:06

brandon-b-miller requested review from bdice and isVoid and removed request for a team July 24, 2023 14:06

add some docs to sum

8cd6886

brandon-b-miller added the 3 - Ready for Review Ready for review by team label Jul 24, 2023

bdice reviewed Jul 24, 2023


look up existing signatures when defining attribute typing

1056d4c

bdice approved these changes Jul 24, 2023


python/cudf/cudf/core/udf/groupby_typing.py

explicitly return None

ce452ec

rapids-bot bot merged commit 43aca00 into rapidsai:branch-23.10 Jul 25, 2023
