[FEA] Attempt to JIT `GroupBy.apply` functions by default and fall back to iterative algorithm #13103

brandon-b-miller · 2023-04-10T16:18:06Z

With #11452 we introduced a framework for JIT-compiling groupby UDFs with numba, along with the `engine='jit'` kwarg to `GroupBy.apply`. This is an acceptable approach, since we are generally fine with introducing things that make our API a superset of the pandas API.

Recently we've discussed changing this so that when a user calls `GroupBy.apply`, we first try to JIT the UDF and, if that fails, fall back to the iterative method. This would give users a unified API with less to learn, and they would no longer have to wonder whether a UDF conforms to the restrictions on JIT apply. It also provides a simpler internal interface for features built on top of `GroupBy.apply`, such as `filter`. However, it adds JIT overhead to workflows that ultimately won't use the JIT path. This is not ideal, but iterative groupby apply is already pretty slow.
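To make the proposal concrete, here is a minimal sketch of the try-then-fall-back idea in plain Python, so it runs without a GPU. Everything in it is an assumption for illustration: `try_compile`, `apply_jit`, `apply_iterative`, and the `jittable` attribute are hypothetical stand-ins, not cuDF or numba functions, and the "compiler" simply rejects any UDF that has not been tagged.

```python
class NotJittable(Exception):
    """Raised by the stand-in compiler when a UDF cannot be compiled."""


def try_compile(udf):
    # Hypothetical stand-in for a JIT compile attempt: a UDF counts as
    # "jittable" only if it carries a truthy `jittable` attribute.
    if not getattr(udf, "jittable", False):
        raise NotJittable(udf.__name__)
    return udf


def apply_iterative(groups, udf):
    # Slow path: call the Python UDF once per group.
    return {key: udf(values) for key, values in groups.items()}


def apply_jit(groups, compiled):
    # Stand-in for the compiled path; a real one would launch a kernel.
    return {key: compiled(values) for key, values in groups.items()}


def groupby_apply(groups, udf):
    """Try the JIT path first; fall back to the iterative path on failure."""
    try:
        compiled = try_compile(udf)
    except NotJittable:
        return "iterative", apply_iterative(groups, udf)
    return "jit", apply_jit(groups, compiled)


def group_max(values):
    return max(values)


group_max.jittable = True  # hypothetical tag: pretend this UDF compiles


def group_repr(values):
    # String building stands in for something a JIT would reject.
    return ",".join(str(v) for v in values)


groups = {"a": [1, 2, 3], "b": [4, 5]}
path_max, result_max = groupby_apply(groups, group_max)
path_repr, result_repr = groupby_apply(groups, group_repr)
```

Note that `group_repr` still pays for the failed `try_compile` call before taking the iterative path, which is the overhead described above.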


vyasr · 2023-04-10T17:35:03Z

I'm fine with introducing the extra overhead in the cases that won't JIT in order to help the cases that do. We could mitigate the issue by introducing a new `engine='auto'` mode that does this, and allowing users to opt into `engine='cudf'` if they know their UDF won't compile successfully and want to avoid the overhead.
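A rough sketch of how the three `engine` values could be dispatched, assuming the semantics described in this comment. The helper `select_path` and the `can_jit` callable are hypothetical names, not cuDF API; the point is only that `engine='cudf'` never runs the compile check.

```python
def select_path(engine, can_jit):
    """Return which algorithm to run for a given `engine` value.

    `can_jit` is a hypothetical zero-argument callable that performs the
    compile check. It is only called when the engine needs it, so
    engine="cudf" never pays the cost of the check.
    """
    if engine == "cudf":
        return "iterative"
    if engine == "jit":
        if not can_jit():
            raise ValueError("UDF is not jittable")
        return "jit"
    if engine == "auto":
        return "jit" if can_jit() else "iterative"
    raise ValueError(f"unsupported engine: {engine!r}")


calls = []


def failing_check():
    # Record each invocation so we can see which engines ran the check.
    calls.append("checked")
    return False


auto_path = select_path("auto", failing_check)  # runs the check, falls back
cudf_path = select_path("cudf", failing_check)  # skips the check entirely
```

With `engine='jit'` a non-jittable UDF raises instead of falling back, which is why defaulting to it could break existing workflows.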

brandon-b-miller · 2023-04-10T18:08:44Z

So would engine='auto' be the default?

vyasr · 2023-04-10T18:18:13Z

I think so. If we wanted to err on the side of performance we could default to jit, but that seems likely to break many workflows.

… jittable (#13113)

Closes #13103

Authors:
- https://github.com/brandon-b-miller
- Lawrence Mitchell (https://github.com/wence-)

Approvers:
- Vyas Ramasubramani (https://github.com/vyasr)
- Lawrence Mitchell (https://github.com/wence-)
- Bradley Dice (https://github.com/bdice)

URL: #13113

brandon-b-miller added the feature request (New feature or request), numba (Numba issue), and python labels Apr 10, 2023

brandon-b-miller self-assigned this Apr 10, 2023

brandon-b-miller mentioned this issue Apr 11, 2023

Automatically select GroupBy.apply algorithm based on if the UDF is jittable #13113

Merged

rapids-bot bot closed this as completed in #13113 May 16, 2023
