Implement groupby apply with JIT #11452
Can we do two things?
So that this doesn't disappear into the mists of time.
Is this even a problem anymore? It's a hard requirement now and needs to be enabled to build the python package, so I'd assume it belongs here indefinitely at this point unless this is hacky for some reason.
Unused import. (I think flake8 might be disabled on this file... it should have caught that.)
It sounds like this link should explain the Numba JIT pipeline, but it doesn't. Will we update that notebook in this PR (or before the 23.02 release)?
If we don't have time to update that notebook, maybe we can explain the choice of 'cudf' or 'jit' a little more in this docstring. Currently I don't think users will know enough to decide which is appropriate.
There will definitely be a notebook update asap! It's in progress. I'll expound a bit here regardless though.
Should we update this message with a suggestion to try using `engine="jit"`?
Added a note about this.
We know this is slow because it involves a lot of Python invocations of GPU kernels when there are many groups. However, have we tried using some thread parallelism or other means to reduce/conceal that overhead? Out of scope for this PR but I would like to know that we've made an attempt at improving this code path since we can't always use `"jit"`.
We've tried streams and a number of other approaches. @shwina led the charge on this a while back, I believe. I think the result was that the Python overhead was dominating the runtime even though we could parallelize over kernel launches somewhat. @wence- had some more advanced ideas we brainstormed earlier this year for a more holistic `groupby.apply`, combining a few elements from across our learnings, I believe. Maybe time to restart that conversation. Ultimately though, the lack of traction with any previous approach is what led to this PR in the first place.
As Brandon says, the fact that our previous attempts didn't really improve much is the reason we ended up moving towards JIT. @isVoid put together the streams prototype IIRC. To really improve the performance we'll need to reduce Python overhead all the way down the call stack.
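To make the overhead argument in this thread concrete, here is a toy, CPU-only model in plain Python. It is not cuDF code: `group_offsets`, `apply_per_group`, `apply_fused`, and `LaunchCounter` are hypothetical names, and a "launch" is just a counter increment standing in for a host-side kernel dispatch. It only illustrates why the number of dispatches grows with the number of groups on one path and stays constant on the other.

```python
from collections import defaultdict


def group_offsets(keys):
    """Return (order, offsets): group g is order[offsets[g]:offsets[g + 1]]."""
    buckets = defaultdict(list)
    for i, k in enumerate(keys):
        buckets[k].append(i)
    order, offsets = [], [0]
    for k in sorted(buckets):
        order.extend(buckets[k])
        offsets.append(len(order))
    return order, offsets


class LaunchCounter:
    """Counts simulated host-side dispatches (an assumption of this sketch)."""

    def __init__(self):
        self.launches = 0


def apply_per_group(values, order, offsets, func, counter):
    """Models the non-JIT path: one dispatch from Python per group."""
    out = []
    for g in range(len(offsets) - 1):
        counter.launches += 1  # one simulated launch for this group
        chunk = [values[i] for i in order[offsets[g]:offsets[g + 1]]]
        out.append(func(chunk))
    return out


def apply_fused(values, order, offsets, func, counter):
    """Models the JIT path: a single dispatch that loops over groups itself."""
    counter.launches += 1  # one simulated launch for all groups
    return [
        func([values[i] for i in order[offsets[g]:offsets[g + 1]]])
        for g in range(len(offsets) - 1)
    ]


keys = ["a", "b", "a", "c", "b"]
values = [1, 2, 3, 4, 5]
order, offsets = group_offsets(keys)


def spread(chunk):
    return max(chunk) - min(chunk)


per_group_counter, fused_counter = LaunchCounter(), LaunchCounter()
per_group_result = apply_per_group(values, order, offsets, spread, per_group_counter)
fused_result = apply_fused(values, order, offsets, spread, fused_counter)
```

Both paths produce the same per-group results; only the dispatch count differs (one per group versus one in total), which is the cost the thread describes as dominating when there are many groups.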
Maybe just my stylistic preference, but you may as well assign to the struct members directly rather than storing in intermediates (including replacing `size` with `grp.size` in the tuple unpacking of `args`, which will require a slight reordering to create the group before that unpacking). IMO the extra redirection only reduces readability in this case.
I refactored this as you suggested, it's definitely better now!
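For readers without the diff in front of them, the shape of that refactor can be sketched in plain Python. This is only an illustration of the pattern, not the PR's lowering code: the `Group` dataclass, its fields, and both builder functions are hypothetical stand-ins for the struct and its members.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class Group:
    """Hypothetical stand-in for the group struct discussed above."""

    data: Optional[Sequence[float]] = None
    index: Optional[Sequence[int]] = None
    size: int = 0


def build_with_intermediates(args):
    # Before: unpack into locals, then copy each local onto the struct.
    data, index, size = args
    grp = Group()
    grp.data = data
    grp.index = index
    grp.size = size
    return grp


def build_direct(args):
    # After: create the struct first, then unpack straight into its members.
    grp = Group()
    grp.data, grp.index, grp.size = args
    return grp


args = ([1.0, 2.0, 3.0], [0, 1, 2], 3)
before = build_with_intermediates(args)
after = build_direct(args)
```

The two builders return equal structs; the direct version just drops the intermediate names, at the cost of having to create the struct before the unpacking.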
Why is `Group` capitalized? This seems unusual for a function name. Same for variables below.
I think this was changed by the time this comment landed. Let me know if the new names look good to you. Happy to iterate still :)
Would it be possible for this function to call `lowering_function`, or for both of these to use a common helper function? It would be nice to reduce duplication.
This function name should also mirror `lowering_function` so that it's obvious that this is a specialized version of that. Right now the two seem like completely independent functions based on the names.
I didn't manage to come up with a way of fusing these that really reduced the lines of code. I found that the addition of the extra index argument and what's needed to form it makes it hard to excise much common code in a way that seems helpful. Open to suggestions here but leaving as-is for now.