Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
ENH: Add numba engine for rolling apply #30151
ENH: Add numba engine for rolling apply #30151
Changes from 52 commits
3b9bff8
9a302bf
0e9a600
36a77ed
dbb2a9b
f0e9a4d
1250aee
4e7fd1a
cb976cf
45420bb
17851cf
20767ca
9619f8d
66fa69c
b8908ea
135f2ad
34a5687
6da8199
123f77e
54e74d1
04d3530
4bbf587
f849bc7
0c30e48
c4c952e
8645976
987c916
b775684
2e04e60
9b20ff5
0c14033
c7106dc
1640085
2846faf
5a645c0
6bac000
6f1c73f
a890337
0a9071c
9d8d40b
84c3491
a429206
5826ad9
cf7571b
4bc9787
18eed60
f715b55
6a765bf
af3fe50
eb7b5e1
f7dfcf4
a42a960
d019830
29d145f
248149c
a3da51e
File filter
Filter by extension
Conversations
Jump to
There are no files selected for viewing
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"the data set is larger" here is pretty vague. is the perf gain a function of the array size or more about the user-defined function?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Somewhat both, but more obvious with the data size. I can make this more specific.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
add an X for this
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not a blocker here since this is large enough, but would be nice to annotate this in a follow up
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you add a doc-string that says what this function does (the parameters are already documented elsewhere, maybe just mention that)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@stuartarchibald sorry for the ping, but I see that
generated_jit
has been deprecated in numba 0.57. IIRC you helped me add this a while back and am lost on how to write this in terms ofoverload
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@mroeschke no problem, I can try and help with this. I think it needs to look a bit like this (for reference, this is untested, I am just guessing from the context! Also, the
pandas
variant is obviously wrapped to close over some configuration which I've omitted, so consider this as the function body ofmake_rolling_apply
. I've left comments inline to try and explain what's going on):@overload
is basically saying to Numba "when you see this specific python function (the one in the first argument in the@overload
decorator) use this implementation". The concept about there being a "typing" part that can be used to dispatch different variants based on type is exactly the same as in@generated_jit
. The largest difference is what happens if the JIT compiler is turned off. In the case of@overload
the python function being overloaded will run, i.e., the code just executes as would be expected in the interpreter. Whereas in the case of@generated_jit
, because the pure python implementation and the Numba implementation are the same function, if you turn the JIT compiler off it will just break (the value returned when calling a@generated_jit
function is a function implementing the Numba specialisation). Essentially,@generated_jit
is like doing@overload
but the function being decorated is also the function being overloaded.Hope this helps?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the reply! We had a PR recently that refactored this to use
extending.register_jittable
. Would that be a sufficient alternative? https://github.com/pandas-dev/pandas/pull/53455/filesThere was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No problem! I just took a look at the patch above, I think it'd work but think it might lose some of the dispatch ability offered by
generated_jit
/overload
. As I understand it, the original code would have let a NumPy function or a built-in be passed in as the "user function", whereas I think theregister_jittable
version requires a user defined Python function. It may be that theregister_jittable
version is a sufficient alternative for the need/use cases in practice, in which case, it seems appropriate.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great thanks for the context! Yeah this function should expect a custom UDF so thanks for the confirmation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Glad to get this resolved, thanks for confirming too! It sounds like the replacement above is appropriate. If there are any more issues/queries feel free to open issues on the Numba issue tracker (or ping here!).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@stuartarchibald I'm running into a rolling apply issue with pandas 2.1.1 and numba 0.58 that might be related. Discussion is here:
https://numba.discourse.group/t/pandas-source-of-old-style-error-capturing-warning/2169/8