[FEA] Support UDF Runtime compilation for incoming PTX with non-inlineable callees #8470
Comments
CC @hummingtree
@brandon-b-miller This is expected. The same reason applies to why this workflow does not apply to, say, To make this possible, one would need a more comprehensive PTX-to-CUDA parser than the current one. I looked into this in the past, but a parser this sophisticated would probably need to use an abstract syntax tree, which is more than I could do in the short term.
@hummingtree Thank you for clarifying; that makes sense. It looks like the changes needed to make this work are nontrivial. Still, not being able to support
So if we get two functions from numba, one of which calls the other, we first need to recognize that there are two PTX functions and convert both of them to CUDA functions. Then we need to do one of the two:
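To make the first step concrete (recognizing the separate PTX functions and which one calls which), here is a minimal sketch that splits a PTX string into its function definitions and orders them so every callee precedes its callers, which is the order C++ needs if each converted function must be defined before it is used. It is a regex-and-brace-matching heuristic, not the libcudf parser; `split_functions`, `conversion_order`, `FUNC_HEADER`, `CALL_TARGET`, and the hand-written `TOY_PTX` string are hypothetical, and the toy PTX is simplified rather than real numba output.

```python
import re
from graphlib import TopologicalSorter  # Python 3.9+

# Heuristic patterns (assumptions, not the libcudf parser).
# FUNC_HEADER matches ".func [(<return params>)] <name>(".
FUNC_HEADER = re.compile(r"\.func\b\s*(?:\([^)]*\)\s*)?([\w$%]+)\s*\(")
# CALL_TARGET matches the callee in "call[.uni] [(<ret>),] <name>, ...".
CALL_TARGET = re.compile(r"\bcall(?:\.uni)?\s+(?:\([^)]*\)\s*,\s*)?([\w$%]+)")


def split_functions(ptx: str) -> dict[str, str]:
    """Map each defined .func name to its {...} body; bare declarations are skipped."""
    funcs = {}
    for m in FUNC_HEADER.finditer(ptx):
        after_params = ptx.index(")", m.end()) + 1
        if not ptx[after_params:].lstrip().startswith("{"):
            continue  # declaration ending in ';', no body to convert
        start = ptx.index("{", after_params)
        depth = 0
        # Walk forward until the brace that opened the body is closed.
        for i in range(start, len(ptx)):
            if ptx[i] == "{":
                depth += 1
            elif ptx[i] == "}":
                depth -= 1
                if depth == 0:
                    funcs[m.group(1)] = ptx[start : i + 1]
                    break
    return funcs


def conversion_order(ptx: str) -> list[str]:
    """Return function names with every callee listed before its callers."""
    funcs = split_functions(ptx)
    # graphlib expects {node: predecessors}; a function's callees are its predecessors.
    graph = {
        name: {t for t in CALL_TARGET.findall(body) if t in funcs}
        for name, body in funcs.items()
    }
    return list(TopologicalSorter(graph).static_order())


# Hand-written, simplified PTX (hypothetical, not real numba output).
TOY_PTX = """
.visible .func  (.param .b64 func_retval0) caller(
    .param .b64 caller_param_0
)
{
    call.uni (retval0), callee, (param0);
    ret;
}

.func  (.param .b64 func_retval0) callee(
    .param .b64 callee_param_0
)
{
    ret;
}
"""

print(conversion_order(TOY_PTX))  # ['callee', 'caller']
```

Calls to functions that are only declared in the string (not defined there) are filtered out of the graph, so they do not affect the ordering.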
OK, I will try to put together a POC and report back here.
Replaces the C++ implementation of the masked UDF pipeline with a pure Python version which compiles and launches the entire kernel using numba. This solves a bunch of problems:

- CUDA 11.0 support is now available, since the implementation no longer needs `cuda::std::tuple` to work with NVRTC 11.0.
- Support for special functions which compile to multiple function definitions, such as `pow`, `sin`, and `cos`, is now provided, since all the PTX is compiled and linked inside numba (Fixes #8470).
- Allows us to support this corner case, which would have required a separate C++ kernel in the previous implementation:

  ```python
  def f(x):
      return 42
  ```

- Makes developing and adding features to the implementation much easier.

Authors:
- https://github.com/brandon-b-miller

Approvers:
- Robert Maynard (https://github.com/robertmaynard)
- GALI PREM SAGAR (https://github.com/galipremsagar)
- Graham Markall (https://github.com/gmarkall)
- Ashwin Srinath (https://github.com/shwina)

URL: #9174
Describe the bug
The current UDF compilation pipeline, used under `Series.applymap`, creates a generic PTX function that is inlined into a kernel in libcudf, which is then compiled and launched using jitify. In an intermediate step, the libcudf parser converts the PTX string into a CUDA C++ function. The problem seems to be that this workflow relies on the PTX string containing exactly one function marked as `.func`. However, this is not always the case. Consider the following function:
The PTX string that we get from numba is written in terms of a main function and a second function that the main function calls:
This results in the parser returning a malformed CUDA function, at minimum because it picks up the wrong signature, and probably for other reasons as well. In general, the pipeline does not appear to support cases where the incoming PTX string is written as several separate functions.
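One way to avoid picking up the wrong signature is to count how many `.func` bodies the PTX defines and, when there are several, treat as the entry point the function that no other function in the string calls, rather than whichever header appears first. The sketch below is a rough heuristic under that assumption; `defined_functions`, `entry_function`, `FUNC_HEADER`, `CALL_TARGET`, and `TOY_PTX` are hypothetical names, and the toy PTX deliberately lists the callee first to show where a hypothetical "take the first header" rule goes wrong (real numba output may order the functions differently).

```python
import re

# Heuristic patterns (assumptions, not libcudf code).
FUNC_HEADER = re.compile(r"\.func\b\s*(?:\([^)]*\)\s*)?([\w$%]+)\s*\(")
CALL_TARGET = re.compile(r"\bcall(?:\.uni)?\s+(?:\([^)]*\)\s*,\s*)?([\w$%]+)")


def defined_functions(ptx: str) -> list[str]:
    """Names of .func headers followed by a body ('{'), in order of appearance."""
    names = []
    for m in FUNC_HEADER.finditer(ptx):
        after_params = ptx.index(")", m.end()) + 1
        if ptx[after_params:].lstrip().startswith("{"):
            names.append(m.group(1))
    return names


def entry_function(ptx: str) -> str:
    """Return the one defined function that no other function in `ptx` calls."""
    defined = defined_functions(ptx)
    called = set(CALL_TARGET.findall(ptx))
    roots = [name for name in defined if name not in called]
    if len(roots) != 1:
        raise ValueError(f"expected exactly one entry function, found {roots}")
    return roots[0]


# Hypothetical, simplified PTX with the callee emitted first.
TOY_PTX = """
.func  (.param .b64 func_retval0) callee(
    .param .b64 callee_param_0
)
{
    ret;
}

.visible .func  (.param .b64 func_retval0) caller(
    .param .b64 caller_param_0
)
{
    call.uni (retval0), callee, (param0);
    ret;
}
"""

print(len(defined_functions(TOY_PTX)))       # 2, not the single .func the workflow assumes
print(FUNC_HEADER.search(TOY_PTX).group(1))  # callee: the first header is the wrong signature
print(entry_function(TOY_PTX))               # caller
```

Raising when there is not exactly one root keeps the sketch conservative: recursive or mutually recursive functions, or several unrelated entry points, are reported instead of guessed.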
Steps/Code to reproduce bug
This is a problem because users can't always use `pow` in their UDFs, and possibly also several trigonometric functions and anything else for which numba is likely to generate multiple PTX functions.
Expected behavior
Environment overview (please complete the following information)