[FEA] Expand support for cupy universal functions (ufuncs) #9083
Yes, I think we want to unify our approach to
I say best case because I think |
Following up on this -- we should update the dispatch so that we do not define all the ufuncs ( See also this list of available ufuncs in NumPy: https://numpy.org/doc/stable/reference/ufuncs.html#available-ufuncs See also: #9815 (comment) |
This PR is a first step in addressing #9083. It rewrites Series ufunc dispatch to use a simpler dispatch pattern that removes lookup in the overall cudf namespace, restricting only to Series methods or cupy functions. The changes in this PR also enable proper support for indexing so that ufuncs between two Series objects will work correctly and preserve indexes. Additionally, ufuncs will now support Series that contain nulls, which was not previously the case. Series methods that don't exist in `pandas.Series` that were previously only necessary to support ufunc dispatch have now been removed. Once this PR is merged, I will work on generalizing this approach to also support DataFrame objects, which should enable full support for ufuncs as well as allowing us to deprecate significant chunks of existing code.

For the cases that previously worked, this PR does have some performance implications. Any operator that dispatches to cupy is about 3x faster now (assuming no nulls, since the nullable case was not previously supported), with the exception of logical operators for which we previously defined functions in the Series namespace that do not have pandas analogs. I've made a note in the code that we could reintroduce internal versions of these just for ufunc dispatch if that slowdown becomes a bottleneck for users, but for now I would prefer to avoid any more special cases than we really need.

Authors:
- Vyas Ramasubramani (https://github.com/vyasr)

Approvers:
- Christopher Harris (https://github.com/cwharris)
- Charles Blackmon-Luca (https://github.com/charlesbluca)

URL: #10217
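The dispatch pattern described in #10217 (prefer a method on the Series, otherwise hand off to the array library, and align indexes first) can be illustrated with a minimal sketch. This is not the cudf implementation: `ToySeries` and `_aligned_apply` are hypothetical names, NumPy stands in for cupy so the example runs without a GPU, and NaN stands in for nulls.

```python
import numpy as np


def _aligned_apply(ufunc, *operands):
    """Align ToySeries operands on the union of their labels, then apply ufunc."""
    series = [o for o in operands if isinstance(o, ToySeries)]
    # Union of labels in first-seen order.
    labels = list(dict.fromkeys(label for s in series for label in s.index))
    arrays = []
    for o in operands:
        if isinstance(o, ToySeries):
            lookup = dict(zip(o.index, o.values))
            # Labels missing from this operand become NaN (stand-in for null).
            arrays.append(np.array([lookup.get(label, np.nan) for label in labels]))
        else:
            arrays.append(o)  # scalars and plain arrays pass through unchanged
    return ToySeries(ufunc(*arrays), labels)


class ToySeries:
    """Hypothetical stand-in for a Series; NumPy plays the role of cupy."""

    def __init__(self, values, index):
        self.values = np.asarray(values)
        self.index = list(index)

    def __add__(self, other):
        return _aligned_apply(np.add, self, other)

    def __array_ufunc__(self, ufunc, method, *inputs, **kwargs):
        if method != "__call__" or kwargs:
            return NotImplemented
        # 1) Prefer an operator method defined on the class (e.g. __add__).
        op = getattr(type(self), f"__{ufunc.__name__}__", None)
        if op is not None and len(inputs) == 2 and inputs[0] is self:
            return op(*inputs)
        # 2) Otherwise fall back to the array library's ufunc.
        return _aligned_apply(ufunc, *inputs)


a = ToySeries([1, 2, 3], ["x", "y", "z"])
b = ToySeries([10, 20, 30], ["y", "z", "w"])
total = np.add(a, b)   # routed through ToySeries.__add__
roots = np.sqrt(a)     # no matching method, falls back to the NumPy ufunc
```

Here `total` carries the union of both indexes, with NaN wherever a label is missing from one operand, which mirrors the index-preserving behavior the PR describes.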
This PR addresses the primary issue in #9083, enabling all numpy ufuncs for DataFrame objects. It builds on the work in #10217, generalizing that code path to support multiple columns and moving the method up to `IndexedFrame` to share the logic with `DataFrame`. The custom preprocessing of inputs before handing off to cupy that was implemented in #10217 has been replaced by reusing parts of the existing binop machinery for greater generality, which is especially important for DataFrame binops since they support a wider range of alternative operand types.

The current internal refactor is intentionally minimal to leave the focus on the new ufunc features. I will make a follow-up to clean up the internal functions by adding a proper set of hooks into the binop and ufunc implementations so that we can share these implementations with Index types as well, at which point we will be able to remove the extraneous APIs discussed in #9083 (comment).

Authors:
- Vyas Ramasubramani (https://github.com/vyasr)

Approvers:
- Ashwin Srinath (https://github.com/shwina)

URL: #10287
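As a rough illustration of generalizing the ufunc path to multiple columns, the sketch below applies a ufunc column by column. `ToyFrame` is a hypothetical class, NumPy stands in for cupy, and the sketch assumes every frame operand has the same column names. It does not reproduce the binop machinery that #10287 actually reuses.

```python
import numpy as np


class ToyFrame:
    """Hypothetical column-oriented frame; NumPy stands in for cupy."""

    def __init__(self, columns):
        self.columns = {name: np.asarray(col) for name, col in columns.items()}

    def __array_ufunc__(self, ufunc, method, *inputs, **kwargs):
        # Only plain single-output calls are handled in this sketch.
        if method != "__call__" or kwargs or ufunc.nout != 1:
            return NotImplemented
        result = {}
        for name in self.columns:
            # Take the matching column from frame operands; scalars and
            # plain arrays are passed to the ufunc unchanged.
            args = [x.columns[name] if isinstance(x, ToyFrame) else x for x in inputs]
            result[name] = ufunc(*args)
        return ToyFrame(result)


frame = ToyFrame({"a": [1, 2], "b": [3, 4]})
scaled = np.multiply(frame, 10)   # frame with a scalar operand
doubled = np.add(frame, frame)    # frame with a frame operand
```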
This PR builds on #10217 and #10287 to bring full ufunc support for Index types, expanding well beyond the small set previously supported in the `cudf.core.ops` namespace. By using most of the machinery introduced for IndexedFrame in the prior two PRs we avoid duplicating much logic, so that all ufunc dispatches flow through a relatively standard path of known methods prior to a common cupy dispatch. With this change we are also able to deprecate the various ufunc operations defined in cudf/core/ops.py that exist only for this purpose, as well as a number of Frame methods that are not defined for the corresponding pandas types. Users of those APIs are encouraged to call the corresponding numpy/cupy ufuncs instead to leverage the new dispatch.

This PR also fixes a bug where index binary operations that output booleans would previously return instances of GenericIndex, whereas those pandas operations would return numpy arrays. cudf now returns cupy arrays in those cases.

Resolves #9083. Contributes to #9038.

Authors:
- Vyas Ramasubramani (https://github.com/vyasr)

Approvers:
- GALI PREM SAGAR (https://github.com/galipremsagar)

URL: #10346
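The return-type rule described in #10346 (boolean-valued index operations yield a plain array rather than an Index) might look roughly like the sketch below. `ToyIndex` is a hypothetical class and NumPy stands in for cupy; the real cudf code path goes through the shared IndexedFrame machinery instead.

```python
import numpy as np


class ToyIndex:
    """Hypothetical stand-in for an Index; NumPy plays the role of cupy."""

    def __init__(self, values):
        self.values = np.asarray(values)

    def __array_ufunc__(self, ufunc, method, *inputs, **kwargs):
        # Only plain single-output calls are handled in this sketch.
        if method != "__call__" or kwargs or ufunc.nout != 1:
            return NotImplemented
        raw = [x.values if isinstance(x, ToyIndex) else x for x in inputs]
        out = ufunc(*raw)
        # Boolean results come back as a bare array, matching the pandas
        # behavior the PR describes; everything else is re-wrapped.
        if out.dtype == np.bool_:
            return out
        return ToyIndex(out)


idx = ToyIndex([1, 2, 3])
mask = np.greater(idx, 2)    # plain boolean array
shifted = np.add(idx, 1)     # still a ToyIndex
```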
Using numpy/cupy universal functions (ufuncs) directly on dataframes is a common usage pattern that we partially support today. #256 requested support for a comprehensive set of ufuncs. Many of these ufuncs are now supported on Series through a codepath that hits:
cudf/python/cudf/cudf/utils/utils.py, lines 331 to 334 in 417b34d
For DataFrames, we dispatch only a small subset of functions enumerated in https://github.com/rapidsai/cudf/blob/be25a30ca20f3135f341c51b36cb075b376d5def/python/cudf/cudf/core/ops.py
It would be nice to expand our support for ufuncs on DataFrames (and Series as appropriate). As @vyasr noted offline, we may want to explore our approach to this dispatch as well.
When we don't support a ufunc, we get the following with direct usage of the cupy ufunc:
If we rely on the array function protocol to dispatch first from numpy to cupy, we get the following: