-
Notifications
You must be signed in to change notification settings - Fork 5
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Remove usage of the NumPy C API #41
Comments
This is correct, NumPy uses too much of the CPython C API to make supporting the limited API feasible any time soon. |
Thanks for confirming @rgommers! RAPIDS would definitely benefit from producing abi3 wheels, and since we don't really deal with host data buffers there's not really much reason for us to need the numpy C API, so I think it's worth a small investment from us to remove numpy as a build dep. |
While it is correct that NumPy will keep using the full API, of course. That isn't a limitation on downstream. The NumPy headers use the limited API for only a few things and Matti added a test for that. Specifically:
Yes, limited API supported isn't tested or used really, but problems shouldn't be confined to macros, so since including the headers works (and is tested) I wouldn't expect further problems. EDIT: I forgot to add that the necessary disabling of the above two bullets was missing from the headers before NumPy 2.0 meaning that you must compile with NumPy 2. But this shouldn't be a limitation in practice. That doesn't mean I don't agree with the sentiment: I think a lot or even all of NumPy C-API use is probably simply unnecessary and fewer dependencies are great. |
Thanks for clarifying that. Yeah, we only need to care about the numpy headers on our end. I'll update this thread based on how much effort I see removing usage of the NumPy C API taking us. |
Here's the ucxx removal. Not sure I got everything right, but I don't think we need a whole lot more than what's in there. |
#225) Currently ucxx makes use of numpy's C API exactly once, in order to expose host buffers to Python. It does so by leveraging the fact that numpy arrays can be constructed directly from a pointer using the C API. However, we can produce essentially an equivalent result by using numpy's Python object on a representation of the same pointer in terms of a custom object that implements the buffer protocol. This adds about 20 lines of Cython code in exchange for dropping numpy as a build dependency entirely, which is a nice maintenance benefit (see also rapidsai/build-planning#41). Authors: - Vyas Ramasubramani (https://github.com/vyasr) Approvers: - Sebastian Berg (https://github.com/seberg) - Peter Andreas Entschev (https://github.com/pentschev) URL: #225
Given the UCXX piece has been addressed and the remaining work is in cuDF ( rapidsai/cudf#15193 ), should we go ahead and close this out? |
Sure |
There was some NumPy C API usage in cuCIM via pybind11 This has now been removed with PR: rapidsai/cucim#751 |
With rapidsai/cudf#16640 the cudf transitive usage via pyarrow has also been removed. |
RAPIDS currently makes use of the NumPy C API in a handful of places, generally in Cython code. The NumPy C API is generally quite good and has remained stable, making it easy to work with. However, it does introduce additional build and packaging complexity that would be nice to avoid. With minimal changes to RAPIDS code, we should be able to remove numpy as a build dependency entirely, which may simplify our builds and also saves us from needing to rebuild packages at all when numpy 2.0 is released. If we were getting a lot of value out of the C API the calculus might be different, but in practice our usage of it is very minimal and can generally be avoided. I propose that we expend a little bit of development effort to stop relying on the NumPy C API altogether. This will help us on two fronts: 1) we'll more easily be able to support multiple major versions of NumPy (see #38) since we only have to worry about Python compatibility, not C compatibility; and more importantly 2) we won't have to worry about NumPy C APIs when considering if we can use the Python limited API to produce a single package across Python versions (will open a separate issue for that next). The latter is the more important piece here, since as of this writing the numpy C API is not compatible with the Python limited API based on the author's current experimentation.
The changes required basically boil down to two things:
The text was updated successfully, but these errors were encountered: