Build packages compatible with multiple NumPy versions #32
Conversation
NumPy maintains backwards compatibility, so extensions built with an older version of NumPy will work with newer versions. To take advantage of this and build a single pandas package that works with multiple versions of NumPy, the oldest version of NumPy provided for a particular Python version (1.9 for Python 3.5, 1.11 for Python 3.6), or in the case of Python 2.7 the minimum version required by pandas (1.7), is pinned as a build requirement. The run requirement is then specified as any NumPy version greater than or equal to this pinned version.
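For illustration, a minimal sketch of the relevant meta.yaml sections (not the exact pandas recipe; "numpy x.x" is the conda-build convention that resolves to the NumPy version selected at build time, e.g. via CONDA_NPY):

requirements:
  build:
    - python
    - numpy x.x         # pinned at build time to the oldest supported NumPy
  run:
    - python
    - numpy >=1.7       # [py27]
    - numpy >=1.9       # [py35]
    - numpy >=1.11      # [py36]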
Hi! This is the friendly automated conda-forge-linting service. I just wanted to let you know that I linted all conda-recipes in your PR (…)
Stack Overflow has a nice writeup on NumPy's backwards compatibility. This same build-with-an-older-NumPy trick is being used to build wheels for Python projects that make use of the NumPy C API.
That is interesting. Should we adopt this policy everywhere?
If it works, yes! More testing and a chat with some of the NumPy team at SciPy would probably help clarify a few things about the process.
I should point out that, besides cutting down on the number of packages that need to be built, this technique also means that the release of a new NumPy version does not require new builds of the various downstream packages, which has been a pain point in the past.
I guess that, with this policy, it would be nice if we could run the whole test suite to help catch any accidental breakage.
I'll be at SciPy next week. Would you recommend waiting until then to merge this? Another question that isn't relevant yet: if pandas decides to require a newer NumPy, would we need to update the build matrix?
Nope. Only the recipe. That is the beauty of this approach.
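As a hypothetical sketch: if pandas later required, say, NumPy 1.9 as its minimum on Python 2.7, only the recipe's pins would change, roughly:

run:
  - numpy >=1.9       # [py27]  hypothetical: raised from >=1.7
  - numpy >=1.9       # [py35]
  - numpy >=1.11      # [py36]

The build-time pin for py27 would move up to 1.9 along with it; no new build-matrix entries would be needed.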
I tested this numpy technique with […]. With these tests, and having talked with a number of folks at SciPy about the approach, I feel confident that this approach is safe for future packaging.
- numpy x.x
- numpy >=1.7   # [py27]
- numpy >=1.9   # [py35]
- numpy >=1.11  # [py36]
Why have different NumPy versions for different Python versions? What would be wrong with requiring numpy >=1.7 for all Python versions?
Probably because numpy 1.7 does not support py35 or py36.
Sure, though admittedly there won't be a numpy 1.7 build for python 3.5 or 3.6 then.
But if there ever were a numpy 1.7 for python 3.5 or 3.6, a package built using this recipe would NOT work with it, because numpy does not guarantee forward compatibility.
These lines properly describe the numpy requirement for the various Python versions. I don't think the absence of certain package versions from the repository is something we want to depend upon.
I get the impression from the release notes that not much has changed in the C API since NumPy 1.7, or at least that what has changed would not present an issue when building. There may, however, be issues at runtime for exceptionally old code.
One thing that does concern me a bit with this strategy: if we have BLAS-dependent packages that use NumPy/SciPy as an interface to BLAS, reaching back to such old versions will give us packages that are not compatible with how BLAS is currently used in conda-forge, and thus most likely the wrong BLAS. Given this, would it make more sense to restrict everything to, say, NumPy 1.10 or 1.11, when NumPy was added to conda-forge with BLAS support?
+1 to restrict everything to 1.10 or 1.11. We've dropped numpy <= 1.10 in the old approach anyway.
Building against a "newish" numpy does not create a package which works with older numpy versions. It is necessary to specify a run requirement that numpy be greater than or equal to the build version. For example, installing the pandas from defaults built against numpy 1.12 and then force installing numpy 1.9 results in the following import-time error:
$ python -c "import pandas"
Traceback (most recent call last):
File "<string>", line 1, in <module>
File "/Users/jhelmus/anaconda/envs/test/lib/python2.7/site-packages/pandas/__init__.py", line 26, in <module>
from pandas._libs import (hashtable as _hashtable,
File "/Users/jhelmus/anaconda/envs/test/lib/python2.7/site-packages/pandas/_libs/__init__.py", line 3, in <module>
from .tslib import iNaT, NaT, Timestamp, Timedelta, OutOfBoundsDatetime
File "pandas/_libs/src/numpy.pxd", line 157, in init pandas._libs.tslib (pandas/_libs/tslib.c:117317)
ValueError: numpy.dtype has the wrong size, try recompiling. Expected 88, got 96
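(For reference, a hypothetical reproduction of the above; the environment name and exact versions are illustrative, and --force bypasses the solver, so the run requirement cannot prevent the downgrade.)
$ conda create -n test python=2.7 pandas numpy=1.12
$ conda install -n test --force numpy=1.9
$ python -c "import pandas"    # fails with the ValueError shown above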
Although conda-forge does not provide numpy versions less than 1.10, they are available in defaults and, I believe, should be supported. I am willing to submit PRs to get conda-forge versions of these older numpy releases if that is desired.
As for the BLAS issue, any package which links directly to BLAS/LAPACK should be pinning its BLAS implementation and version. I suspect there are packages in the channel which do not do this correctly; these should be updated and the un-pinned packages removed. This BLAS pinning limits the supported versions of numpy available to the package via the solver. I do not think this limitation should be introduced by overloading the numpy package to include a BLAS version.
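For context, a hedged sketch of the feature-based BLAS pinning conda-forge recipes used at the time (exact package names and versions are illustrative):

requirements:
  build:
    - blas 1.1 openblas      # pin the BLAS metapackage variant
    - openblas 0.2.19*       # and the implementation version
  run:
    - blas 1.1 openblas
    - openblas 0.2.19*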
> Building against a "newish" numpy does not create a package which works with older numpy versions. It is necessary to specify a run requirement that numpy be greater than or equal to the build version.
I understand. However, this is not what I was proposing either.
I am proposing we pick 1.10 or 1.11, build with that version, and make it the lower bound across all Pythons. That will be the easiest for core, for reviewers, and for maintainers, and the most consistent, for a variety of reasons already outlined. I don't really see any point in going to more work than that, as I would expect nearly all conda-forge users are on NumPy 1.10 or greater at this point.
> Although conda-forge does not provide numpy versions less than 1.10, they are available in defaults and, I believe, should be supported. I am willing to submit PRs to get conda-forge versions of these older numpy releases if that is desired.
That said, if you are willing to do the work to make this happen, no real opposition from me. It just seems like the average maintainer already has enough to keep in their heads (e.g. building on Windows using python as a proxy for VS, pinnings, etc.); it would be nice not to give them a new NumPy strategy that is more work than what they already do. Though again, this is a concern IMHO, not a blocker.
> This BLAS pinning limits the supported versions of numpy available to the package via the solver. I do not think this limitation should be introduced by overloading the numpy package to include a BLAS version.
I think the point is missed here. If someone pins BLAS as we tell them to and pins NumPy as we tell them to, and these conflict, the package won't build. I'm almost certain we will hear grumbles about this. Hence, not having old enough NumPys built with the current BLAS scheme (and pinning) does present a blocker for the strategy outlined in this PR. How the knobs of NumPy version and BLAS version(s) are tuned will make this easier or harder to maintain in the long run. Adding Python in with these knobs complicates things. So I would urge that we all reach some agreement on where everyone feels comfortable tuning such knobs and how variable they can be.
At this point, we probably should roll this over into a webpage issue so that it doesn't get buried, as it will be some sort of policy decision.
If these recipes are to be re-used by others, it would be nice to have the oldest possible version.
Build using an older version of NumPy to create packages which are compatible with multiple NumPy versions.