Support distro specific wheels on Linux, *BSD, etc #69

Open
ncoghlan opened this issue Aug 6, 2015 · 20 comments

@ncoghlan
Member

ncoghlan commented Aug 6, 2015

The compatibility tags defined in PEP 425 really only work for Windows and Mac OS X, where the python.org binaries and the versions of the OS they support provide a clear guide as to the versions extension modules should also support.

This approach doesn't work for all the other *nix releases, where python.org only publishes a source tarball, rather than prebuilt binaries.

The currently generated wheel filenames and the "find which wheel to download" algorithm don't work for these cases, which is why PyPI wheel uploads are currently limited to Windows and Mac OS X, as well as the cross-distro manylinux ABI tags.

(I didn't see an existing overarching issue for this problem, so I filed a new one. There are links to some relevant distutils-sig threads which should be added...)

@ncoghlan ncoghlan changed the title Current wheel names aren't adequate for Linux, *BSD, etc Support distro specific wheels on Linux, *BSD, etc Aug 4, 2018
@ncoghlan
Member Author

ncoghlan commented Aug 4, 2018

(Note: retitled the issue, as I can never find this one when I want to reference it, and my usual search terms are "Linux distro specific wheels").

The primary driver of previous work in this area has been @natefoo, who was seeking to do something more robust for https://galaxyproject.org than relying on the fact that a Linux wheel was obtained from https://wheels.galaxyproject.org to indicate the target platform.

The result of that work can be seen at https://wheels.galaxyproject.org/simple/psycopg2/ where in addition to generic linux_<arch> tags and manylinux1_<arch> tags, you can also see wheels using the tag format linux_<arch>_<distro>_<distro_version>.

Unfortunately, as of 2018, the status quo is that the standard installation tools won't pick up these distro-specific wheels automatically - folks wanting to support the default versions of pip (et al) are still having to use "Which index server is it coming from?" to indicate that their "linux" wheels are actually targeting a specific subset of distros (e.g. supported versions of Raspbian for https://piwheels.org/).

The most relevant distutils-sig thread is still Nate's "Working toward Linux wheel support" thread from 2015: https://mail.python.org/mm3/archives/list/[email protected]/thread/KCLRIN4PTUGZLLL7GOUM23S46ZZ2D4FU/

Two key links from that thread are:

  • Nate's work on correctly inferring the running distribution from distro provided metadata: https://gist.github.com/natefoo/814c5bf936922dad97ff
  • My proposal to use a system provided binary compatible configuration file to address the derived ABI compatible distro (e.g. Fedora -> RHEL -> CentOS, Debian -> Ubuntu) problem in a maintainable way that still allows end-users to override the settings on a per-virtualenv basis

Since 2015, we've adopted TOML as the preferred format for configuration files that we expect humans to be reading and editing, so the modern version of my proposal would look like:


Binary compatible config file locations

  • system wide defaults: /etc/python/binary-compatibility.toml
  • per-virtualenv override: binary-compatibility.toml file in the base of the venv
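A minimal sketch of the lookup order these two locations suggest. It assumes the per-virtualenv file, when present, shadows the system-wide one; the proposal does not say whether the two files would be merged instead. The helper names `candidate_config_paths` and `first_existing` are hypothetical.

```python
import sys
from pathlib import Path

# System-wide location named in the proposal.
SYSTEM_CONFIG = Path("/etc/python/binary-compatibility.toml")
CONFIG_NAME = "binary-compatibility.toml"


def candidate_config_paths(venv_base=None):
    """Return config paths in priority order: venv override, then system default.

    Hypothetical helper: assumes the venv file takes precedence.
    """
    base = Path(venv_base) if venv_base is not None else Path(sys.prefix)
    return [base / CONFIG_NAME, SYSTEM_CONFIG]


def first_existing(paths):
    """Return the first path that exists as a file, or None if none do."""
    for path in paths:
        if path.is_file():
            return path
    return None
```

Calling `first_existing(candidate_config_paths())` inside an active virtualenv would then prefer that environment's own file.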

File format

The format would consist of a [default] section, with optional per-cpu-arch overrides. Possible values within each section would be:

  • build-suffix: an extra value to append to the compatibility tags of any built wheel file
  • compatible-suffixes: additional tag suffixes to consider acceptable when downloading and installing wheel files. When installing, tags earlier in the sequence are preferred to those later in the sequence

(Possible future fields might include build-arch to start working towards per-venv cross-compilation support, and platform-override to better support manylinux wheel build processes)

For example, a comprehensive compatibility file on a recent CentOS 7 box might look like:

[default]
build-suffix="rhel_7_4"
compatible-suffixes=["rhel_7_3", "centos_7_3", "rhel_7_2", "centos_7_2", "rhel_7_1", "centos_7_1503", "rhel_7_0", "centos_7_1406"]

[aarch64]
compatible-suffixes=["rhsa_7_3", "centos_7_3", "rhsa_7_2", "centos_7_2", "rhsa_7_1", "centos_7_1503", "rhsa_7_0", "centos_7_1406"]

This is a deliberately complex example to show that just these two fields should suffice to cover even the more esoteric situations that can come up with Linux distros:

  • RHEL only gained native aarch64 support in RHEL 7.4. Before that, it was called Red Hat Server for ARM (in order to explicitly exclude it from RHEL's normal support lifecycle)
  • for the CentOS 7 series, Red Hat initially experimented with only referring to the CentOS releases by their month of release, not the RHEL point release that they were derived from. It looks like they've now gone back to using point release numbers that match the RHEL ones, but it's still a concrete example where it's useful to be able to cope with numbering scheme changes on the publishing side when declaring which input files you want to accept.

Note: Nate's original thread also contains a digression on the specification management process, which eventually became part of PEP 566 where the core metadata is concerned (https://www.python.org/dev/peps/pep-0566/#summary-of-differences-from-pep-345 ). For this change, the update would be to PEP 425, and would involve making https://packaging.python.org/specifications/platform-compatibility-tags/ the reference link for all compatibility tag processing guidelines.

(cc @pfmoore)

@njsmith
Member

njsmith commented Aug 4, 2018

> build-suffix: an extra value to append to the compatibility tags of any built wheel file

I wonder whether it would be better to attach this tag at bdist_wheel time, or to make it what auditwheel appends to wheels that it detects don't meet the requirements for a manylinux wheel. The advantage of the latter is that auditwheel is set up to handle detecting library dependencies and performing library vendoring, which is otherwise going to be a very very easy way to end up with wheels on pypi that don't actually work without a bunch of futzing around.

It might also be useful for the config file to record a list of libraries that can be assumed to be on any system with this tag, and that therefore don't need to be vendored.

@ncoghlan
Member Author

ncoghlan commented Aug 4, 2018

default-build-suffix might be a better name for that field. The idea is that we should be aiming to get to a state where if you don't specify otherwise, you get a distro-specific wheel by default.

We won't want to use this file for stuff that may vary as the system state gets mutated, as that's a recipe for file corruption as other system packages get installed and uninstalled.

However, it could be used as a config file for a PEP 517 style "platform installer" interface, whereby a Python level installer and a system installer could collaborate to use system level packages where possible, and Python level packages otherwise.

That would push even more strongly in the direction of a more general purpose pyplatform.toml naming scheme, as I mentioned in pypa/pip#5605

@njsmith
Member

njsmith commented Aug 4, 2018

> we should be aiming to get to a state where if you don't specify otherwise, you get a distro-specific wheel by default.

This is what I am questioning. There are advantages to pushing people to go through auditwheel even for distro-specific builds.

@ncoghlan
Member Author

ncoghlan commented Aug 4, 2018

We're not going to require people to use auditwheel just to create a local wheel from an sdist with pip wheel or pip install.

However, we can likely make it so that such wheels get flagged as distro-specific by default (even the ones implicitly generated by pip for storage in the local wheel cache), and then require folks to run them through auditwheel to get them to instead be marked as a distro-independent manylinux wheel, or even as a platform independent wheel that doesn't have any binary dependencies at all.

@pfmoore
Member

pfmoore commented Aug 4, 2018

Can I take a step back here, as I'm very unfamiliar with the issues around wheels on Linux?

PEP 425 says that the platform tag is "simply distutils.util.get_platform()". So what we're discussing here is a change to PEP 425. There's already been such a change in the form of manylinux1 (PEP 513) but it doesn't seem to have been reflected in PEP 425 - and actually, on a quick skim, I didn't see a clear explanation in PEP 513 of what a valid manylinux tag is (there's a discussion of "good", "okay" and "bad" tagging in the UCS-2 vs UCS-4 section, of all places, but nowhere is the set of allowable tags ever really spelled out). That's not a criticism of PEP 513 per se, as PEP 425 was pretty vague in the first instance, but I think we need to look at being clearer going forward.

Nick's discussion of a "build suffix" is confusing to me, because I don't see where it fits into PEP 425 - what I think he's saying is that the platform tag should be made up of distutils.util.get_platform() combined with an optional build suffix (separated by an underscore). But (a) that's just modifying the definition of a "platform tag", and (b) it leaves an ambiguity in that tools can't distinguish between the (original) platform tag and the build suffix, as both can contain the underscore separator. So in practical terms, tools can't usably treat the whole thing as anything other than a platform tag that doesn't correspond to PEP 425.
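The ambiguity in point (b) can be made concrete with a small sketch. The function names are hypothetical; the only assumption is the one stated above, that the suffix is joined to the platform tag with an underscore.

```python
def append_build_suffix(platform_tag, suffix):
    """Join a platform tag and a build suffix with an underscore."""
    return f"{platform_tag}_{suffix}"


def possible_splits(tag):
    """List every (platform, suffix) pair that could have produced `tag`."""
    parts = tag.split("_")
    return [
        ("_".join(parts[:i]), "_".join(parts[i:]))
        for i in range(1, len(parts))
    ]


tag = append_build_suffix("linux_x86_64", "rhel_7_4")
splits = possible_splits(tag)  # several candidates, with nothing to pick between them
```

Given only the combined tag, a tool cannot tell whether the platform was `linux_x86_64` with suffix `rhel_7_4`, or `linux_x86` with suffix `64_rhel_7_4`, without extra knowledge about which platform strings are valid.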

So, can I suggest that we need an update to PEP 425 that does the following:

  1. Clearly states the valid values for the platform tag. I'm assuming that it will be something along the lines of "must be a valid return value from distutils.util.get_platform() (or the same but with "linux" replaced by "manylinux1"), optionally followed by an underscore and an arbitrary "build suffix" string".
  2. Clarifies how build tools should choose what platform tag to use, now that we've diverged from the simple "use distutils.util.get_platform()". I'm treating auditwheel as a build tool here, as it modifies the tags (as I understand it). That may involve defining a common configuration file from which build suffixes can be taken.
  3. Clarifies how install tools should choose which wheels to install - the algorithm is defined as "generate a prioritised list of acceptable tag combinations, and pick the highest priority one that exists", but we need a clearer explanation of how the tool should generate that list in the first place (the lack of such a spec has been the cause of a couple of "why won't pip install my wheel" issues in the past, I believe).
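To make point 3 concrete, here is a sketch of one way the "prioritised list" could be generated and used once build suffixes exist. It is not pip's algorithm: it looks only at the platform tag (real installers also rank Python and ABI tags, and handle compressed tag sets), and the ordering (own build suffix, then compatible suffixes in order, then the bare tag) is an assumption. All function names are hypothetical.

```python
def candidate_platform_tags(base, build_suffix, compatible_suffixes):
    """Return acceptable platform tags, most preferred first.

    Assumed order: own build suffix, compatible suffixes in the order
    given, then the bare platform tag as a last resort.
    """
    suffixes = [build_suffix] if build_suffix else []
    suffixes.extend(compatible_suffixes)
    return [f"{base}_{s}" for s in suffixes] + [base]


def wheel_platform_tag(filename):
    """Extract the platform tag: the last dash-separated field before '.whl'."""
    stem = filename[: -len(".whl")]
    return stem.rsplit("-", 1)[1]


def pick_wheel(filenames, tags):
    """Pick the file whose platform tag ranks highest, or None if none match."""
    rank = {tag: i for i, tag in enumerate(tags)}
    usable = [f for f in filenames if wheel_platform_tag(f) in rank]
    return min(usable, key=lambda f: rank[wheel_platform_tag(f)], default=None)


tags = candidate_platform_tags("linux_x86_64", "rhel_7_4", ["rhel_7_3", "centos_7_3"])
available = [
    "psycopg2-2.7.5-cp36-cp36m-linux_x86_64.whl",
    "psycopg2-2.7.5-cp36-cp36m-linux_x86_64_centos_7_3.whl",
    "psycopg2-2.7.5-cp36-cp36m-win_amd64.whl",
]
best = pick_wheel(available, tags)  # the centos_7_3 wheel outranks the generic one
```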

I'd also explicitly call out that Windows and MacOS environments are not allowed to use the build suffix mechanism, at least initially, because that's an extra level of complexity we don't want to get into right now (I could see mingw or msys2 builds of Python trying to use a build suffix to distinguish themselves from "standard" Windows Python, and establishing precedents that we're not ready to consider yet).

Apologies if I'm completely missing a load of important complexities here, but I really do think that we need to be aiming for PEP 425 (or a consolidated successor) to be clear enough that someone could, from scratch, write either a build tool or an installer, just referring to the PEP, and know that they are handling wheels as we intended.

@ncoghlan
Member Author

ncoghlan commented Aug 4, 2018

Yep, the expected outcome of doing this would be an update to PEP 425 that provided a full tag specification at https://packaging.python.org/specifications/platform-compatibility-tags/ (similar to the core metadata spec, we've already partially migrated there, since that's the page that points out the manylinux PEPs effectively amend PEP 425 to include new platform tags and define what those tags mean)

I believe you've also correctly understood the gist of the problem that needs to be addressed: distutils.util.get_platform() is too generic to make a good platform tag outside the more strictly controlled worlds of Windows and Mac OS X, so it either needs something appended to it to get a platform tag that actually provides meaningful assurances about ABI compatibility, or else the default tag needs to be replaced by a different platform tag entirely (as the manylinux tags do).
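For reference, the default platform tag can be sketched as below. PEP 425 names `distutils.util.get_platform()`; this sketch uses `sysconfig.get_platform()` as a stand-in because distutils was removed from the standard library in Python 3.12, and treating the two as equivalent is an assumption. `pep425_platform_tag` is a hypothetical helper name.

```python
import re
import sysconfig


def pep425_platform_tag():
    """Return get_platform() with hyphens and periods replaced by underscores."""
    return re.sub(r"[-.]", "_", sysconfig.get_platform())


tag = pep425_platform_tag()
```

On a typical x86-64 Linux build the result is just `linux_x86_64`, which carries no information about the distro or its library versions.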

Windows and Mac OS X can mostly get by with the PEP 425 platform tag definition, since they're primarily a single linear stream of releases, and the corresponding CPython binary installer sets the minimum supported operating system ABI version. However, even there support for platform suffixes would have potential use cases, since it would allow folks to target newer CPU features (like the AVX instruction sets for fast vectorised operations), while keeping those wheel files from being considered as installation candidates on systems that aren't explicitly flagged as being compatible.

The critical essence of my draft design sketch above is that Python level installers wouldn't need to know any of those low level technical details about why someone might want to define a custom platform tag. All they'd need to know is:

  1. Where to look for pyplatform.toml files
  2. Which attribute to load from those files in order to find extra platform suffixes to consider on installation
  3. Which attribute to load from those files in order to determine the default platform suffix to use when building a wheel file locally

The initial use case driving the concrete design would be for Linux distro specific wheels, but ideally we'd get an outcome that allowed folks to use reasonably arbitrary compatibility markers for private build pipelines (e.g. define a custom platform tag that lets folks append arbitrary suffixes).

@pfmoore
Member

pfmoore commented Aug 4, 2018

OK, cool - thanks @ncoghlan. As I apparently have not misunderstood the situation too badly, I'll leave it to the experts to debate the details. Ping me if you need anything from me.

@njsmith
Member

njsmith commented Aug 5, 2018

> We're not going to require people to use auditwheel just to create a local wheel from an sdist with pip wheel or pip install.

Of course not. But we might want to require people to use auditwheel to create a wheel with a distro-specific tag that PyPI will accept for upload.

Another possibility to consider is whether we want to move towards bdist_wheel incorporating auditwheel-like functionality.

> critical essence of my draft design sketch above is that Python level installers wouldn't need to know any of those low level technical details about why someone might want to define a custom platform tag

I'm getting worried about scope creep here... is the goal to expand the ABI tag to include options for "long tail" environments that are currently underserved (Alpine, conda, *BSD), or is it to provide a domain-specific language for inventing new kinds of ABI compatibility checking, in the form of a config file?

If we want to support wheels tagged by AVX vs non-AVX, IMO we should handle that as a feature in its own right (and e.g. you're going to want pip to autodetect CPU capabilities).

@ncoghlan
Member Author

ncoghlan commented Aug 5, 2018

I don't want to allow custom tags or platform suffixes on PyPI, ever.

Instead, I want to enable folks running purpose-specific index servers like piwheels.org or wheels.galaxyproject.org, as well as orgs running private index servers, to set up their build & deployment pipelines such that everything "just works" for their environments, but if one of their pre-built wheel files escapes into the wild, other platforms will ignore it as being inapplicable to them.

Any cleverness related to platform feature detection would then live in the platform level configuration and installation tools that generate pyplatform.toml, not in the Python installation tools that read it.

So I don't see it as scope creep, I see it as deliberate scope limiting: Python tools read pyplatform.toml and act accordingly. How platform providers figure out what their pyplatform.toml should say isn't the installer's problem, just as it isn't the installer's problem to figure out how a PEP 517 backend should be implemented.

@dstufft
Member

dstufft commented Aug 5, 2018

> I don't want to allow custom tags or platform suffixes on PyPI, ever.

Custom tags 100% should not be on PyPI. Platform tags seem like they'd be a good thing to add though? Like I don't see anything wrong with being able to add an Ubuntu 16.04 wheel to PyPI.

@ncoghlan
Member Author

ncoghlan commented Aug 5, 2018

Aye, if it's a platform where PyPA explicitly defines the form and content of permitted suffixes, then I agree it would be reasonable to allow it on PyPI. We have a potentially plausible way forward for that on Linux thanks to /etc/os-release and @natefoo's research in https://gist.github.com/natefoo/814c5bf936922dad97ff, and that may inspire folks to propose meaningful suffix schemes for other platforms.
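A minimal sketch of deriving a suffix from /etc/os-release, for illustration only. The sample content is abridged and hand-written, the `<ID>_<VERSION_ID>` scheme is just one possible choice rather than an agreed format, and the parser ignores the shell-style escaping rules that the os-release format allows. Both function names are hypothetical.

```python
import re

# Abridged, hand-written sample in /etc/os-release format.
SAMPLE_OS_RELEASE = """\
NAME="Ubuntu"
ID=ubuntu
ID_LIKE=debian
VERSION_ID="16.04"
"""


def parse_os_release(text):
    """Parse KEY=value lines, stripping surrounding quotes (escapes not handled)."""
    fields = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        fields[key] = value.strip().strip("\"'")
    return fields


def distro_suffix(fields):
    """Build '<ID>_<VERSION_ID>' with '.' and '-' mapped to '_' (assumed scheme)."""
    distro = fields["ID"]
    version = fields.get("VERSION_ID", "")
    raw = f"{distro}_{version}" if version else distro
    return re.sub(r"[-.]", "_", raw)


fields = parse_os_release(SAMPLE_OS_RELEASE)
suffix = distro_suffix(fields)
```

The `ID_LIKE` field is the distro's own statement of what it derives from, which is one possible input to the derived-distro compatibility problem discussed earlier in the thread.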

One potential approach would be akin to IANA allocating private IP address ranges: reserve the custom platform name for arbitrary wheel tagging, but be clear that there's no expectation whatsoever of interoperability between custom tags defined for different environments.

@njsmith
Member

njsmith commented Aug 5, 2018

Hmm, to me then it sounds like there are two separate issues here:

  • Coming up with a way to distribute wheels for "long-tail" distributions (i.e., the "why doesn't pip install work right on Alpine" problem)

  • Coming up with a way for folks building internal infrastructure around wheels to define and use custom tagging schemes

These seem like pretty different use cases, with different stakeholders, pain points, etc., so I think it'd be more productive to consider them separately.

@ncoghlan
Member Author

ncoghlan commented Aug 5, 2018

I view the custom tagging as more like the inclusion of the [tool] table in PEP 518: an escape valve to help discourage folks from trying to increase the scope of the core proposal too much.

My main concern is that I see significant scope for "But what about..." questions arising when considering the possibilities for platform subtagging, so having "Use a custom platform tag if you want that" as a readily available answer should make it a lot easier to put those kinds of questions aside.

@natefoo
Member

natefoo commented Aug 27, 2018

Just a heads up, I broke the platform detection work out into its own library at natefoo/lionshead. I also have outdated branches of pip and wheel that support "distro wheels" and binary-distribution.cfg (in Nick's proposed-at-the-time ini format). We also have a tool, Starforge, which automates the build process, especially for such distro wheels.

With the success of manylinux, I haven't kept up my work on distro wheels, but if there's renewed interest I'd certainly revisit it.

@theacodes
Member

This is probably super relevant to the gRPC and TensorFlow teams @ Google. @mehrdada @kpayson64

@ncoghlan
Member Author

Some additional prior art from the NuGet world: https://github.com/dotnet/corefx/blob/master/pkg/Microsoft.NETCore.Platforms/runtime.json

Rather than relying on platforms to document their own derivation chains, the .NET folks came to the conclusion that it would be easier and more reliable to maintain their own database of derivation links. That means the Python ecosystem could potentially adopt the same approach, but use the .NET target environment database as its starting point (note though that the NuGet list appears to only cover Linux and Windows - no Mac OS X or iOS).
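The idea of a centrally maintained derivation database can be sketched as a small graph walk. The links below are hand-written for illustration and are not taken from the NuGet runtime.json file; the names `DERIVED_FROM` and `fallback_chain` are hypothetical.

```python
from collections import deque

# Hypothetical derivation links: each platform lists what it derives from.
DERIVED_FROM = {
    "centos": ["rhel"],
    "rhel": ["linux"],
    "ubuntu": ["debian"],
    "debian": ["linux"],
}


def fallback_chain(target, links=DERIVED_FROM):
    """Breadth-first walk from `target` through its ancestors, nearest first."""
    seen = [target]
    queue = deque([target])
    while queue:
        current = queue.popleft()
        for parent in links.get(current, []):
            if parent not in seen:
                seen.append(parent)
                queue.append(parent)
    return seen


centos_chain = fallback_chain("centos")
```

An installer could use such a chain as the order in which to consider wheels built for related platforms, without each platform having to document its own ancestry.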

Additional explanation here: https://natemcmaster.com/blog/2016/05/19/nuget3-rid-graph/

I wasn't able to find any of their platform detection code.

(from @edenhill in pypa/manylinux#37 (comment) )

@earonesty

earonesty commented Mar 4, 2019

Might be nice to eschew the "detection" step in favor of how the packaging system "conan.io" does distro detection (https://github.com/conan-io/conan). In short: the builder of the package is responsible for detection of any platform specific things.

In Python, a two-phase process would involve downloading a packageinfo.py file, if available at the repo, and running it locally.

The output is an arbitrary set of wheel tags needed. By doing this we push off the need to build-in platform detection and leave it up to the developer to determine what is relevant. For example, a developer can choose to add an optional tag "is_alpine" to a wheel. Whereas another developer can choose to specify the exact version of glibc as a tag. And yet another developer can detect which version of openssl is installed locally ... and create wheels for each version.

It doesn't seem like a huge stretch to make this a two-phase feature, and this vastly reduces the herculean task of figuring out every possible platform specific tag and how to detect it reliably in the past, present, and future.

If anyone thinks that this is a good solution and a PEP is warranted, I'd be happy to start writing one.

@weskerfoot

It would be great to have this feature, as I'm using devpi to host internal wheels, and the current approach is to have separate parallel indices with _alpine added to the index name, which is a really hacky workaround.
