Restore build-backend and remove switch to avoid pep517. #1927

Merged
jaraco merged 3 commits into master from bugfix/1644-build-backend
Dec 29, 2019

Member

@jaraco jaraco commented Dec 1, 2019

Per Paul's comment in #1644, this change restores the build backend and removes the workaround to avoid pep517.

Rather than selecting 40.0, I chose 40.8 based on the error message reported in #1923.

@pganssle
Member

pganssle commented Dec 3, 2019

It appears that pip has landed support for backend-path, but it's not released. I'm not sure if tox supports it yet, so I'm not sure if that changes things.

I'm a bit torn here. I think we should probably add backend-path = ["."] to our pyproject.toml file, which is what the spec says we should do to bootstrap our backend. However, that will only work with pip >= 20.0, whereas adding setuptools >= 40.8 will work with all current versions of pip unless --no-binary :all: is specified. If we do both, we'll get the bootstrapped backend behavior for recent versions of pip and fall back to the wheel in older versions. This seems like it would be a best-of-both-worlds situation, except that since we're unconditionally requiring setuptools, --no-binary :all: will fail whether or not it is actually used (sorta - see [1]), so we'd be creating a slightly more varied build experience across versions of pip without solving the problem we were hoping to solve.

In the end, though, I think the most practical approach is still probably to add both the setuptools dependency and the backend-path key. I have still not heard a convincing argument for requiring that the entire stack be "build from source", including the pure-python setuptools wheel, which is just a zip file containing the source, so I'm inclined to let the people who are convinced of that do the extra work to make sure they never even download a wheel of setuptools.

As for why to bother with backend-path at all, the PEP specifies that cycles like this in the dependency graph are explicitly not allowed, so even though I think we need to add a cycle for the sake of practicality, eventually we will want to remove it once support for backend-path is sufficiently widely-adopted, so we might as well start using the bootstrapped backend function opportunistically so that we can suss out possible bugs.

[1] I'll note that just doing pip install --no-binary :all: . doesn't trigger this, you need to do something like PIP_NO_BINARY=":all:" pip install ., since apparently pip arguments don't propagate into the PEP 517 environment.
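Editorial sketch, not part of the original comment: the footnote relies on pip reading options from `PIP_<OPTION>` environment variables, which child processes inherit. A minimal illustration of preparing such an invocation from Python follows; the helper name `pip_source_only_invocation` is hypothetical, and the command is only built here, not run.

```python
import os
import sys


def pip_source_only_invocation(project_dir="."):
    """Return (command, environment) for a pip install that forbids wheels.

    Hypothetical helper for illustration. The setting is passed through the
    environment rather than as a command-line flag, because the footnote
    reports that flags did not reach the isolated PEP 517 build environment.
    """
    env = dict(os.environ)
    # Environment-variable form of the --no-binary :all: option.
    env["PIP_NO_BINARY"] = ":all:"
    command = [sys.executable, "-m", "pip", "install", project_dir]
    return command, env


command, env = pip_source_only_invocation()
print(command[1:], env["PIP_NO_BINARY"])
```

Passing the pair to `subprocess.run(command, env=env, check=True)` would perform the install.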

Member

@pganssle pganssle left a comment

I know my explanatory post was a bit long, but the TL;DR is that I think we should also add the backend-path key to pyproject.toml.

If we're adding both, I think we can probably actually start including pyproject.toml in our source distribution, but I think maybe we should wait until we're going to make a breaking release for other reasons anyway, just to give pip >= 20.0 some time to disseminate.

@jaraco
Member Author

jaraco commented Dec 29, 2019

> I have still not heard a convincing argument for requiring that the entire stack be "build from source", including the pure-python setuptools wheel, which is just a zip file containing the source, so I'm inclined to let the people who are convinced of that do the extra work to make sure they never even download a wheel of setuptools.

I've spent hours going back and forth on this issue with platform maintainers (@abadger, @warsaw) and other package systems like Spack (@tgamblin). The issue is that they wish to build artifacts of various packages all from source (distributions or repo), and so they demand a DAG of dependencies (keyword being Acyclic). And because all projects are built by setuptools, setuptools cannot itself have any build dependencies, including setuptools. This need stems from the principle that it should be possible to build an open-source stack entirely from source, which seems a reasonable request. Furthermore, it's a hard requirement that this platform be capable of building the entire stack offline (artifacts must be known and downloaded in advance), to maintain trust and comply with security constraints in highly regulated environments.

My instinct was the same as Paul's, that it should be possible to build even the most foundational tools (setuptools) from pre-built artifacts (wheels, specially-built bootstrapping artifacts, etc).

@jaraco
Member Author

jaraco commented Dec 29, 2019

> My instinct was the same as Paul's, that it should be possible to build even the most foundational tools (setuptools) from pre-built artifacts (wheels, specially-built bootstrapping artifacts, etc).

But the only way I see to do that in a system built from pre-downloaded sources would be for that system to include setuptools' build requirements as wheels or for setuptools itself to somehow supply its own build requirements. Presumably setuptools can do this once backend-path is honored, by including its build dependencies in the sdist.

@jaraco
Member Author

jaraco commented Dec 29, 2019

> adding setuptools >= 40.8 will work with all current versions of pip unless --no-binary :all: is specified

Does this mean this change will break source-only builds for downstream packagers, or will they be unaffected because they're not using pip to build? I suspect this issue is also mitigated/masked by the fact that pyproject.toml isn't included in the sdist (#1644).

@jaraco
Member Author

jaraco commented Dec 29, 2019

Rather than let this issue linger, I want to proceed with this potentially risky change. It may impact downstream packagers, and if it does, I want to be sure this time to capture some strict requirements in the setuptools test suite such that it will guarantee the requirements are met.

@jaraco jaraco merged commit e6bdf25 into master Dec 29, 2019
@jaraco jaraco deleted the bugfix/1644-build-backend branch December 29, 2019 17:23
@jaraco jaraco mentioned this pull request Dec 29, 2019
@adamjstewart

Hi @jaraco, I'm a Spack developer along with @tgamblin. Thanks for looping us into the conversation!

I'm still trying to wrap my head around the implications of this PR, so forgive me if I'm jumping to conclusions. At first glance, it looks like this PR is a step in the wrong direction.

> I have still not heard a convincing argument for requiring that the entire stack be "build from source"

@pganssle See #980 for prior discussion of this issue. The problem is that there are many package managers out there aside from pip, and Spack is one of them. Spack is a Supercomputing PACKage manager, running on some of the largest supercomputers in the world. These clusters often have esoteric architectures and programming environments, like ARM, PowerPC, BlueGene Q, and Cray. Each of these systems comes with various compilers, MPI libraries, and BLAS/LAPACK libraries. The goal of package managers like Spack is to allow users to build multiple different configurations of the same package with different compilers and different CPU microarchitectures. Wheels are a cute idea for the average user on their laptop, but for us, wheels simply don't exist for these architectures. This is one of many reasons why we build from source.

It seems like this PR once again makes setuptools challenging to build from source. For those of us who still need to build from source, I see the following solutions:

  1. First install setuptools <= 42 in order to bootstrap setuptools 43+
  2. Use pip to install setuptools only
  3. Stop updating our setuptools package and only support setuptools <= 42 until a better solution arises

3 is the easiest solution for us, but obviously no one wants that. We don't want to be stuck in history any more than you want people using ancient versions of setuptools. 2 could work for us, although having a package manager (Spack) require a different package manager (pip) to be installed in order to install Python wheels seems like a step in the wrong direction. 1 could also work, and we do similar things for languages like Go, but once again it seems to introduce more complexity than should be necessary.

Are there any other solutions I'm missing? Or something about this change that I'm not understanding?

@pganssle
Member

pganssle commented Dec 29, 2019

> Wheels are a cute idea for the average user on their laptop, but for us, wheels simply don't exist for these architectures. This is one of many reasons why we build from source.

I think you are missing the point here. I'm not talking about requiring arbitrary wheels, I'm saying that for universal python wheels that contain no extension code or architecture-specific information, "building from source" is a largely bureaucratic checkbox, because the wheel file is a zip file containing the source code. Allowing setuptools to bootstrap from a setuptools wheel will present none of the problems you describe.

It is not clear to me why you would require pip to install setuptools. You just need to be able to install universal wheels, which amounts to unzipping them in the right place. You should probably support that anyway, because there are lots of packages that don't have unusual edge cases that have pre-built wheels, and installing from a wheel means you don't have to execute arbitrary code in the build step - in addition to being faster, it's safer.
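Editorial sketch, not part of the original comment: "unzipping them in the right place" can be illustrated with the standard-library `zipfile` module. This is a deliberately simplified assumption of what an installer does. Real installers also verify the `RECORD` file, handle `.data` directories, and generate entry-point scripts. The wheel built here is a throwaway stand-in, not a real setuptools wheel.

```python
import os
import sys
import tempfile
import zipfile


def install_pure_wheel(wheel_path, site_dir):
    """Unpack a pure-Python wheel into site_dir (simplified illustration)."""
    with zipfile.ZipFile(wheel_path) as wheel:
        wheel.extractall(site_dir)


# Build a tiny stand-in wheel so the example is self-contained.
work = tempfile.mkdtemp()
wheel_path = os.path.join(work, "demo_pkg-0.1-py3-none-any.whl")
with zipfile.ZipFile(wheel_path, "w") as wheel:
    wheel.writestr("demo_pkg/__init__.py", "ANSWER = 42\n")
    wheel.writestr(
        "demo_pkg-0.1.dist-info/METADATA",
        "Metadata-Version: 2.1\nName: demo-pkg\nVersion: 0.1\n",
    )

# "Install" by extraction, then import from the target directory.
site_dir = os.path.join(work, "site")
install_pure_wheel(wheel_path, site_dir)
sys.path.insert(0, site_dir)

import demo_pkg

print(demo_pkg.ANSWER)
```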

I think the decision to add a build-backend path was wrong in the first place, and it was motivated by the desire to generalize something that didn't need to be generalized, but it is what it is. The way forward is to support backend-path. I haven't read the spec recently, but I think it would be spec-compliant for you to detect cycles in the PEP 518 requirements graph and either throw an error or try and correct the issue. In this case you could detect that setuptools depends on itself but it specifies a build-backend path, and ignore the setuptools requirement.
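Editorial sketch, not part of the original comment: the suggestion to detect the self-cycle and ignore the setuptools requirement when backend-path is present could look roughly like this. The function names are hypothetical, the requirement parsing is a simplification of PEP 508, and the `build_system` dictionary is an assumed rendering of the `[build-system]` table the thread describes, not a copy of setuptools' actual file.

```python
import re


def canonical_name(requirement):
    """Extract and normalize the project name from a requirement string.

    Simplified: takes the leading name only and ignores extras, markers
    and URLs, which a full PEP 508 parser would handle.
    """
    match = re.match(r"\s*([A-Za-z0-9][A-Za-z0-9._-]*)", requirement)
    name = match.group(1) if match else ""
    return re.sub(r"[-_.]+", "-", name).lower()


def effective_requires(project_name, build_system):
    """Drop a project's build requirement on itself when it self-bootstraps."""
    requires = list(build_system.get("requires", []))
    if not build_system.get("backend-path"):
        # No in-tree backend declared, so every requirement stays.
        return requires
    own_name = canonical_name(project_name)
    return [req for req in requires if canonical_name(req) != own_name]


# Assumed equivalent of:
#   [build-system]
#   requires = ["setuptools >= 40.8", "wheel"]
#   build-backend = "setuptools.build_meta"
#   backend-path = ["."]
build_system = {
    "requires": ["setuptools >= 40.8", "wheel"],
    "build-backend": "setuptools.build_meta",
    "backend-path": ["."],
}

print(effective_requires("setuptools", build_system))
```

A source-only packager applying this rule would install only the remaining requirements before invoking the in-tree backend.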

@tgamblin

tgamblin commented Dec 30, 2019

I looked through this, and it's not going to cause an issue for Spack, mainly because it's a circular dependency from setuptools to itself. To get around that, we can use setuptools within the build environment. setuptools build dependencies are still vendored, so we don't have to worry about setuptools not working in-tree, as we did in #980.

So, our Spack packages can install straight from git like this:

$ python ./bootstrap.py
$ PYTHONPATH="." python -s ./setup.py --no-user-cfg install --single-version-externally-managed --root=/

I verified that this works fine in a Spack environment with python but not setuptools or pip. The main differences are:

  1. we now have to run bootstrap.py
  2. we have to add PYTHONPATH="."

Neither of those is a big deal. We don't currently use PEP518 (because it's still provisional, right?) -- we have our own package format, so we don't really have to ignore anything. We can just omit the setuptools dependency from its own package, which is what I think @pganssle is suggesting.

I gotta say, though, that I had to dig more than I really expected to figure this out. I am probably not as up to date on the latest in Python packaging as I should be, but here are a few suggestions:

Docs

Given that setuptools is probably the lowest level component of the python ecosystem, I think it would be extremely helpful for distro folks if this section of the setuptools docs had some lower-level instructions, not just "use pip -U install setuptools".

Yes, you can install setuptools with pip, but what's the recommended way for other package managers to do it? pip itself requires setuptools, so you have to dig around to figure out that you should use ensurepip to get it, and that ensurepip just unzips pip and setuptools wheels.

I think the docs should mention the two methods discussed above:

  1. Run python bootstrap.py && PYTHONPATH="." setup.py ...
  2. unzip the setuptools wheel (which requires the wheel)

There should probably be a caveat that these methods are really for people writing build systems or package managers, but it would sure make things more obvious.

build-backend and dependencies

RE @pganssle:

> I haven't read the spec recently, but I think it would be spec-compliant for you to detect cycles in the PEP 518 requirements graph and either throw an error or try and correct the issue. In this case you could detect that setuptools depends on itself but it specifies a build-backend path, and ignore the setuptools requirement.

This was helpful. I went and read the PEP517 spec, though, and it is not clear to me how backend-path is supposed to relate to build dependencies. It's not clear to me why setuptools needs to define a dependency on itself if it's in its own backend-path. Why does pip need the backend-path and the dependency? IMO, if you can just do PYTHONPATH="." when running setup.py, the project doesn't really have a dependency.

Also, the term build-backend isn't actually clearly defined in PEP517. The terminology section just says that a "build backend" is whatever does the work for a "build frontend", which it says is some kind of tool. I had to read through the spec to figure out that it was a python module with some well defined entry points.
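Editorial sketch, not part of the original comment: the "python module with some well defined entry points" can be made concrete. The code below shows, under simplifying assumptions, how a front-end might resolve a `build-backend` string after putting the `backend-path` entries on `sys.path`. `load_backend` and `toy_backend` are hypothetical names. Real front-ends run the backend in a separate process and check that the module really comes from the source tree.

```python
import importlib
import os
import sys
import tempfile


def load_backend(source_dir, build_backend, backend_path=()):
    """Import the object named by a build-backend string (simplified)."""
    # backend-path entries are relative to the source tree and go at the
    # front of sys.path, so the in-tree copy is found first.
    for entry in reversed(list(backend_path)):
        sys.path.insert(0, os.path.abspath(os.path.join(source_dir, entry)))
    importlib.invalidate_caches()
    # The string has the form "module.path" or "module.path:object.path".
    module_name, _, object_path = build_backend.partition(":")
    backend = importlib.import_module(module_name)
    for attribute in filter(None, object_path.split(".")):
        backend = getattr(backend, attribute)
    return backend


# A toy in-tree backend: a module exposing PEP 517 hook functions.
source_dir = tempfile.mkdtemp()
with open(os.path.join(source_dir, "toy_backend.py"), "w") as handle:
    handle.write(
        "def get_requires_for_build_wheel(config_settings=None):\n"
        "    return []\n"
        "\n"
        "def build_wheel(wheel_directory, config_settings=None,\n"
        "                metadata_directory=None):\n"
        "    raise NotImplementedError('toy backend')\n"
    )

backend = load_backend(source_dir, "toy_backend", backend_path=["."])
print(backend.get_requires_for_build_wheel())
```

This is close in effect to running with `PYTHONPATH="."`, which is why a self-hosted backend does not strictly need to be installed beforehand.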

Maybe this is stuff for @njsmith or @takluyver to clarify.

Wheels

RE @pganssle:

> I'm not talking about requiring arbitrary wheels, I'm saying that for universal python wheels that contain no extension code or architecture-specific information, "building from source" is a largely bureaucratic checkbox, because the wheel file is a zip file containing the source code.

I agree that we could use universal wheels, but this goes against a very old notion in package management of relying on pristine sources. We really want to build from the source distribution in Spack, and PEP517 seems to support that notion:

> A source tree is something like a VCS checkout. We need a standard interface for installing from this format, to support usages like pip install some-directory/.
>
> A source distribution is a static snapshot representing a particular release of some source code, like lxml-3.4.4.tar.gz. Source distributions serve many purposes: they form an archival record of releases, they provide a stupid-simple de facto standard for tools that want to ingest and process large corpora of code, possibly written in many languages (e.g. code search), they act as the input to downstream packaging systems like Debian/Fedora/Conda/..., and so forth. In the Python ecosystem they additionally have a particularly important role to play, because packaging tools like pip are able to use source distributions to fulfill binary dependencies, e.g. if there is a distribution foo.whl which declares a dependency on bar, then we need to support the case where pip install bar or pip install foo automatically locates the sdist for bar, downloads it, builds it, and installs the resulting package.

I would be surprised if the Debian folks would accept a build that builds a package straight from a wheel and not from the main repo (only mention I could find was here). I suspect most distros would be ok with using wheels if one could use tooling to generate the wheels from the source distribution. This seems to be something that PEP517 wants to support (as it's one of the build-backend endpoints), so that's cool.

But practically speaking, that puts me in the position of relying on the wheel tool for the build, which requires setuptools, and that leads to yet another bootstrapping problem.

Summary

RE @jaraco:

> It may impact downstream packagers, and if it does, I want to be sure this time to capture some strict requirements in the setuptools test suite such that it will guarantee the requirements are met.

As I said above, I don't think this change will impact downstream packagers, but that's only because setuptools still vendors its dependencies, and because it does not require a wheel to bootstrap.

As long as setuptools is its own, self-hosted build backend, things are good, but it should not un-vendor the other build dependencies (which themselves require setuptools), or we're back to an un-resolvable cycle and the ecosystem can't be bootstrapped from scratch. setuptools is unique in this regard, as it's the deepest dependency in the python tree. If other stuff wants to always build through a wheel, I think that is fine, but setuptools at least needs to be bootstrappable from pristine source if it doesn't want to make things hard for distros.

Documenting this would help. I also think PEP517 should be clearer about whether self-hosted dependencies really need to be specified when there is a backend-path, but that's a separate issue.

@pganssle
Member

> This was helpful. I went and read the PEP517 spec, though, and it is not clear to me how backend-path is supposed to relate to build dependencies. It's not clear to me why setuptools needs to define a dependency on itself if it's in its own backend-path. Why does pip need the backend-path and the dependency? IMO, if you can just do PYTHONPATH="." when running setup.py, the project doesn't really have a dependency.

backend-path is a very obscure thing that is only useful for self-bootstrapping backends. There is basically no reason to write more than one self-bootstrapping backend, though probably at least flit and setuptools will both become self-bootstrapping.

By defining a dependency on setuptools, we're deliberately breaking the spec because no front-ends actually supported backend-path when I made my comment, and there is no other reasonable way to support both pip < 20.0 and pip >= 20.0. The only thing it will break is people who want to use pip install --no-binary :all:, which to me is acceptable in the medium-term. The kind of person who does that generally also knows how to patch their setuptools as appropriate.

> I would be surprised if the Debian folks would accept a build that builds a package straight from a wheel and not from the main repo (only mention I could find was here). I suspect most distros would be ok with using wheels if one could use tooling to generate the wheels from the source distribution. This seems to be something that PEP517 wants to support (as it's one of the build-backend endpoints), so that's cool.

This is not a good reason not to use wheels to bootstrap things, it's just Debian's convention. I don't really care to argue the point, I already lost this battle a while ago, and it is already unnecessary for people to build from wheels. We will eventually come back into compliance with the spec when support for the full spec is widespread enough that it won't break too many things to do so.

> I think the docs should mention the two methods discussed above:
>
>   1. Run python bootstrap.py && PYTHONPATH="." setup.py ...
>   2. unzip the setuptools wheel (which requires the wheel)

Our documentation is not well-organized and out of date in many ways. There are many things in the wish list.

I would not recommend documenting either of these, since I don't consider either of these to be the right way to produce a setuptools wheel / installation.

Once we are able to ship a pyproject.toml file without it breaking everything (which is what this PR does), the right thing to do is to implement or use a PEP 517 front-end and then invoke a PEP 517 build to build a wheel, then install that wheel. The only part that is setuptools-specific is that our pyproject.toml is not currently spec-compliant, which means you have to know that if you've implemented backend-path, you can ignore the setuptools dependency. The documentation for the other stuff either does live or should live in packaging.python.org.
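Editorial sketch, not part of the original comment: the order in which a PEP 517 front-end calls the wheel-building hooks can be illustrated with a stand-in backend. `ToyBackend`, `build_wheel_with_hooks` and the `install` callback are hypothetical names. A real front-end would install the requirements into an isolated environment and call the hooks in a subprocess.

```python
import os
import tempfile
import zipfile


class ToyBackend:
    """Stand-in backend exposing two PEP 517 hooks (illustration only)."""

    def get_requires_for_build_wheel(self, config_settings=None):
        # Optional hook: extra requirements discovered at build time.
        return []

    def build_wheel(self, wheel_directory, config_settings=None,
                    metadata_directory=None):
        # Mandatory hook: write a wheel and return its file name.
        name = "toy-0.1-py3-none-any.whl"
        with zipfile.ZipFile(os.path.join(wheel_directory, name), "w") as wheel:
            wheel.writestr("toy.py", "")
        return name


def build_wheel_with_hooks(backend, static_requires, wheel_directory, install):
    """Run the hook sequence a front-end follows to produce a wheel."""
    # 1. Install the requirements listed in pyproject.toml.
    install(static_requires)
    # 2. Ask the backend for additional requirements, if it offers the hook.
    get_requires = getattr(backend, "get_requires_for_build_wheel", None)
    if get_requires is not None:
        install(get_requires())
    # 3. Build the wheel; the hook returns the base name of the file.
    return os.path.join(wheel_directory, backend.build_wheel(wheel_directory))


installed = []
wheel_path = build_wheel_with_hooks(
    ToyBackend(), ["wheel"], tempfile.mkdtemp(), installed.extend
)
print(os.path.basename(wheel_path), installed)
```

The resulting wheel is then installed as a separate step, which is the second half of the workflow the comment recommends.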

> Neither of those is a big deal. We don't currently use PEP518 (because it's still provisional, right?) -- we have our own package format, so we don't really have to ignore anything.

This is probably a mistake. PEP 517 and 518 are provisional in the same sense that pandas is still in the 0.x series. By and large they won't change at this point unless some serious show-stopper occurs. I would recommend moving towards the PEP 517/518 world early, because it was explicitly designed for people like you - because for ages Python build behavior was "whatever distutils does" and "whatever pip does", requiring all alternate build systems to be bug-compatible with setuptools and pip. PEP 517 and 518 are a move towards standardizing a model for builds that doesn't assume that setuptools is doing the builds (allowing individual projects to specify their back-ends) or that pip is doing the installation (allowing individual users to specify the front-ends).

@tgamblin

> There is basically no reason to write more than one self-bootstrapping backend, though probably at least flit and setuptools will both become self-bootstrapping.

This is good.

> I already lost this battle a while ago, and it is already unnecessary for people to build from wheels.

👍🏻

> We will eventually come back into compliance with the spec when support for the full spec is widespread enough that it won't break too many things to do so.

Ok.

> I would not recommend documenting either of these, since I don't consider either of these to be the right way to produce a setuptools wheel / installation.
>
> Once we are able to [...]. The documentation for the other stuff either does live or should live in packaging.python.org.

I don't think that makes a whole lot of sense, since the two ways I mentioned are the only ways that actually work without pip, and setuptools devs are the ones who actually control and guarantee that setuptools can be installed a certain way. How is that not "setuptools-specific"? Yes, PEP517 specifies how things "should" be installed one day when everything is not broken, but it takes two lines to describe how things can work well right now. I'd be willing to submit a PR, but y'all are in charge.

> This is probably a mistake. PEP 517 and 518 are provisional in the same sense that pandas is still in the 0.x series.

I didn't mean to say that we would not be moving towards them. Yes, they look promising. At the very least PEP517 can help us, and it would be great to mine dependency specs when things support PEP518.

I do not think it is a "mistake" for us to have our own package format and our own dependency specification, beyond what PEP518 prescribes. Spack supports many more types of packages than pip is ever going to -- we build with different compilers/build options/optimized architectures/etc. PEP517 and PEP518 solve some python packaging problems, but there is still a ton of work to be done on the compiled side (which is our focus).

@pganssle
Member

> I don't think that makes a whole lot of sense, since the two ways I mentioned are the only ways that actually work without pip, and setuptools devs are the ones who actually control and guarantee that setuptools can be installed a certain way.

This is not true at all. Any PEP 517-compatible front-end can install setuptools using only PEP 517. There's even a library that does it for you, pep517. I am saying that the correct thing to do is to either build a PEP 517 front-end or use a PEP 517 front-end (of which pip is one).

> How is that not "setuptools-specific"? Yes, PEP517 specifies how things "should" be installed one day when everything is not broken, but it takes two lines to describe how things can work well right now. I'd be willing to submit a PR, but y'all are in charge.

PEP 517 specifies how PEP 517-compliant things can be installed. setuptools is a valid PEP 517-backend capable of installing itself. We don't need to wait for anything to be "not broken". If we're documenting anything, we should document that the supported way forward is to install setuptools via PEP 517.

> I do not think it is a "mistake" for us to have our own package format and our own dependency specification, beyond what PEP518 prescribes. Spack supports many more types of packages than pip is ever going to -- we build with different compilers/build options/optimized architectures/etc. PEP517 and PEP518 solve some python packaging problems, but there is still a ton of work to be done on the compiled side (which is our focus).

I'm saying it's a mistake to wait for PEP 518 to no longer be provisional before implementing it. It's essentially done now, and you probably want to understand how it fits into your workflow before it stops being provisional, so you still have an opportunity to get it changed (though as time goes on it will become harder to change PEP 518 for backwards-compat reasons).

My overall point is this: for many packages, you already need a PEP 517/518 front-end to install the package correctly. Sometimes it is possible to make ad-hoc modifications to the build system to avoid this, but you'll have a much easier time if you actually make use of the pyproject.toml file, which provides you all the information you need to build a python package.

@tgamblin

> If we're documenting anything, we should document that the supported way forward is to install setuptools via PEP 517.

This would be extremely helpful. I did not know this because it is not documented. I tried looking at packaging.python.org, but the only thing there that seems relevant is:

> The accepted style of source distribution format based on pyproject.toml, defined in PEP 518 and adopted by PEP 517 has not been implemented yet.

I will look into reworking our python support to use pep517. I actually do not think this will be too hard -- we will need to add some logic to our PythonPackage class, but I do not think there is much more to it than that.

> you'll have a much easier time if you actually make use of the pyproject.toml file, which provides you all the information you need to build a python package.

At the moment, we do get most of the version information that goes into a package.py from pyproject.toml (assuming the project has one). @adamjstewart can say more about that, but it would be nice if everything started moving that way and generating constraints for most of our pure python packages. It would be extremely cool if we grew a front-end to PyPI that understood most packages (though this is likely a long way off). We would likely still have native spack package.py descriptions of things like numpy, pandas, tensorflow, etc. -- anything with a compiled component.

@adamjstewart

> At the moment, we do get most of the version information that goes into a package.py from pyproject.toml (assuming the project has one). @adamjstewart can say more about that, but it would be nice if everything started moving that way and generating constraints for most of our pure python packages.

I love Python as a language, but its lack of a consistent build system leaves a lot to be desired. I realize that this is one of the reasons setuptools was created. Once upon a time, if I needed to know what modules a Python library depends on, I could check in requirements.txt. Then setuptools came along, and allowed you to programmatically specify dependencies via install_requires in setup.py. Then more recently, I started noticing people using setup.cfg to specify dependencies. I just learned about pyproject.toml recently, and I think it's great as it allows us to see dependencies without actually running the setup.py script, but now we're at the point where I need to manually check 4 different files in order to determine what modules a library depends on.
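Editorial sketch, not part of the original comment: two of the four locations mentioned above can be read statically with the standard library alone. The helper `static_dependencies` is a hypothetical name. It deliberately leaves out setup.py, which must be executed to learn its `install_requires`, and pyproject.toml, which needs a TOML parser. The requirements.txt handling is simplified and ignores options such as `-r` includes.

```python
import configparser
import os
import tempfile


def static_dependencies(project_dir):
    """Collect declared dependencies without executing project code."""
    found = {}

    requirements = os.path.join(project_dir, "requirements.txt")
    if os.path.exists(requirements):
        with open(requirements) as handle:
            # Strip trailing comments and surrounding whitespace.
            lines = [line.split("#", 1)[0].strip() for line in handle]
        found["requirements.txt"] = [line for line in lines if line]

    setup_cfg = os.path.join(project_dir, "setup.cfg")
    if os.path.exists(setup_cfg):
        parser = configparser.ConfigParser(interpolation=None)
        parser.read(setup_cfg)
        # setuptools reads install_requires from the [options] section.
        raw = parser.get("options", "install_requires", fallback="")
        found["setup.cfg"] = [
            line.strip() for line in raw.splitlines() if line.strip()
        ]

    return found


# Throwaway project used only to exercise the helper.
project_dir = tempfile.mkdtemp()
with open(os.path.join(project_dir, "requirements.txt"), "w") as handle:
    handle.write("requests>=2.0  # loosely pinned\n\n")
with open(os.path.join(project_dir, "setup.cfg"), "w") as handle:
    handle.write("[options]\ninstall_requires =\n    requests>=2.0\n    six\n")

deps = static_dependencies(project_dir)
print(deps)
```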

For comparison, the R language has an incredibly uniform and robust build system, where every package is required to use a DESCRIPTION file listing their dependencies. CRAN (the equivalent of PyPI) automatically tests installation of these packages, and if a dependency is missing, the build fails and the package is not allowed on CRAN. This forces R developers to conform to a single standard if they want to distribute their software.

I think PEP 517/518 look great, but until someone drops backwards compatibility and forces packages to use pyproject.toml, we'll just end up with a 4th competing standard.

@tgamblin

@adamjstewart: I do think people are moving in the pyproject.toml direction -- thanks to the efforts of folks like @pganssle and others on this thread. It'll be awesome once that happens!
