Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[FR] Implement building wheels from sdist/tarball artifacts #311

Open
webknjaz opened this issue Jun 16, 2021 · 28 comments
Open

[FR] Implement building wheels from sdist/tarball artifacts #311

webknjaz opened this issue Jun 16, 2021 · 28 comments
Labels
enhancement New feature or request

Comments

@webknjaz
Copy link
Member

This is an important use case for building platform-specific matrixes of wheels. In this scenario, one would build an sdist in a separate job, save it as an artifact, and then a matrix of subsequent jobs would create wheels using the same artifact. As a bonus, other binary artifacts from external ecosystems, like RPM, normally also get built from sdist so such jobs would also be runnable in parallel.

I suggest supporting the following invocation:

$ python -m build --wheel dist/some-tarball.tar.gz

I think this UX is good. And since the CLI already accepts a directory-type source in that place, pointing at a file would be natural.

Refs:

@henryiii
Copy link
Contributor

henryiii commented Jun 16, 2021

What would happen if a user did this:

$ python -m build dist/some-tarball.tar.gz

(no --wheel flag) Would this only produce a wheel, since it's already an sdist, would it copy the sdist to the outdir, or would it make an sdist from the sdist? Or would it throw an error (seems a bit harsh for the default?). How about the --sdist flag?

Other than working out details for what happens, I think this sounds great, and I think it will be easy after #304. I'd also like support for this in cibuildwheel, Xref pypa/cibuildwheel#597

@webknjaz
Copy link
Member Author

I think erroring out with "be verbose" is a good idea. Especially since I envision this to be primarily a CI use case where long verbose options are encouraged for maintainability.

@layday
Copy link
Member

layday commented Jun 16, 2021

Reusing the --wheel flag will likely add to the confusion. #304 says that the --wheel flag signifies that the wheel is built from source instead of the source distribution.

@FFY00
Copy link
Member

FFY00 commented Jun 16, 2021

--wheel will build from the specified source, that could be srcdir, or as this issue proposes, that could be a tarball. But I believe in doing that, we would need to improve the help message more than what #304 does.

We have tried to keep the CLI simple, so I'd like to hear what downsides the following approach would have.

python -m tarfile -e $project-$version.tar.gz
python -m build -w $project-$version

So far, I have not been able to come up with anything that would hold any reasonable weight against that.

Also, maybe I am mistaken, but don't RPM build tools extract the source by default.

@FFY00 FFY00 added the enhancement New feature or request label Jun 17, 2021
@webknjaz
Copy link
Member Author

--wheel will build from the specified source, that could be srcdir, or as this issue proposes, that could be a tarball. But I believe in doing that, we would need to improve the help message more than what #304 does.

Yep, that's what I was thinking: both dir (whether a Git checkout or just a plain old folder) and a tarball are technically source. So it'd be natural to accept either as the same CLI arg.

We have tried to keep the CLI simple, so I'd like to hear what downsides the following approach would have.

python -m tarfile -e $project-$version.tar.gz
python -m build -w $project-$version

So far, I have not been able to come up with anything that would hold any reasonable weight against that.

I see where you're coming from and now I'm feeling a little bit conflicted about this. OTOH since our goal is to streamline this as the best practice, it may be best to keep the number of steps low meaning I'd prefer to only run one command (build). It seems like the overhead of having unpacking+copying within build could be minimal.

Also, maybe I am mistaken, but don't RPM build tools extract the source by default.

AFAIK it's not rpm-build but the macros packagers use in the spec files that do the extraction. The spec only lists the sources but how they are processed is not specified. Those source dists often are even downloaded and put in place by external tools. The build picks them up and may unpack or do something else. For example, in the recent spec I've got, there's one main source that gets unpacked but there are also sources (build deps) of projects from PyPI that aren't available in the RPM repos so I pip install those in temporary build location.

@di
Copy link
Member

di commented Oct 5, 2021

I'm in favor of this! My $0.02 based on trying to do this and realizing the feature didn't exist: I expected python -m build dist/some-tarball.tar.gz to just work.

This would mean that srcdir would need to become something like:

src - source directory or source distribution (defaults to current directory)

that it would probably need to support both .tar.gz and .zip files, and that the existing flags would continue to behave as-is.

@layday
Copy link
Member

layday commented Oct 6, 2021

Should python -m build dist/some-tarball.tar.gz build both an sdist and a wheel or just the wheel? Should a .zip file be considered an sdist or just a zipped up source directory?

@di
Copy link
Member

di commented Oct 6, 2021

Should python -m build dist/some-tarball.tar.gz build both an sdist and a wheel or just the wheel?

I would expect it to work by unzipping/untarring the file into some temporary directory and then running as usual on that source directory. This would mean that python -m build dist/some-tarball.tar.gz would produce a new sdist (and a wheel).

This might have the effect of rebuilding the sdist (for example, upgrading it to a newer metadata version), and I think that behavior would probably be desirable.

Should a .zip file be considered an sdist or just a zipped up source directory?

Given my comment above, this probably wouldn't make a difference?

@layday
Copy link
Member

layday commented Oct 6, 2021

Yeah, I think that'd be preferable too, that it should function the same way regardless of input. Does anybody wanna take this up?

@FFY00
Copy link
Member

FFY00 commented Oct 6, 2021

I am reluctant of accepting this feature, but if we do I think we should only support .tar.gz sdists as they are required by PEP 517, and are the de facto standard, as per https://packaging.python.org/specifications/source-distribution-format/#source-distribution-file-name.

@layday
Copy link
Member

layday commented Oct 6, 2021

Would we perform any kind of validation then or do we assume that any gzipped tarball is an sdist?

@FFY00
Copy link
Member

FFY00 commented Oct 6, 2021

We would do some validation: check if the name matches the format, and check for PKG-INFO.

@rgommers
Copy link

rgommers commented Jan 7, 2023

I am reluctant of accepting this feature, but if we do I think we should only support .tar.gz sdists as they are required by PEP 517, and are the de facto standard,

The feature is needed, but I agree with the rest of this. Building one sdist from another sdist is not a valid thing to do. So python -m build dist/some-tarball.tar.gz should validate that the tarball is actually an sdist and then proceed with build's regular "build a wheel from an sdist" step.

@webknjaz
Copy link
Member Author

webknjaz commented Jan 7, 2023

Building one sdist from another sdist is not a valid thing to do.

That is not entirely true. Rebuilding sdists is useful for reproducibility verification. Also, it's useful when projects ship Cython-pregenerated C-files that need to be regenerated for compatibility in the future and the sdist should be enough. sdists are also widely used by downstream packagers so I wouldn't be as dismissive of them.

@rgommers
Copy link

rgommers commented Jan 7, 2023

Rebuilding sdists is useful for reproducibility verification

You rebuild from the original sources (git tag, or whatever) for that.

Also, it's useful when projects ship Cython-pregenerated C-files that need to be regenerated for compatibility in the future and the sdist should be enough.

That is a bad practice by now and should be changed by the project, but either way - you need to do this from the original sources.

sdists are simply a different thing conceptually than a vcs checkout. There is not a requirement for sdists to be idempotent, and in general they won't be because of practices like having a git commit hash in a version string (hence sdists can require a full git checkout rather than only a snapshot of the sources). There is no reason to introduce a new requirement like this on sdist's. It will be slow, introduces extra complexity in both build and for projects that do produce sdists, and has no compelling benefits.

The one "production path" is vcs -> sdist -> wheel. For performance reasons there is also vcs -> wheel directly, which should result in the same wheel. A vcs -> sdist -> sdist is not a thing.

@webknjaz
Copy link
Member Author

webknjaz commented Jan 7, 2023

@di by the way, you may want to try out my new action, as a workaround for this — https://twitter.com/webknjaz/status/1609252309749440512.

@webknjaz
Copy link
Member Author

webknjaz commented Jan 7, 2023

@rgommers I can't agree with that. Maybe in your bubble. But the reality is much more diverse.

@webknjaz
Copy link
Member Author

webknjaz commented Jan 7, 2023

and in general they won't be because of practices like having a git commit hash in a version string (hence sdists can require a full git checkout rather than only a snapshot of the sources)

It does actually work if implemented properly. With setuptools-scm, at least.

@rgommers
Copy link

rgommers commented Jan 7, 2023

But the reality is much more diverse.

Then perhaps build your own build frontend that does this, if you have a validation use case and it works in your domain and with setuptools-scm if that's what you care about? pypa/build is not the place - for it to be in this project, an sdist to sdist transformation should work universally, and it's pretty clear that it does not. It's clear both from first principles (e.g., what if the build_sdist hook is produced via a git-archive call, or what if your pre-generated Cython sources were there because the author needed a specific or patched Cython version for their release?) and because I arrived at this issue after testing exactly that (a user bug report with a downloaded-and-unpacked sdist didn't build).

It's of course not impossible that I'm horribly confused here, but if so then I'd really appreciate if you could point to a PEP or other authoritative document that explains that sdists->sdist must be supported.

@webknjaz
Copy link
Member Author

webknjaz commented Jan 7, 2023

@rgommers I don't think it's important to be in the PEP. People who don't need it would just not use it. But there's a compelling case for not artificially restricting this.

Many envs treat sdist as a complete enough copy of a project source, as it was intended originally. This is why it's expected to contain tests and docs, for example. It's called a source distribution because it can provide a project source. As such, it should be possible to make a new sdist out of that source, among other things.
https://packaging.python.org/en/latest/flow/#the-source-distribution-sdist.

Also, whether sdist-to-sdist works would mostly depend on the backend, build is there to just provide an interface.

@rgommers
Copy link

rgommers commented Jan 8, 2023

@rgommers I don't think it's important to be in the PEP. People who don't need it would just not use it.

That is not how things work. A widely used tool like build is used by all sorts of users. Not just distro packagers, but end users (example: build is the only choice right now for SciPy users wanting to build with a non-default BLAS library - that's a lot of users right there). If it is going to fail to produce a wheel for packages where Pip works just fine, that is a really bad experience.

Also, whether sdist-to-sdist works would mostly depend on the backend, build is there to just provide an interface.

Exactly on the first part - it depends. And build should work with all regular backends. So sdist-to-sdist would be a terrible default. Whether or not it can be added as a non-default path is still very much questionable - I'd suggest that it go elsewhere because it's niche and an extra maintenance burden, but that's not up to me.

As such, it should be possible to make a new sdist out of that source

Honestly, that's a wish, not a requirement. Python packaging is already complicated enough without introducing new things like that. sdist's are not very well-defined, but PEP 517 for example makes clear that they're different from "source trees" (https://peps.python.org/pep-0517/#terminology-and-goals), and that a wheel can be built from both. From PEP 517:

A build frontend is a tool that users might run that takes arbitrary source trees or source distributions and builds wheels from them.

@layday
Copy link
Member

layday commented Jan 8, 2023

I don't know to whom or why sdist idempotence is important or why this issue has pivoted towards it. When Filipe mentioned that "we should only support .tar.gz sdists", what they meant is that we should not support building wheels from historical sdist formats (zip files and what not) - not that we're considering supporting building sdists from sdists.

@webknjaz
Copy link
Member Author

webknjaz commented Jan 8, 2023

not that we're considering supporting building sdists from sdists.

I think that request here is to let build run on (standardized) sdists in addition to the already-unpacked directories and the rest of the logic (output results) would remain just as it is now.

@webknjaz
Copy link
Member Author

webknjaz commented Jan 8, 2023

@rgommers I don't think it's important to be in the PEP. People who don't need it would just not use it.

That is not how things work. A widely used tool like build is used by all sorts of users. Not just distro packagers, but end users (example: build is the only choice right now for SciPy users wanting to build with a non-default BLAS library - that's a lot of users right there). If it is going to fail to produce a wheel for packages where Pip works just fine, that is a really bad experience.

And yet, this is not even close to what we're talking about. It is not about being different from pip somehow. It's about not standing in the way of what's possible and legitimately useful in a number of use-cases. It does not conflict with the PEPs.

Also, whether sdist-to-sdist works would mostly depend on the backend, build is there to just provide an interface.

Exactly on the first part - it depends. And build should work with all regular backends. So sdist-to-sdist would be a terrible default. Whether or not it can be added as a non-default path is still very much questionable - I'd suggest that it go elsewhere because it's niche and an extra maintenance burden, but that's not up to me.

Nobody here requested this to be a default. I wouldn't want that, of course. Just not preventing the possibility actively is enough.

As such, it should be possible to make a new sdist out of that source

Honestly, that's a wish, not a requirement. Python packaging is already complicated enough without introducing new things like that. sdist's are not very well-defined, but PEP 517 for example makes clear that they're different from "source trees" (https://peps.python.org/pep-0517/#terminology-and-goals), and that a wheel can be built from both. From PEP 517:

A build frontend is a tool that users might run that takes arbitrary source trees or source distributions and builds wheels from them.

And yet, many expect source trees to be usable/modifiable, this is even documented in the PyPUG. Which makes unpacked sdists "arbitrary source trees" in this case. And build is already able to build sdists from them. This shouldn't change.

@FFY00
Copy link
Member

FFY00 commented Jan 10, 2023

From https://packaging.python.org/en/latest/flow/#the-source-distribution-sdist

A source distribution contains enough to install the package from source in an end user’s Python environment.

As I understand it, this is, and has always been, the only guarantee from sdists. There are some backends that do support building a sdist from a sdist, but there are tons that don't.

Adding direct support for this in build will normalize it, and put a burden on backend authors to support it, because users will expect that. As such, I think this is a question for the community to answer, we should open a thread on https://discuss.python.org/c/packaging/14 and ask everyone how do we want to handle these situations. If the consensus is that, yes, we want to support this use-case, I'd suggest https://packaging.python.org/en/latest/flow/#the-source-distribution-sdist to be updated to recommend backend authors to support sdists-from-sdists.


Thank you all for sharing your thoughts and highlighting details about the different workflows. That said, here's my position on this request.

Support python -m build --wheel dist/some-tarball.tar.gz

I believe this okay, however I note that there is already a one extra command, no extra dependencies, solution for this workflow (python -m tarfile -e dist/some-tarball.tar.gz, as pointed out in #311 (comment)). Because of this, I'd like to see at least one use case where having this functionality makes things significantly easier, or a workflow where a normal user would be expected to use such command, meaning that it would impact lots of people.

Regardless, if the point below is accepted, we can extend it with wheel support without further justification.

Support python -m build --sdist dist/some-tarball.tar.gz

As said above, to add this, there should be a consensus from the community. If that is the case, I'd set the same requirements as the point above (making a specific workflow significantly easier, or be included on a workflow a normal user would use).


@webknjaz does this seem reasonable to you? Do you think is there anything I missed or did not give enough weight?

I want to keep the CLI in this project as simple as we can, while still covering the common workflows, as it is intended to be supported indefinitely.

@rgommers
Copy link

or a workflow where a normal user would be expected to use such command, meaning that it would impact lots of people.

I arrived at this issue from a normal user trying to do this. As you know, pip's handling of --config-settings is less complete than build's, so when a user in a bug report got stuck on using --config-settings to install SciPy in a non-default way, I recommended build instead. Only to then figure out that this issue was the blocker. That regular user shouldn't be expected to manually unzip with python -m tarfile first, that'd be too cumbersome a UX.

Nobody here requested this to be a default. I wouldn't want that, of course. J

That's not quite true - right above my first response, @di suggested and @layday seemed to agree that this be the default. I'm glad you agree this isn't a good idea. For the optional part, I guess we'll have to agree to disagree.

@FFY00
Copy link
Member

FFY00 commented Jan 10, 2023

That is enough for me 👍

@webknjaz
Copy link
Member Author

From packaging.python.org/en/latest/flow/#the-source-distribution-sdist

A source distribution contains enough to install the package from source in an end user’s Python environment.

As I understand it, this is, and has always been, the only guarantee from sdists. There are some backends that do support building a sdist from a sdist, but there are tons that don't.

This claim seems weird. Build is a frontend, why are we even talking about backends here? Backends are invoked on a prepared project source directory and do not care where the front-end got it from. Building from sdist is not a feature of a backend because a backend doesn't need to work on that level, which is why it's not mentioned in PEP 517.

Adding direct support for this in build will normalize it, and put a burden on backend authors to support it, because users will expect that. As such, I think this is a question for the community to answer, we should open a thread on discuss.python.org/c/packaging/14 and ask everyone how do we want to handle these situations. If the consensus is that, yes, we want to support this use-case, I'd suggest packaging.python.org/en/latest/flow/#the-source-distribution-sdist to be updated to recommend backend authors to support sdists-from-sdists.

Note that PyPUG also documents a “source archive” at https://packaging.python.org/en/latest/glossary/#term-Source-Archive — a more generic thing that just contains a source checkout/snapshot and normally does not yet have any sdist-mandated metadata. Tools like pip on't have a problem working with those.
A source archive can be created by archiving a project directory, using the git archive command or downloading things like https://github.com/pypa/build/archive/refs/heads/main.tar.gz (effectively, created by git archive too).

With that in mind (that a source archive contains an unbuilt source), and the fact that build makes sdists and wheels from source, it stands to reason that it shouldn't matter whether that source is an unpacked directory or the same thing archived.

UX-wise, it's definetely cleaner to be able to point at a source that's not a directory but an archive. Furthermore, it might even make sense to accept a URL to such an artifact on CLI.

Thank you all for sharing your thoughts and highlighting details about the different workflows. That said, here's my position on this request.

Support python -m build --wheel dist/some-tarball.tar.gz

I believe this okay, however I note that there is already a one extra command, no extra dependencies, solution for this workflow (python -m tarfile -e dist/some-tarball.tar.gz, as pointed out in #311 (comment)). Because of this, I'd like to see at least one use case where having this functionality makes things significantly easier, or a workflow where a normal user would be expected to use such command, meaning that it would impact lots of people.

I don't see build targeting “normal users” — its users are project maintainers, packagers. Making a wheel out of an sdist is a typical thing their workflows have. And in case they ship C-extensions, it's spread across multiple jobs.

Regardless, if the point below is accepted, we can extend it with wheel support without further justification.

Support python -m build --sdist dist/some-tarball.tar.gz

As said above, to add this, there should be a consensus from the community. If that is the case, I'd set the same requirements as the point above (making a specific workflow significantly easier, or be included on a workflow a normal user would use).

This is something that is useful for linting the source correctness for the same user group as above.

As for “significantly easier”, when folks maintain multiple projects hand have to synchronize their automation across many places — the less, the better. Less separate steps to maintain == better reproducibility/stability on the scale.
The problem here is the same as with the API design — in Python, we discourage making the initialization process rely on extra steps not to be forgotten to be performed when making instances. Same here — CLI is a high-level interface, and its actions better be atomic.

@webknjaz does this seem reasonable to you? Do you think is there anything I missed or did not give enough weight?

I want to keep the CLI in this project as simple as we can, while still covering the common workflows, as it is intended to be supported indefinitely.

Honestly, I don't see a lot of complexity with just unpacking the input here. Instead, I see more complexity being suggested to validate the input.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

No branches or pull requests

6 participants