[FR] Implement building wheels from sdist/tarball artifacts #311
What would happen if a user did this: $ python -m build dist/some-tarball.tar.gz (no …)

Other than working out the details of what happens, I think this sounds great, and I think it will be easy after #304. I'd also like support for this in cibuildwheel, xref pypa/cibuildwheel#597.
I think erroring out with "be verbose" is a good idea. Especially since I envision this to be primarily a CI use case where long verbose options are encouraged for maintainability.

Reusing the …
We have tried to keep the CLI simple, so I'd like to hear what downsides the following approach would have:

python -m tarfile -e $project-$version.tar.gz
python -m build -w $project-$version

So far, I have not been able to come up with anything that would hold any reasonable weight against that. Also, maybe I am mistaken, but don't RPM build tools extract the source by default?
Yep, that's what I was thinking: both dir (whether a Git checkout or just a plain old folder) and a tarball are technically source. So it'd be natural to accept either as the same CLI arg.
I see where you're coming from and now I'm feeling a little bit conflicted about this. OTOH since our goal is to streamline this as the best practice, it may be best to keep the number of steps low meaning I'd prefer to only run one command (build). It seems like the overhead of having unpacking+copying within build could be minimal.
AFAIK it's not …

I'm in favor of this! My $0.02, based on trying to do this and realizing the feature didn't exist: I expected … This would mean that …

… that it would probably need to support both …

Should …
I would expect it to work by unzipping/untarring the file into some temporary directory and then running as usual on that source directory. This would mean that … This might have the effect of rebuilding the …
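For illustration, a rough shell sketch of that behaviour (not build's actual implementation; the file and directory names here are hypothetical, and the sdist is assumed to unpack into a single {name}-{version} directory):

tmp=$(mktemp -d)
# Unpack the sdist into a temporary directory...
python -m tarfile -e dist/example_project-1.2.3.tar.gz "$tmp"
# ...then build "as usual" from the unpacked source tree, writing the
# wheel back into the project's dist/ directory.
python -m build --wheel --outdir dist "$tmp"/example_project-1.2.3
rm -rf "$tmp"

Whether the intermediate copy lives in a temporary directory or next to the tarball is a detail; the point is that after unpacking, the existing directory-based code path applies unchanged.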
Given my comment above, this probably wouldn't make a difference?

Yeah, I think that'd be preferable too, that it should function the same way regardless of input. Does anybody wanna take this up?

I am reluctant to accept this feature, but if we do, I think we should only support …

Would we perform any kind of validation then, or do we assume that any gzipped tarball is an sdist?

We would do some validation: check if the name matches the format, and check for …
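For illustration only, the kind of check being discussed might look roughly like this in shell. The exact checks aren't settled in this thread; the filename pattern is a loose approximation of the {name}-{version}.tar.gz convention, and the PKG-INFO probe and example file name are assumptions, not decisions from this issue:

sdist=dist/example_project-1.2.3.tar.gz   # hypothetical file
name=$(basename "$sdist")

# Loose check of the {name}-{version}.tar.gz filename convention.
re='^[A-Za-z0-9._-]+-[A-Za-z0-9.!+]+\.tar\.gz$'
if [[ ! "$name" =~ $re ]]; then
    echo "error: '$name' does not look like an sdist filename" >&2
    exit 1
fi

# Illustrative content check: sdists conventionally carry their metadata
# in {name}-{version}/PKG-INFO at the archive root.
root=${name%.tar.gz}
python -m tarfile -l "$sdist" | grep -qx "$root/PKG-INFO" || {
    echo "error: no PKG-INFO at the archive root of $sdist" >&2
    exit 1
}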
The feature is needed, but I agree with the rest of this. Building one sdist from another sdist is not a valid thing to do. So …

That is not entirely true. Rebuilding sdists is useful for reproducibility verification. Also, it's useful when projects ship Cython-pregenerated C files that need to be regenerated for compatibility in the future, and the sdist should be enough. sdists are also widely used by downstream packagers, so I wouldn't be as dismissive of them.
You rebuild from the original sources (git tag, or whatever) for that.
That is a bad practice by now and should be changed by the project, but either way you need to do this from the original sources. sdists are simply a different thing conceptually than a VCS checkout. There is no requirement for sdists to be idempotent, and in general they won't be, because of practices like having a git commit hash in a version string (hence sdists can require a full git checkout rather than only a snapshot of the sources). There is no reason to introduce a new requirement like this on sdists. It will be slow and introduces extra complexity in both … The one "production path" is …
@di by the way, you may want to try out my new action, as a workaround for this — https://twitter.com/webknjaz/status/1609252309749440512.

@rgommers I can't agree with that. Maybe in your bubble. But the reality is much more diverse.

It does actually work if implemented properly. With setuptools-scm, at least.

Then perhaps build your own build frontend that does this, if you have a validation use case and it works in your domain and with setuptools-scm, if that's what you care about? It's of course not impossible that I'm horribly confused here, but if so then I'd really appreciate it if you could point to a PEP or other authoritative document that explains that sdist->sdist must be supported.

@rgommers I don't think it's important to be in the PEP. People who don't need it would just not use it. But there's a compelling case for not artificially restricting this. Many envs treat an sdist as a complete enough copy of a project source, as it was intended originally. This is why it's expected to contain tests and docs, for example. It's called a source distribution because it can provide a project's source. As such, it should be possible to make a new sdist out of that source, among other things. Also, whether sdist-to-sdist works would mostly depend on the backend; build is there just to provide an interface.
That is not how things work. A widely used tool like …

Exactly on the first part - it depends. And …

Honestly, that's a wish, not a requirement. Python packaging is already complicated enough without introducing new things like that. sdists are not very well defined, but PEP 517, for example, makes clear that they're different from "source trees" (https://peps.python.org/pep-0517/#terminology-and-goals), and that a wheel can be built from both. From PEP 517: …
I don't know to whom or why sdist idempotence is important, or why this issue has pivoted towards it. When Filipe mentioned that "we should only support …

I think the request here is to let build run on (standardized) sdists in addition to the already-unpacked directories, and the rest of the logic (output results) would remain just as it is now.
And yet, this is not even close to what we're talking about. It is not about being different from pip somehow. It's about not standing in the way of what's possible and legitimately useful in a number of use-cases. It does not conflict with the PEPs.
Nobody here requested this to be a default. I wouldn't want that, of course. Just not actively preventing the possibility is enough.

And yet, many expect source trees to be usable/modifiable; this is even documented in the PyPUG. Which makes unpacked sdists "arbitrary source trees" in this case. And build is already able to build sdists from them. This shouldn't change.
From https://packaging.python.org/en/latest/flow/#the-source-distribution-sdist
As I understand it, this is, and has always been, the only guarantee from sdists. There are some backends that do support building an sdist from an sdist, but there are tons that don't. Adding direct support for this in …

Thank you all for sharing your thoughts and highlighting details about the different workflows. That said, here's my position on this request. Support …
I arrived at this issue as a normal user trying to do this. As you know, …
That's not quite true - right above my first response, @di suggested and @layday seemed to agree that this be the default. I'm glad you agree this isn't a good idea. For the optional part, I guess we'll have to agree to disagree.

That is enough for me 👍
This claim seems weird. Build is a frontend; why are we even talking about backends here? Backends are invoked on a prepared project source directory and do not care where the frontend got it from. Building from an sdist is not a feature of a backend, because a backend doesn't need to work on that level, which is why it's not mentioned in PEP 517.
Note that PyPUG also documents a “source archive” at https://packaging.python.org/en/latest/glossary/#term-Source-Archive — a more generic thing that just contains a source checkout/snapshot and normally does not yet have any sdist-mandated metadata. Tools like pip don't have a problem working with those. With that in mind (that a source archive contains an unbuilt source), and the fact that …, UX-wise it's definitely cleaner to be able to point at a source that's not a directory but an archive. Furthermore, it might even make sense to accept a URL to such an artifact on the CLI.
I don't see build targeting “normal users” — its users are project maintainers and packagers. Making a wheel out of an sdist is a typical step in their workflows. And in case they ship C extensions, it's spread across multiple jobs.

This is something that is useful for linting source correctness, for the same user group as above. As for “significantly easier”: when folks maintain multiple projects and have to synchronize their automation across many places — the less, the better. Fewer separate steps to maintain means better reproducibility/stability at scale.

Honestly, I don't see a lot of complexity with just unpacking the input here. Instead, I see more complexity being suggested to validate the input.
This is an important use case for building platform-specific matrices of wheels. In this scenario, one would build an sdist in a separate job, save it as an artifact, and then a matrix of subsequent jobs would create wheels from that same artifact. As a bonus, other binary artifacts from external ecosystems, like RPMs, normally also get built from the sdist, so such jobs would also be runnable in parallel.
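As a sketch of what each CI job would run if the proposed invocation existed (the project name and version are hypothetical, and uploading/downloading the artifact between jobs is CI-specific and elided here):

# Job 1: build the sdist once and publish it as a build artifact.
python -m build --sdist --outdir dist .

# Jobs 2..N, one per platform in the matrix: fetch the same sdist
# artifact and build a platform wheel directly from it.
python -m build --wheel --outdir dist dist/example_project-1.2.3.tar.gz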
I suggest supporting the following invocation:
$ python -m build --wheel dist/some-tarball.tar.gz
I think this UX is good. And since the CLI already accepts a directory-type source in that place, pointing at a file would be natural.
Refs: