People are distributing invalid wheels because pip is sloppy about checking them; pip should be pickier #3513
The 2nd point is really important, because for many people, including me, the necessity of the METADATA and RECORD stuff is not obvious. Wheels work without this complication, so why do you need to add it?
+1 on pip rejecting wheels which don't contain the minimal metadata. @techtonik For "real reasons": the wheel spec at https://www.python.org/dev/peps/pep-0427/#the-dist-info-directory states clearly that METADATA must contain data in the same format as the PKG-INFO file at the root of an sdist (which in turn is specified by PEP 345), and that RECORD must contain a list of (almost) all files in the archive. So wheels that don't do that are invalid and should be rejected. Up until now, pip has trusted that wheels will be valid, simply because in practice only bdist_wheel was used to produce them. But if wheels that don't conform to the spec are appearing in the wild, we should stop that sooner rather than later.
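(For readers following along: a minimal, illustrative way to see what a given wheel actually ships in its `.dist-info` directory, using only the standard library. The wheel file name below is a placeholder, not a real package.)

```python
import zipfile

# Illustrative only: list the .dist-info contents of a wheel.
# "example-1.0-py3-none-any.whl" is a hypothetical file name.
with zipfile.ZipFile("example-1.0-py3-none-any.whl") as whl:
    for name in whl.namelist():
        if ".dist-info/" in name:
            print(name)
    # Per PEP 427 we would expect at least:
    #   example-1.0.dist-info/METADATA  (PKG-INFO-style metadata, PEP 345)
    #   example-1.0.dist-info/WHEEL     (wheel format version and tags)
    #   example-1.0.dist-info/RECORD    (path, hash and size of the archive contents)
```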
And just to clarify, the reasons for putting valid data in those files:
@pfmoore without solid arguments it will be another bad solution that people like me will hate.
Too many redirections create the same complexity problem that the wheel format was promised to avoid. "must" doesn't explain why a wheel can't come with its own metadata that makes sense for wheels and doesn't carry the backward-compatibility burden of PKG-INFO, leaving that burden to a wheel-to-distutils mapper inside pip, which needs such a mapper anyway. Maybe the obsession with PKG-INFO is the reason why conda and other alternatives exist.
From the security point of view, an "incomplete checksum" and "no checksum" are the same thing. .zip files are already protected from corruption by default, so what are you protecting against? Do you have real tests for that? Adding an additional step of maintaining an internal checksum that will never be used, because a corrupted .zip already fails to unpack? I don't get it. If my build system produced a corrupted binary, my packaging toolchain will make a proper checksum for it. Doing a proper RECORD complicates wheel packaging and I don't see for whose sake.

METADATA should not necessarily be in PKG-INFO format, and wheels should not distribute pieces of the target catalog. It is the target system that should decide which fields need to be present to accept the wheel. PyPI needs author fields, but a local install doesn't. Some wheels have dozens of authors, but METADATA has only one. The author field is therefore useless for my scenario, especially when I don't use email fields but GitHub (or Debian) IDs, yet PKG-INFO says the author is single and required. There are more arguments here: pypa/wheel-builders#1 (comment)

So, to summarize: I think that the fact that wheels chose not to implement old clichés is the right thing. RECORD should be made at least optional until there is a system that shows value-added behaviour for this file that really benefits everyone in practice. I won't argue that METADATA is useful, but I think that being able to omit it is a feature. It could be beneficial for local development and for simplicity. PyPI or conda package sites may choose to ignore wheels with insufficient metadata, but chances are that this info is already there in some more human-readable format, and it is easier for everybody to set the package information through a PyPI API directly than to use this strange format of ZIP-RPC.
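(For context on the "zip files are already protected" claim: the zip format does carry a per-member CRC-32, which the standard library can verify. A small sketch with a placeholder wheel name; note this only covers the archive itself, not the files after extraction, which is what the RECORD hashes address.)

```python
import zipfile

# The built-in corruption check being referred to: each zip member has a CRC-32.
# "example-1.0-py3-none-any.whl" is a hypothetical file name.
with zipfile.ZipFile("example-1.0-py3-none-any.whl") as whl:
    bad = whl.testzip()  # returns the name of the first corrupt member, or None
    if bad:
        print("corrupt member:", bad)
    else:
        print("archive CRCs OK")
```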
There are solid arguments: wheels are supposed to be installable by running unzip and then a compileall pass, so making the metadata optional breaks a promise. Metadata consistency should be enforced.
@techtonik Feel free to hate it, but the wheel spec says what it does, and calling something that doesn't follow that spec a wheel is wrong. If you don't like the spec, you can propose an updated version by going through the normal PEP process. So I remain a strong +1 on rejecting malformed wheels. If no-one else comes up with a PR for this, I'll produce one at some point.
What does that mean? I can only see the bytecode compilation use case here, so the arguments must be related to it.
@pfmoore it is your choice. You know that I hate that PEP bureaucracy and won't follow it. The fact that the points raised here cannot be answered by giving direct links to a PEP just reinforces my feelings. But your action will make the job of tool developers more complicated than necessary. Why will it happen? As I see it, to get rid of the nuisance. This could be an interesting pattern for manipulation if I really wanted to make things worse. It directly depends on how much free time people have to discuss such issues, and the only way to get protected from it is to iteratively improve the information passing: direct links to paragraphs, improved wording, real-world use cases/practices and diagrams. I don't know how deeply @njsmith is aware of why the various WHEEL requirements exist, but my bet is that he is not, so the only way for him and everybody else to support this discussion is to believe that some smart guys have already thought about it and to pick a side. https://en.wikipedia.org/wiki/Cargo_cult is not the right analogy, so maybe you can point me to a better pattern that describes such behaviour?
@techtonik after an unzip into a random folder on PYTHONPATH, the package should work and have valid metadata for package discovery. This is Python package metadata and not related to zip file metadata at all; zip files are merely used to compact and bundle, so their own metadata is irrelevant. BTW: consistent tooling is preferable to a convenient but inconsistent mess.
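(A rough sketch of the "unzip, then byte-compile" install story the spec assumes, with a hypothetical wheel name and placeholder target directory; real installers do considerably more than this.)

```python
import compileall
import zipfile

wheel = "example-1.0-py3-none-any.whl"   # hypothetical wheel file
target = "/tmp/site-packages"            # any directory that ends up on sys.path

# Unpack the package contents together with the .dist-info directory...
with zipfile.ZipFile(wheel) as whl:
    whl.extractall(target)

# ...then byte-compile, as the simple install description suggests.
compileall.compile_dir(target, quiet=1)

# Package discovery tools then find the package via the extracted
# .dist-info/METADATA, which is why that file matters after install.
```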
@RonnyPfannschmidt without a real instruction on "package discovery from unpacked wheels" for tool writers, this starts to sound too abstract. You say that unpacking into PYTHONPATH should just work, with valid metadata in place. Maintaining the RECORD file is an additional hassle, so there should be not only a good reason to do so, but also a real example that it works and provides the planned benefits (which are still not referenced). So, here is the new information: the primary use case for wheels is unpacking them manually into PYTHONPATH. Why should that not work for wheels without METADATA or RECORD? So, what are you going to fix? Do you want to "fix" that? It looks like I have a truck with wheels, but I can't ride on EU roads, because my wheels are not branded. I can understand that my wheels should be certified if they can damage the road (digital signatures if you need security), but I don't understand why you insist on naming them to your liking.
That comparison is unjustified and doesn't fit. In the end, not checking for correct wheels is a mistake in pip that should get fixed. If you don't like how wheels should look, then don't generate them, or change the standard.
I propose the following checks:
These checks are not comprehensive - the idea is to ensure people aren't uploading corrupt wheels, not to act as a full validation suite. I'm honestly saddened that we need to do this. Deliberate violation of the wheel spec seems like the sort of behaviour that should be fixable simply by letting the author of the invalid wheel know of their mistake. Having to defend against deliberate attempts to undermine the standardisation work going on is depressing.
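(Below is a minimal sketch of the kind of sanity check being discussed in this thread - a single `.dist-info` directory containing non-empty METADATA, WHEEL and RECORD files. It is not the exact list proposed above and not pip's actual implementation.)

```python
import posixpath
import zipfile


def looks_like_a_valid_wheel(path):
    """Cheap sanity check: one .dist-info dir with non-empty required files."""
    with zipfile.ZipFile(path) as whl:
        names = whl.namelist()
        dist_infos = {n.split("/")[0] for n in names
                      if n.split("/")[0].endswith(".dist-info")}
        if len(dist_infos) != 1:
            return False          # zero or several .dist-info dirs: reject
        dist_info = dist_infos.pop()
        for required in ("METADATA", "WHEEL", "RECORD"):
            member = posixpath.join(dist_info, required)
            if member not in names or whl.getinfo(member).file_size == 0:
                return False      # missing or zero-byte metadata file
    return True
```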
IIRC the wheel unpack command will check everything. Shame on pip for not bothering. The hashes exist so that you could a) verify that nothing has changed after an installation, potentially warning the user that they made local edits, and b) digitally sign RECORD, and for example have a digital signature that remains valid even if you decide to change the compression algorithm used in the zip file. It is so trivial to generate the correct hashes that I'm surprised it causes so much consternation. Write a tool, for heaven's sake. For example, in the wheel tracker there's an issue to create a 'wheel pack' command that would archive a directory, updating the RECORD hashes.
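(Generating an entry is indeed short. A sketch of one RECORD line as described in PEP 376/427 - sha256 digest, urlsafe base64 with the `=` padding stripped, then the file size; the file path used here is only an example.)

```python
import base64
import hashlib


def record_line(path):
    """Build one RECORD entry: path,sha256=<urlsafe b64 digest, no padding>,size."""
    with open(path, "rb") as f:
        data = f.read()
    digest = base64.urlsafe_b64encode(hashlib.sha256(data).digest()).rstrip(b"=")
    return "{},sha256={},{}".format(path, digest.decode("ascii"), len(data))


# For a hypothetical empty example/__init__.py this would produce:
#   example/__init__.py,sha256=47DEQpj8HBSa-_TImW-5JCeuQeRkm5NMpJWZG3hSuFU,0
```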
If you don't include METADATA then packaging tools may have trouble realizing that your package is installed and at what version.
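(Concretely, this is how the installed name and version are typically resolved at runtime - from the `.dist-info/METADATA` file, e.g. via `importlib.metadata` on Python 3.8+. "example" is a placeholder project name.)

```python
from importlib import metadata

# Both of these ultimately read example-X.Y.dist-info/METADATA on disk.
print(metadata.version("example"))               # installed version string
print(metadata.metadata("example")["Summary"])   # other PKG-INFO-style fields
```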
Please use arguments. Closing your eyes looks like you don't want to deal with a problem that you have solved for yourself. But if you can address the points, it may turn out that it is me who is blind and doesn't see something.
So it is completely optional. Why require it for non-signed wheels?
But name and version are already present in the wheel naming standard - https://www.python.org/dev/peps/pep-0427/#file-name-convention And when a packaging tool installs wheels, it has its own way of recording information about the installed package - it may use SQLite, index files or an OS-native database. In fact, I wish I could just unpack wheels on Linux without all that METADATA stuff. For development it is important to build wheels for CI without stumbling over obligatory details like the version-keeping hassle, which makes sense only for releases.
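(The name and version really are encoded in the file name per PEP 427's convention, `{distribution}-{version}(-{build})?-{python}-{abi}-{platform}.whl`. A rough illustration that ignores escaping and edge cases:)

```python
def name_and_version(wheel_filename):
    """Naively pull distribution name and version out of a wheel file name."""
    parts = wheel_filename[: -len(".whl")].split("-")
    return parts[0], parts[1]


print(name_and_version("example-1.0-py3-none-any.whl"))  # ('example', '1.0')
```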
There are many such things around (like papers that are easy to fill in), so the problem could be that some of these rituals simply don't make sense.
Or that there are some people who don't see any meaning in those rituals.
Ping? Is everyone fine with moving forward with @pfmoore's suggested checks? If not, what do you want more/less?
I think so. The prior discussion made it clear that the Python ecosystem needs to be protected from people who think it's OK to break standards if it somehow works.
I'm OK with adding the checks, but I will point out that it seems to be only the OP who is proposing to ignore the standards. I think the important point here is #4705, which points out that pip isn't following the wheel spec in verifying that the files it installs are recorded in RECORD. If someone wants to implement the suggested checks, though, I'm OK with that. But I did say above that "I'd like to think that checking hashes is overkill". In fact the spec (https://www.python.org/dev/peps/pep-0427/#the-dist-info-directory item 8) says that installers should verify hashes, so I was wrong on that. (The wording doesn't use normal standards "MUST/SHOULD" terminology, so it's not 100% clear whether checking is required or merely good practice, but it's clearly the intent of the spec that hashes get checked.)
Checking hashes is also just good general practice to catch accidental data corruption caused by bugs or hardware issues.
The wording could have been clearer, but a little later in the PEP it says this:
And it gives a rationale:
@stonebig those things are entirely orthogonal as of now |
I don't like an ecosystem that doesn't follow common sense, so my final comment here is to unsubscribe. There are many things besides Python wheels out there that don't need any of this. If you want something modern for consistency, take a look at https://github.com/centrifuge/precise-proofs and build a Python cloud with content addressing, repeatable builds and packages as catalogs.
See:
https://github.com/techtonik/leanbook/blob/master/manuscript/07wheel-anatomy.md
https://github.com/techtonik/RHVoice/blob/d08c7244c4fd47daa6ff9576f8c9ca4491c87d6e/src/nvda-synthDriver/PACKING.md
which recommend uploading wheels to PyPI in which the METADATA and RECORD files are 0 bytes long.