Checksum mismatches on .tar.gz files #45830
-
Topic area: Bug
Our builds are breaking because sha256sums of downloaded
In an older discussion:
So it could be changes on the GitHub side. But I'd expect to see an announcement it's going to happen.
Edit: But per @SanjayVas's link, .tar.gz with URLs under
Edit 2: Official response
Edit 3: from @vtbassmatt:
E.g. one error (from Bazel):
The new checksum matches what I get when downloading manually:
$ curl -sS https://github.com/AprilRobotics/apriltag/archive/refs/tags/v3.2.0.tar.gz | sha256sum
3ce5fae0355961a0be846363ce6b6b394b7e179f8ee5354907a47c8764f40639 -
Replies: 32 comments 145 replies
-
According to bazel-contrib/SIG-rules-authors#11 (comment), source archives for tags are supposed to be stable. Therefore, this is a GitHub bug.
-
We are having the same problem. Can someone please get some eyes on this?
-
This is a huge problem for us (https://github.com/envoyproxy/envoy) and for users of many packaging systems (e.g. Bazel, pip).
-
From our tests we found only folder permissions to be changed: some folders now have group write added (for a file from Feb 25, 2021
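To see why a permissions-only change is enough to break a pinned checksum, here is a minimal Python sketch. It is not GitHub's archive pipeline; the member names, modes, and payload are made up for illustration. It builds two tar streams that differ only in one directory's mode bits and compares their SHA-256 digests.

```python
import hashlib
import io
import tarfile


def build_tar(dir_mode: int) -> bytes:
    """Build an in-memory tar with one directory and one file.

    Names, mtimes, owners and file bytes are fixed; only the
    directory's permission bits vary between calls.
    """
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w", format=tarfile.USTAR_FORMAT) as tar:
        d = tarfile.TarInfo("pkg")  # hypothetical directory name
        d.type = tarfile.DIRTYPE
        d.mode = dir_mode
        d.mtime = 0
        tar.addfile(d)

        payload = b"int main(void) { return 0; }\n"
        f = tarfile.TarInfo("pkg/main.c")  # hypothetical file name
        f.mode = 0o644
        f.mtime = 0
        f.size = len(payload)
        tar.addfile(f, io.BytesIO(payload))
    return buf.getvalue()


def sha256_hex(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


old = build_tar(0o755)  # rwxr-xr-x
new = build_tar(0o775)  # rwxrwxr-x: group write added

print(len(old) == len(new))                # True: same size, same file contents
print(sha256_hex(old) == sha256_hex(new))  # False: one header field differs
```

The mode lives in the tar header, so the digest of the archive changes even though every file inside is byte-for-byte identical.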
-
This looks similar to #8149. @gudmundur provided some good feedback last time; maybe they can help again?
-
I wouldn't hope for it; they have been at Vercel for a year.
-
And just so it's recorded, this also breaks Homebrew install from source:
-
There is a comment from a GitHub engineer and related discussion here: bazel-contrib/SIG-rules-authors#11 (comment)
-
A reply from GitHub (@bk2204):
-
My co-worker pointed me to a blog post: https://github.blog/changelog/2023-01-30-git-archive-checksums-may-change/
-
Our Bazel build also broke today because of a change in the SHA256 of com_github_googleapis_google_cloud_cpp.
-
The majority of tensorflow/* repositories won't build because of their chain of external dependencies.
-
This had a massively wide impact; ConanCenter (https://conan.io/center/) is heavily impacted as well.
-
Surely there is someone at GitHub/Microsoft with some understanding of the impact of this who is able to draw a line under it. If not, I think they will be discussing this in board meetings well after most devs have forgotten about it.
-
Hey folks. I'm the product manager for Git at GitHub. We're sorry for the breakage, we're reverting the change, and we'll communicate better about such changes in the future (including timelines).
-
Are you going to post status on the rollback here or on the bazel-contrib issue? bazel-contrib/SIG-rules-authors#11
-
I have been thinking about this problem for a while, "safe and comfortable in the knowledge that it will never break". 🤣 One good outcome of this actually occurring and then being reverted: I have gone ahead and posted an email to the git mailing list about the possible solution I've been thinking of for a while now: https://public-inbox.org/git/[email protected]/T/
I live in hope that we'll eventually see a world where the manpage for
-
This may not be the right place to say it, but I am going to, just in case anyone is curious about how to solve this issue in the long term. This change didn't break Nix, because Nix computes a NAR hash. It doesn't hash the tarball itself, because that is unreliable. It hashes the result of extracting it: the NAR hash is the recursive hash of the directory after it has been extracted. This NAR hash is what is used to create lockfiles, and is what is relied upon and stored, rather than the hash of the tarball itself, which is sensitive to changes in the libraries that produce it, such as the zlib compression library. You can use functions in the Nix expression language to fetch tarballs without their hash affecting builds, as long as their extracted content remains the same.
Here is a relevant Twitter thread. It would be helpful if more developers knew not to rely on tarball hashes, but only on hashes of the post-extraction content: https://twitter.com/MatthewCroughan/status/1620204622639149056
PS: I made the same comment here https://github.com/orgs/community/discussions/45830#discussioncomment-4824814, but am re-posting it here just in case it gets lost in the GitHub thread timeline and is automatically hidden in future.
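For readers who don't use Nix, the idea of hashing the contents rather than the tarball can be sketched in a few lines of Python. This is not the NAR format and not what Nix actually runs; the record layout, `content_hash`, and `make_archive` are hypothetical names invented for this example. It feeds only member names, file type, the executable bit, symlink targets and file bytes into the digest, and walks the archive members in memory instead of unpacking to disk.

```python
import gzip
import hashlib
import io
import tarfile


def content_hash(tar_gz: bytes) -> str:
    """Hash what a .tar.gz contains, ignoring how it was packed.

    Compression settings, member order, mtimes, owners and directory
    permission bits do not influence the result.
    """
    digest = hashlib.sha256()
    with tarfile.open(fileobj=io.BytesIO(tar_gz), mode="r:gz") as tar:
        for m in sorted(tar.getmembers(), key=lambda m: m.name):
            if m.isdir():
                record, body = f"dir {m.name}", b""
            elif m.issym():
                record, body = f"symlink {m.name} -> {m.linkname}", b""
            elif m.isfile():
                kind = "exec" if m.mode & 0o100 else "file"
                body = tar.extractfile(m).read()
                record = f"{kind} {m.name} {len(body)}"
            else:
                continue  # devices, FIFOs and hard links are skipped in this sketch
            digest.update(record.encode() + b"\0" + body)
    return digest.hexdigest()


def make_archive(dir_mode: int, level: int) -> bytes:
    """Pack the same made-up tree with a given directory mode and gzip level."""
    raw = io.BytesIO()
    with tarfile.open(fileobj=raw, mode="w") as tar:
        d = tarfile.TarInfo("pkg")
        d.type = tarfile.DIRTYPE
        d.mode = dir_mode
        tar.addfile(d)
        payload = b"hello\n" * 1000
        f = tarfile.TarInfo("pkg/README")
        f.size = len(payload)
        tar.addfile(f, io.BytesIO(payload))
    return gzip.compress(raw.getvalue(), compresslevel=level, mtime=0)


a = make_archive(0o755, level=9)
b = make_archive(0o775, level=1)

print(hashlib.sha256(a).digest() == hashlib.sha256(b).digest())  # False
print(content_hash(a) == content_hash(b))                        # True
```

The tarball digests differ because the packing differs, while the content digest stays the same because the files inside are identical.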
-
My test library just reverted to the correct hash! Thanks to everyone at GitHub for your hard work getting this reverted.
$ curl -sS https://github.com/AprilRobotics/apriltag/archive/refs/tags/v3.2.0.tar.gz | sha256sum
111a93a5315f8b8c2a36fa911403236032a819e8f50e8845548fe2d7dd1e5db5 -
-
Seeing some working hashes on the other thread as well: bazel-contrib/SIG-rules-authors#11 (comment)
-
Just fixed some hash mismatches from today; hopefully this issue is now fixed (fingers crossed).
We use hashes in Buildroot to check package integrity, so the build breaks when they change: https://github.com/skiffos/SkiffOS/actions/runs/4050223636/jobs/6967405410
-
@vtbassmatt: thanks for the quick fix on this, and for (hopefully) advance notice of future changes. Many of us here represent packaging ecosystems, and many of us are still confused about the guarantees provided by release archives on GitHub. Would it make sense to start some sort of working group? It would be nice if more than just the Bazel folks were looped in on decisions about stable release hashes and other software supply chain issues. On this thread alone, I see:
As @rsc mentioned on HN, this is an opportunity for GitHub to take the lead on supply chain security.
This isn’t the first hash change I’ve seen on GitHub, but I remember the last one, and it didn’t cause this much uproar. There was no revert. We just quietly re-hashed everything. I think the difference is indicative of how important stable release artifacts have become to many (all?) software communities. They’re at the bottom of every stack.
I’m sure a lot of these folks would be eager to be in closer discussions with GitHub, and I think you’d get valuable feedback on potential impacts in advance. Thanks again.
-
Thanks @rossburton for openembedded/openembedded-core@21f84fc (https://git.openembedded.org/openembedded-core/commit/?id=21f84fcdd659544437fe393285c407e1e9432043) and myself for persuading him that the archives really aren't guaranteed to be identical, before official confirmation from GitHub :).
-
I've just had a thought. When GitHub do update the hashing for better compression, everyone relying on the tar hash will update their hashes. This is the ultimate opportunity to change the tar contents, affect the supply chain, introduce vulnerabilities, and have everyone trust you. Something like Nix, which computes the NAR hash (the result of the tar contents), will not be affected by this, since it only cares about the content. I think this is much better than worrying about an unlikely tar vulnerability. In a system that only trusts the tar hashes, the original source is not able to take advantage of better compression over time without massive risk of supply chain attack. If you think you can hand me a tarball that can run arbitrary code, for any version of tar that has ever existed, please give it to me so I can experiment with exploits, and I'll buy you a drink of your choice at FOSDEM if you're there!
-
I wonder whether something like an "official hashing script" would be a viable option. Something that GitHub distributes and that looks something like
While this may be somewhat expensive to execute, I could at least get the git repo and the compression tool using arbitrary third-party channels. With this I wouldn't have to decompress. Also, this is only an issue if hashes actually change. This would also let GitHub change hashes/compression since it gives users a way to re-verify archives for themselves if hashes change.
-
I am thinking that neither "relying on stable checksums" nor "computing the checksum of the content" is a good solution, each for different reasons. I think GitHub should offer a programmatic way to retrieve the expected checksum of a tarball as produced now (and maybe also the checksums of those tarballs as produced before?). This would be similar to DNSSEC/SSHFP. Tools could validate the checksum from the source, rather than relying on it not to change or extracting the content of an unverified tarball.
-
IMHO, as a solution for the future, checksumming the tar before compressing would eliminate the dependency on the compressor (gzip, zstd, xz...). But it's also not backward compatible.
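A rough Python sketch of that idea, under the assumption that the consumer only has the .tar.gz and decompresses it on the fly before hashing. `build_tar` and `tar_stream_sha256` are hypothetical helpers written for this example, and the archive contents are made up.

```python
import gzip
import hashlib
import io
import tarfile


def build_tar() -> bytes:
    """Build a small in-memory tar; the member name and contents are made up."""
    payload = "".join(f"line {i}: {i * i}\n" for i in range(5000)).encode()
    raw = io.BytesIO()
    with tarfile.open(fileobj=raw, mode="w") as tar:
        info = tarfile.TarInfo("pkg/data.txt")
        info.size = len(payload)
        tar.addfile(info, io.BytesIO(payload))
    return raw.getvalue()


def tar_stream_sha256(tar_gz: bytes) -> str:
    """SHA-256 of the decompressed tar stream inside a .tar.gz."""
    digest = hashlib.sha256()
    with gzip.GzipFile(fileobj=io.BytesIO(tar_gz)) as stream:
        # Read in chunks so large archives are never held in memory at once.
        for chunk in iter(lambda: stream.read(1 << 16), b""):
            digest.update(chunk)
    return digest.hexdigest()


tar_bytes = build_tar()
fast = gzip.compress(tar_bytes, compresslevel=1, mtime=0)
best = gzip.compress(tar_bytes, compresslevel=9, mtime=0)

# The two compressed files are expected to differ byte-for-byte...
print(fast == best)
# ...but the tar stream inside them hashes to the same value.
print(tar_stream_sha256(fast) == tar_stream_sha256(best))  # True
```

This removes the compressor from the picture, but the digest still covers tar header fields such as permissions and timestamps, so it is stricter than a pure content hash.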
-
is this happening again? https://github.com/grpc/grpc/blob/master/bazel/grpc_deps.bzl#L496 wants sha256 checksum to be
-
I arrived here while debugging mysterious and scary build failures on my end. As of today, building my application fails with a SHA512SUM mismatch on the libsodium archive. My application depends on libsodium via vcpkg at this revision: microsoft/vcpkg@7a7ef70#diff-733376a37964dc6dd16ac7d31b52f2ea6c48f84099fbee0839c7c5568b814127