Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Expected behavior if a crate version's checksum changes #10071

Open
jonhoo opened this issue Nov 12, 2021 · 3 comments
Open

Expected behavior if a crate version's checksum changes #10071

jonhoo opened this issue Nov 12, 2021 · 3 comments
Labels
A-dependency-resolution Area: dependency resolution and the resolver A-diagnostics Area: Error and warning messages generated by Cargo itself. C-feature-request Category: proposal for a feature. Before PR, ping rust-lang/cargo if this is not `Feature accepted` S-triage Status: This issue is waiting on initial triage.

Comments

@jonhoo
Copy link
Contributor

jonhoo commented Nov 12, 2021

Problem

It's not clear if this is a bug, a feature request, a bit of both, or not really either, but I felt like it'd be good to have an issue to discuss this in.

Cargo currently has a built-in assumption that the checksum of a given crate version will never change. This is, on the whole a good and helpful assumption. However, the fact that it is assumed and not checked leads to some peculiar behavior and workarounds if that assumption is ever violated (I'll give some reasons for why that may happen in the "Notes" section). I think it'd be valuable for Cargo to yell and scream if it detects a checksum mismatch, but then if explicitly told to adapt to the changed checksum, for Cargo to "do the right thing". Essentially this would be adding some "robustness in depth". That is not currently the case, as this sequence of operations highlights (I left it as a script so it's easier to replicate):

#!/bin/bash

# Fail if anything unexpected fails
set -euo pipefail

rm -rf cargo-src-replace
mkdir cargo-src-replace
cd cargo-src-replace

# Don't pollute our actual CARGO_HOME
export CARGO_HOME=$PWD/home/.cargo

# Create our own registry.
mkdir registry
git -C registry init
cat > registry/config.json <<EOF
{"dl":"file://$PWD/dl","api":"file://$PWD/api"}
EOF
git -C registry add .
git -C registry commit -m "Add config.json"

# Use that registry over crates.io
mkdir -p home/.cargo
cat > home/.cargo/config <<EOF
[source.crates-io]
replace-with = 'dummy-registry'
[source.dummy-registry]
registry = 'file://$PWD/registry'
EOF

# "Publish" bar-0.1.0
cargo new --lib bar
pushd bar
echo 'pub const VALUE: &str = "hello";' > src/lib.rs
cargo package --allow-dirty
popd
mkdir -p dl/bar/0.1.0
crate=bar/target/package/bar-0.1.0.crate
sha=$(sha256sum "$crate" | awk '{print $1}')
mv $crate dl/bar/0.1.0/download
mkdir -p registry/3/b/
cat > registry/3/b/bar <<EOF
{"cksum":"$sha","deps":[],"features":{},"links":null,"name":"bar","vers":"0.1.0","yanked":false}
EOF
git -C registry add .
git -C registry commit -m "Add bar for the first time"

# Make a crate "foo" that depends on bar-0.1.0
cargo new foo
pushd foo
echo 'bar = "0.1.0"' >> Cargo.toml
echo 'fn main() { println!("{}", bar::VALUE); }' > src/main.rs
# Sanity check that it runs as expected
cargo run | grep hello
popd

# Now make the registry hold a _different_ artifact for bar-0.1.0
pushd bar
# Make VALUE different so we can tell which version foo is actually using
echo 'pub const VALUE: &str = "world";' > src/lib.rs
cargo package --allow-dirty
popd
crate=bar/target/package/bar-0.1.0.crate
sha1="$sha"
sha=$(sha256sum "$crate" | awk '{print $1}')
mv $crate dl/bar/0.1.0/download
cat > registry/3/b/bar <<EOF
{"cksum":"$sha","deps":[],"features":{},"links":null,"name":"bar","vers":"0.1.0","yanked":false}
EOF
git -C registry add .
git -C registry commit -m "Add bar for the second time"

# Back to foo to see how it handles this registry change
pushd foo
# Running foo is still fine -- it hasn't pulled the registry update
cargo run | grep hello
# But a cargo update will reveal that something is off
if cargo update --dry-run; then
	# NOTE: This construct (and the others like it) is so the script checks expected failures
	echo "expected cargo update to fail with lock checksum mismatch"
	exit 1
fi
# And now a run will fail too with the same checksum error
if cargo build; then
	echo "expected foo build to fail with lock checksum mismatch"
	exit 1
fi

# Time to venture into the land of dragons.
# What if we `cargo update -p bar` to tell Cargo to go ahead and update bar anyway?
# That (currently) doesn't work -- Cargo will not let you get past this without blowing away your Cargo.lock.
if cargo update -p bar; then
	echo "cargo update -p bar now lets you override a checksum mismatch!"
	exit 1
fi

# Okay, but what if we _really_ want this to work and manually fix Cargo.lock?
sed -i.bkp "s/$sha1/$sha/" Cargo.lock
# That "works" (in that Cargo builds again), but it's still using the old bar.
# This is... surprising, because the lockfile now indicates the new hash, but
# the _old_ source is clearly still being used!
if ! cargo run | grep hello; then
	echo "cargo run now picks up new source after checksum override!"
	exit 1
fi

# Let's try harder to make Cargo do the right thing.
# Clean the build artifacts! That'll do it.
cargo clean -p bar
# Except, no, because whil this will cause Cargo to recompile, it'll recompile
# based on the _cached_ source in $CARGO_HOME/src/, which is stale.
if ! cargo run | grep hello; then
	echo "cargo run now picks up new source after checksum override + clean!"
	exit 1
fi

# What if we try _even_ harder and clean the source cache too?
cargo clean -p bar
rm -r ../home/.cargo/registry/src
# Sadly, no, that's not enough either, since Cargo _also_ caches the .crate!
if ! cargo run | grep hello; then
	echo "cargo run now picks up new source after checksum override + clean + rm src/!"
	exit 1
fi

# Okay, so let's delete that too
# (https://github.com/rust-lang/cargo/pull/10070 would make this nicer)
cargo clean -p bar
rm -r ../home/.cargo/registry/src
rm -r ../home/.cargo/registry/cache
# Yay, finally!
cargo run | grep world

echo "SUCCESS?!"

Proposed Solution

I think the way to solve this is for Cargo to use the checksum along with the version anywhere the version is currently used to uniquely identify a crate version. That is, at least:

  • As an input into fingerprints.
  • In the filenames for .crate files in $CARGO_HOME/cache/
  • In the filenames for extracted source directories in $CARGO_HOME/src/

There may also be other places where the assumption that version implies checksum that I don't know of and that'll also have to be updated. If it's only the three above, this may be a relatively simple fix (🤞).

With these changes, if the lockfile is fixed, Cargo should correctly re-fetch and re-compile anything that needs to be, and thus produce the expected result without further fiddling.

It may also be helpful to allow cargo update -p (maybe only with --precise?) to fix a checksum mismatch rather than requiring an impacted user to blow away or manually edit their Cargo.lock.

Notes

Okay, so, why would this come up in the first place? Aside from the corrupted lockfile case and situations like #7180 (comment), my use-case for this is a somewhat convoluted build setup in which a larger build system invokes Cargo. This larger build system may re-build a crate without human intervention if, say, a non-Cargo dependency changes, but such a re-build could cause that crate's checksum to change if it, say, runs bindgen in a build script based on a C header file vended by those non-Cargo dependencies. The larger build system attempts to auto-generate Cargo versions so that such re-builds end up with separate version numbers to avoid exactly this problem, but it has no reliable and scalable way to ensure that the generated version number is distinct from all other generated version numbers. It could, in theory, pick random version numbers (in the patch position), or extract a u64 from the checksum, but that would lose out on the attractive "higher is newer" property of version numbers, and may lead to picking an older version of a dependency, which is obviously not good.

Now, this is not to say that my use-case is common, a supported/correct use of Cargo, or even at all sane, but I do think that it highlights a way in which Cargo can be made more robust to more use-cases than it currently is.

@jonhoo jonhoo added the C-feature-request Category: proposal for a feature. Before PR, ping rust-lang/cargo if this is not `Feature accepted` label Nov 12, 2021
@jonhoo
Copy link
Contributor Author

jonhoo commented Nov 12, 2021

Just for the sake of searchability, the error that Cargo produces in these cases is:

error: checksum for `bar v0.1.0` changed between lock files

this could be indicative of a few possible errors:

    * the lock file is corrupt
    * a replacement source in use (e.g., a mirror) returned a different checksum
    * the source itself may be corrupt in one way or another

unable to verify that `bar v0.1.0` is the same as when the lockfile was generated

@jonhoo
Copy link
Contributor Author

jonhoo commented Nov 12, 2021

And the origin of this check (although not the "Cargo does the wrong thing if the lockfile is changed" issue) is

// Given a previous instance of resolve, it should be forbidden to ever
// have a checksums which *differ*. If the same package ID has differing
// checksums, then something has gone wrong such as:
//
// * Something got seriously corrupted
// * A "mirror" isn't actually a mirror as some changes were made
// * A replacement source wasn't actually a replacement, some changes
// were made
//
// In all of these cases, we want to report an error to indicate that
// something is awry. Normal execution (esp just using crates.io) should
// never run into this.

@kornelski
Copy link
Contributor

kornelski commented Dec 7, 2024

The +suffix in semver can be used to change content of a crate without changing the part of the version that is used for dependency resolution. The suffix is ignored, so Cargo will pick whatever matches apart from the suffix. The suffix can be a checksum, so the version can be created statelessly.
A hash suffix won't have a natural order, but if you're re-generating a package, you could delete previous copies to ensure only the latest will be selected (and if you care about preserving previous package builds, you shouldn't be overwriting them in the first place!).

I've used the +suffix trick already in practice in a custom registry to patch crates that have been incorrectly uploaded, and it worked. I think having to change the +suffix is a decent compromise between having immutable releases and ability to update them. You still have a version that uniquely identifies the content, so there's never a confusion about which version of the version someone has built.

For security and integrity of registries, I think Cargo should always assume that releases (identified by the entire ver+suffix) are immutable. There are plans to have better signing of crates.io registry and packages, but in the meantime we can at least have a Trust-On-First-Use. I think Cargo could go even further and compare checksums of newly downloaded registry metadata against the cached file, and detect any checksums changing even without a lockfile.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-dependency-resolution Area: dependency resolution and the resolver A-diagnostics Area: Error and warning messages generated by Cargo itself. C-feature-request Category: proposal for a feature. Before PR, ping rust-lang/cargo if this is not `Feature accepted` S-triage Status: This issue is waiting on initial triage.
Projects
None yet
Development

No branches or pull requests

4 participants