-
Notifications
You must be signed in to change notification settings - Fork 2.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Optimize lock file format for git merge conflicts #7070
Conversation
r? @ehuss (rust_highfive has picked a reviewer for you, use r? to override) |
Ah meant to do this earlier, but here's a diff of the old vs new format for Cargo's own Cargo.lock |
e32bdc7
to
b3bcc7c
Compare
Nice! Do you think this will be solid enough that eventually rust-lang/rust can remove the Is there any way to test this from the command-line locally? I couldn't see a way to force it to generate a new lock file. |
I'd hope so yeah! I wanted to get this up as an idea before I completely forgot about it and it's something I've been meaning to do for a long time. A perfect analysis of this PR would scrape github for a set of Cargo.lock changes over time. We could then compare the diff to a diff as-if the lock file were encoded with this proposed encoding. In theory we could even objectively conclude that this produces small diffs and infer that the likelihood of a merge conflict goes down. For testing locally there's no configuration to generate the new format, just preservation of the new format if it's found. If you edit a To be clear as well, I'm in no rush to land this. I wouldn't mind taking more time to evaluate this (or rather finding time to properly evaluate this). Additionally it's probably worthwhile to at least try to brainstorm better encodings than the one I randomly invented here, as there's surely better things we could do too! |
I probably not the only one with code that depends on the lockfile's format. Maybe the lockfile could at some point get a |
Ok so I've done a bit of initial analysis of this. I walked the entire history of the rust-lang/rust repository, found all commits that changed
A historgram of the new diff size over the old diff size as a percent looks like Looks like there's a huge spike of basically creating the same sized diff. That seems to happen a lot in rust-lang/rust due to additions/removals of dependencies. The lock file is so big that it's mostly just dependency edges getting removed and the checksum section doesn't change much. Additionally rust-lang/rust has a lot of Now all of this is only a proxy really for the actual metric we care about here, which is reducing the likelihood that two modifications to |
How much of a concern is it that after a clean git merge, the resulting Cargo.lock is not "complete". That is, if you run with Here's an example. A workspace with members |
The same thing happens when there are multiple versions of a package. Say there are dependencies on These should still be functional, but will fail with I'm wondering if this will be pretty rare, at least on repos the size of rust-lang/rust? When a rust-lang/rust PR attempts to be tested with bors, it should fail within a couple minutes so it may not be too painful? |
FWIW, I simulated a bunch of Cargo.lock merge conflicts I've received on rust-lang/rust over the past year using the V2 format. None of them failed, and they are all valid and pass with |
An excellent question! I would personally say that this is not a concern, but with nuance. In general I think we want An example of this is that historically Cargo would return a parse error for a lock file that was semantically valid. For example if you had a dependency edge pointing to a nonexistent package Cargo would fail parsing that (because Cargo would never itself produce that). We ended up landing a PR though that basically swallowed the errors. The goal with it was to take as much valid information as possible from the lock file but if it doesn't all match up then we'll re-resolve anyway. There's a separate issue in this repository for we should literally handle lockfiles with git merge markers (think Overall I see this as a general extension of Cargo's "incremental updates" feature. If you get a bad git merge then Cargo should still accept the lock file and just automatically update it, but when doing so we shouldn't shake up the entire world. Instead we'll just try to extract as much that's valid as we can and progress with that. Does that make sense and sound reasonable?
I suspect it'll definitely come up over time, but yeah the failure should happen pretty quickly since it shows up as soon as you try to build anything basically.
To clarify, were these git merge conflicts reported by bors or by |
The initial list was from bors (using this query). Some of them are legitimate conflicts (that is, even without the |
b3bcc7c
to
5a2c2b2
Compare
Nice! And thanks for checking that! That plus what I was seeing makes me pretty confident in this PR, so in that sense I don't have much more investigation I'd like to do :) |
// have one version for this name. If we have more than one version | ||
// for the name then it's ambiguous which one we'd use. That | ||
// shouldn't ever actually happen but in theory bad git merges could | ||
// produce invalid lock files, so silently ignore these cases. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shouldn't there be (loud) error when merge creates malformed lockfile?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unless you've got a compelling reason to do so, we've got historical rationle for not doing this. In general Cargo.lock
can be confusing for new users, especially if scary merge messages are happening. What's happening in this case is concurrent updates, and Cargo should handle that the way it does all other updates (aka edits to Cargo.toml
) which is to automatically, silently, and incrementally update Cargo.lock
.
I can recall two prior changes of Cargo.lock format:
The second change was backwards compatible. Older rustcs would only remove it but they'd still parse the Cargo.lock. Here, there was a 12 week period. The first change was not backwards compatible. There was a year long period. It was still too short for multiple people when the switch was eventually done. This change is, similar to the first change, not backwards compatible. I want to discuss the following options for handling the switch: Option A: waiting six months like suggested in the PR. Option B: waiting one year. This would be analogous to the Option C: using the powers of opt-ins. There seems to be a zero sum situation if the decision of the format is hardcoded into Cargo.lock between people who want backwards compatibility and people who want to avoid merge conflicts. One group wants it to be earlier, the other group wants it later, classical zero sum. The situation becomes positive sum with opt-ins. At the start, it would be opt-in for people who want to avoid merge conflicts, one year (or some other time period) later defaults switch and it's opt-in for people who want backwards compatibility. Both groups would gain most with option C: the people who don't want merge conflicts can easily switch immediately after it becomes stable (without having to patch their Cargo), while the people who want backwards compatibility can use the old format indefinitely. A strawman proposal for exposed syntax is I'd personally prefer option C over option B over option A. A variant of option C would be to tie everything to editions but I doubt that the people who don't want merge conflicts want to wait three years :). |
Is it correct that this will remove the |
☔ The latest upstream changes (presumably #7127) made this pull request unmergeable. Please resolve the merge conflicts. |
5a2c2b2
to
7129b5e
Compare
I agree with the earlier comment that it'd be nice to have a |
7129b5e
to
cfb0792
Compare
Ok we discussed this at the cargo triage meeting today and the conclusion was that there's no major blockers that came up. I've opened a separate issue about versioning and version numbers, which we decided is orthogonal from this PR it doesn't need to block this PR. r? @ehuss for a final review of the bits and pieces? |
@alexcrichton can you Do we need to expedite the creation of a rustfmt bot? 😄 Just beware, there's an issue with nightly rustfmt treating some zalgo text differently from stable (in |
This commit is an attempt to refine Cargo's lock file format to generate less git merge conflicts for lock file updates as well as make it easier to manage lock file updates. The new lock file format has a few major changes: * The `[metadata]` table is no longer used to track checksums. The packages themselves now list `checksum` fields directly. * The entries in the `dependencies` key no longer unconditionally mention the version/source of the dependency. When unambiguous only the name or only the name/version are mentioned. As mentioned before the goal here is to reduce git merge conflict likelihood between two cargo updates to a lock file. By not using `[metadata]` the updates to package checksums should only happen on the package itself, not in a shared metadata table where it's easy to conflict with other updates. Additionally by making `dependencies` entries shorter it's hoped that updating a crate will only either add entries to a lock file or update lines related to just that package. It's theorized that the amount of updates necessary to a lock file are far less than today where the version has to be updated in many locations. As with all lock file format changes this new format is not turned on by default. Support is added so Cargo will preserve it if it sees it (and tests are added too), and then some time down the road we can flip the switch and turn this on by default.
This new format is designed to cause less git merge conflicts for lock file updates. See <rust-lang/cargo#7070>.
This new format is designed to cause less git merge conflicts for lock file updates. See <rust-lang/cargo#7070>.
In rust-lang/cargo#7070 the lockfile format was updated, instead of a top-level `[metadata]` key that holds the checksums for dependencies, these checksums are now specified in-line with the dependency. Since we shell out to `cargo` for most things, and in our tests only explicitly check for the presence of the `sha` values, this change is fairly minimal. I've opted to remove the checksums in most of our fixtures, as they are not strictly needed in our tests. I could have left them in there and things would still pass (`cargo` will silently upgrade them to the new format), however it seemed noisy.
Improves upon #16420. @huonw wisely suggested we avoid using `hash` for constraints files because it makes Git diffs and merge conflicts much worse. python-poetry/poetry#496 rust-lang/cargo#7070 [ci skip-rust] [ci skip-build-wheels]
…sbuild#16469) Improves upon pantsbuild#16420. @huonw wisely suggested we avoid using `hash` for constraints files because it makes Git diffs and merge conflicts much worse. python-poetry/poetry#496 rust-lang/cargo#7070 [ci skip-rust] [ci skip-build-wheels]
This commit is an attempt to refine Cargo's lock file format to generate
less git merge conflicts for lock file updates as well as make it easier
to manage lock file updates. The new lock file format has a few major changes:
The
[metadata]
table is no longer used to track checksums. Thepackages themselves now list
checksum
fields directly.The entries in the
dependencies
key no longer unconditionallymention the version/source of the dependency. When unambiguous only
the name or only the name/version are mentioned.
As mentioned before the goal here is to reduce git merge conflict
likelihood between two cargo updates to a lock file. By not using
[metadata]
the updates to package checksums should only happen on thepackage itself, not in a shared metadata table where it's easy to
conflict with other updates. Additionally by making
dependencies
entries shorter it's hoped that updating a crate will only either add
entries to a lock file or update lines related to just that package.
It's theorized that the amount of updates necessary to a lock file are
far less than today where the version has to be updated in many
locations.
As with all lock file format changes this new format is not turned on by
default. Support is added so Cargo will preserve it if it sees it (and
tests are added too), and then some time down the road we can flip the
switch and turn this on by default.