Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add schema field and features2 to the index. #9161

Merged
merged 4 commits into from
Feb 22, 2021

Conversation

ehuss
Copy link
Contributor

@ehuss ehuss commented Feb 10, 2021

This adds a v field to the index which indicates a format version for an index entry. If Cargo encounters a version newer than it understands, it will ignore those entries. This makes it safer to make changes to the index entries (such as adding new things), and not need to worry about how older cargos will react to it. In particular, this will make it safer to run cargo update on older versions if we ever decide to add new things to the index.

Currently this is not written anywhere, and is intended as a safety guard for the future. For now I will leave it undocumented until we actually decide to start using it.

This also moves the new syntax for namespaced features and weak dependency features into a new field ("features2") in the index. This is necessary to avoid breaking Cargo versions older than 1.19, which fail to parse the index even if there is a Cargo.lock file.

It is intended that only crates.io will bother with creating this field. Other registries don't need to bother, since they generally don't support Cargo older than 1.19.

I'm uncertain exactly when we should try to update crates.io to start accepting this, as that is a somewhat permanent decision.

@rust-highfive
Copy link

r? @Eh2406

(rust-highfive has picked a reviewer for you, use r? to override)

@rust-highfive rust-highfive added the S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. label Feb 10, 2021
@ehuss
Copy link
Contributor Author

ehuss commented Feb 10, 2021

I'm glad I took some more time with this. The original implementation had the v check in the wrong place, which caused the on-disk index caches to not work correctly. This changes it from the original PR where the v check is done after the cache is created and loaded. The only way to test this was to switch between different versions of Cargo (see old_cargos, which has been enhanced).

@alexcrichton
Copy link
Member

To make sure I understand right, was the bug you caught here that an older Cargo would cache a list of index entries without v2 ones, but then a newer Cargo would reread the cache and not see the newer v2 entries? If so we may want to factor in the maximum understood version into the cache format and if you alternate back-and-forth Cargo versions it rewrites the cache?

Otherwise this looks good to me. I'd be happy to go ahead and land this and start to let this propagate. I do still think that crates.io will need to change, too. At the very least crates.io's feature validation needs to be updated for the new syntaxes.

@ehuss
Copy link
Contributor Author

ehuss commented Feb 11, 2021

The issue was that in #9111, the schema check was done before the summary was saved to the cache (here). If #9111 was merged, and these v2 entries were in the index, those would not be included in the cache. Then, if you ran with -Zfeatures-namespaced, those entries would not appear to exist because they weren't in the cache.

This also caused a problem with running with debug assertions. If the index had a v2 entry, and you ran an older cargo (say 1.49), that older cargo would place the v2 entry in the cache. Then, if you ran a cargo after #9111, that cargo would filter out the v2 entries, and then compare that against the cache, which won't match, and it panicked.

Here, the filtering is done after caching, so it should work with all versions, and avoids the debug-assertion.

I was thinking of backporting the first commit (adding the v field) to 1.51, just to get it released earlier. The second commit (adding features2) doesn't need to be backported because it is a nightly-only feature.

@alexcrichton
Copy link
Member

Ok looking at this again, to make sure I write it down again as well my main concern is protecting against what I believe is a preexisting bug in Cargo. This bug is where an old Cargo parses an index entry, creates a cache entry, and then a future version of Cargo reads the cache, skipping the index entirely. In this situation I think that any lines in the registry that the older version of Cargo failed to understand would be missing from the future Cargo's knowledge (since those entries not understood are not cached).

I think it might be good to perhaps go ahead and factor in the index version format into the cache entry header? That way usage of old/new Cargo would thrash but I'm not too worried about that, I'm mostly worried about correctness in that a newer version of Cargo should always see all the index entries.

@ehuss
Copy link
Contributor Author

ehuss commented Feb 18, 2021

Yea, that makes sense. To be clear, the concern is that one of these lines could return an error for whatever reason, and it would be good to protect against that?

Is the following patch roughly what you're thinking of?

diff --git a/src/cargo/sources/registry/index.rs b/src/cargo/sources/registry/index.rs
index 83850c592..11e6da265 100644
--- a/src/cargo/sources/registry/index.rs
+++ b/src/cargo/sources/registry/index.rs
@@ -68,7 +68,7 @@

 use crate::core::dependency::Dependency;
 use crate::core::{PackageId, SourceId, Summary};
-use crate::sources::registry::{RegistryData, RegistryPackage};
+use crate::sources::registry::{RegistryData, RegistryPackage, INDEX_V_MAX};
 use crate::util::interning::InternedString;
 use crate::util::paths;
 use crate::util::{internal, CargoResult, Config, Filesystem, ToSemver};
@@ -309,7 +309,7 @@ impl<'cfg> RegistryIndex<'cfg> {
         // necessary.
         let raw_data = &summaries.raw_data;
         let max_version = if namespaced_features || weak_dep_features {
-            2
+            INDEX_V_MAX
         } else {
             1
         };
@@ -678,7 +678,7 @@ impl Summaries {
 // sha lets us know when the file needs to be regenerated (it needs regeneration
 // whenever the index itself updates).

-const CURRENT_CACHE_VERSION: u8 = 1;
+const CURRENT_CACHE_VERSION: u8 = 1 + INDEX_V_MAX as u8;

 impl<'a> SummariesCache<'a> {
     fn parse(data: &'a [u8], last_index_update: &str) -> CargoResult<SummariesCache<'a>> {
diff --git a/src/cargo/sources/registry/mod.rs b/src/cargo/sources/registry/mod.rs
index c6e50bf53..d43a3d26d 100644
--- a/src/cargo/sources/registry/mod.rs
+++ b/src/cargo/sources/registry/mod.rs
@@ -250,6 +250,10 @@ pub struct RegistryConfig {
     pub api: Option<String>,
 }

+/// The maximum version of the `v` field in the index this version of cargo
+/// understands.
+pub(crate) const INDEX_V_MAX: u32 = 2;
+
 /// A single line in the index representing a single version of a package.
 #[derive(Deserialize)]
 pub struct RegistryPackage<'a> {

Or, what do you think of changing it so that if IndexSummary::parse ever fails, it just doesn't save to the cache, but otherwise proceeds as best it can?

@alexcrichton
Copy link
Member

Oh I'm not worried about one of those lines failing, more specifically what I'm worried about is the below sequence of events. To reiterate though this is a preexisting bug and not the fault of this PR, it's something we ideally should have fixed long ago!

  1. Cargo version OLD calls Summaries::parse.
  2. Cargo version OLD sees no cache. It finds a few lines in the index it cannot parse, leading to ignoring those errors
  3. Cargo version OLD persists the cache missing entries from the index.
  4. Cargo version NEW later calls Summaries::parse
  5. Cargo version NEW sees a cache file and ignores the actual entries.

My worry is that cargo NEW fails to see the entries that OLD failed to parse. Nothing failed here and nothing was corrupted, it was just unfortunate that the OLD version ran first and cached bad results.

@alexcrichton
Copy link
Member

As for a fix I think what you gisted would fix things, but I think it's probably best to have a leading version byte followed by a maximum understood version byte. I'm a little wary of having the sum, not that I can think of anything concretely going wrong but it feels better to have them separate.

This is intended to help prevent the following scenario from happening:

1. Old cargo builds an index cache.
2. Old cargo finds an index entry it cannot parse, skipping it,
   and saving the cache without the entry.
3. New cargo loads the cache with the missing entry, and never sees
   the new entries that it understands.

This may result in more cache thrashing, but that seems better than
having new cargos missing entries.
@ehuss
Copy link
Contributor Author

ehuss commented Feb 19, 2021

Ok, I pushed a commit that adds the version field to the cache. I still feel like the code is brittle (I added some warning comments to that effect).

I was thinking that if IndexSummary::parse fails, that it could maybe just not write the cache, but that could permanently lock out older cargos from caching.

.get(..4)
.ok_or_else(|| anyhow::anyhow!("cache expected 4 bytes for index version"))?;
let index_v = u32::from_le_bytes(index_v_bytes.try_into().unwrap());
if index_v > INDEX_V_MAX {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that this check may actually want to be != instead of > perhaps? If Cargo version 3 reads a cache file from Cargo version 2, I think that's the case I was worried about? (where version 2 didn't cache any version 3 entries in this summary file)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes... 😡 😡 sorry, been making several elementary mistakes.

@alexcrichton
Copy link
Member

Nah I definitely agree this is a bit brittle. One possibility would be to, while testing Cargo itself, we assert there are no errors maybe? Not that that really adds a whole ton of confidence...

The whole point is that when loading a cache from an earlier version,
that cache may be missing new entries. So don't load older caches.
@ehuss
Copy link
Contributor Author

ehuss commented Feb 20, 2021

One possibility would be to, while testing Cargo itself, we assert there are no errors maybe?

I don't think the testsuite will discover much, since it doesn't test older formats. There is a debug assert that causes it to compare the cache to loading it, which is how I got started down this quest. But that only triggered when I was manually running older cargos with -Znamespaced-features, and part of the reason I'm adding the old_cargos tests.

@alexcrichton
Copy link
Member

@bors: r+

@bors
Copy link
Contributor

bors commented Feb 22, 2021

📌 Commit f258569 has been approved by alexcrichton

@bors bors added S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Feb 22, 2021
@bors
Copy link
Contributor

bors commented Feb 22, 2021

⌛ Testing commit f258569 with merge 7442c14...

@bors
Copy link
Contributor

bors commented Feb 22, 2021

☀️ Test successful - checks-actions
Approved by: alexcrichton
Pushing 7442c14 to master...

@bors bors merged commit 7442c14 into rust-lang:master Feb 22, 2021
@ehuss ehuss mentioned this pull request Feb 22, 2021
bors added a commit that referenced this pull request Feb 22, 2021
[beta] backports for 1.51

Beta backports for the following:

* Fix panic with doc collision orphan. (#9142)
    * This is an important regression that is fairly easy to hit.
* Do not exit prematurely if anything failed installing. (#9185)
    * This is not a regression, but I think an important fix.
* Add schema field to the index (#9161)
    * This is only the first commit from the PR which checks for the `v` field in the index, and skips entries that are not understood. The reason to backport is to get this in as early as possible so that if we do decide to start using it in the future, it works as early as possible.  This otherwise doesn't do anything, so I think it should be safe.
* Fix warnings of the new non_fmt_panic lint (#9148)
    * Fixes CI for a new warning in nightly.
bors added a commit to rust-lang-ci/rust that referenced this pull request Feb 24, 2021
[beta] Update cargo

Backport of rust-lang/cargo#9196:

* Fix panic with doc collision orphan. (rust-lang/cargo#9142)
* Do not exit prematurely if anything failed installing. (rust-lang/cargo#9185)
* Add schema field to the index (rust-lang/cargo#9161)
* Fix warnings of the new non_fmt_panic lint (rust-lang/cargo#9148)
Dylan-DPC-zz pushed a commit to Dylan-DPC-zz/rust that referenced this pull request Feb 27, 2021
Update cargo

11 commits in bf5a5d5e5d3ae842a63bfce6d070dfd438cf6070..572e201536dc2e4920346e28037b63c0f4d88b3c
2021-02-18 15:49:14 +0000 to 2021-02-24 16:51:20 +0000
- Pass the error message format to rustdoc (rust-lang/cargo#9128)
- Fix test target_in_environment_contains_lower_case (rust-lang/cargo#9203)
- Fix hang on broken stderr. (rust-lang/cargo#9201)
- Make it more clear which module is being tested when running cargo test (rust-lang/cargo#9195)
- Updates to edition handling. (rust-lang/cargo#9184)
- Add --cfg and --rustc-cfg flags to output compiler configuration (rust-lang/cargo#9002)
- Run rustdoc doctests relative to the workspace (rust-lang/cargo#9105)
- Add support for [env] section in .cargo/config.toml (rust-lang/cargo#9175)
- Add schema field and `features2` to the index. (rust-lang/cargo#9161)
- Document the default location where cargo install emitting build artifacts (rust-lang/cargo#9189)
- Do not exit prematurely if anything failed installing. (rust-lang/cargo#9185)
@ehuss ehuss added this to the 1.52.0 milestone Feb 6, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants