Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Don't hard-code essential rustdoc files #1312

Merged
merged 2 commits into from
Mar 21, 2021
Merged

Conversation

jyn514
Copy link
Member

@jyn514 jyn514 commented Mar 10, 2021

This avoids having to constantly update docs.rs when new files are
added.

Closes #1302, closes #1240, closes #294.
r? @Nemo157

@jyn514 jyn514 added S-waiting-on-review Status: This pull request has been implemented and needs to be reviewed A-builds Area: Building the documentation for a crate labels Mar 10, 2021
@jyn514
Copy link
Member Author

jyn514 commented Mar 10, 2021

Example build:
image
Generated with rustc 1.52.0-nightly (3a5d45f68 2021-03-09).

@syphar
Copy link
Member

syphar commented Mar 10, 2021

this is awesome :)

@Nemo157
Copy link
Member

Nemo157 commented Mar 13, 2021

Code changes look good to me, running a local build to confirm it (though I guess we have actual tests covering this code path since #1309 now 🥳).

@Nemo157
Copy link
Member

Nemo157 commented Mar 13, 2021

So, fun bug, this breaks searching because the "essential files" version of search-index.js is preferred over the local one. Can see the effect in https://docs.rs.dev.nemo157.com/bs58/0.4.0/search-index-20210312-1.52.0-nightly-b3e19a221.js returning the search index for empty_library (IPv6 only).

I assume the way docs.rs handles these versioned-essential-files as existing globally is a workaround for before --static-root-path was implemented.

@jyn514
Copy link
Member Author

jyn514 commented Mar 13, 2021

So, fun bug, this breaks searching because the "essential files" version of search-index.js is preferred over the local one. Can see the effect in https://docs.rs.dev.nemo157.com/bs58/0.4.0/search-index-20210312-1.52.0-nightly-b3e19a221.js returning the search index for empty_library (IPv6 only).

This seems like the same rustdoc bug as rust-lang/rust#83094 - I don't think rustdoc should be using --static-root-path for search-index.js.

@jyn514
Copy link
Member Author

jyn514 commented Mar 13, 2021

Hmm actually that might be necessary. rustdoc uses root_path to load it:

$ rg search-index src/librustdoc/
src/librustdoc/html/layout.rs
117:       data-search-js=\"{root_path}search-index{suffix}.js\"></div>

but writes it to the shared file cache: https://github.com/rust-lang/rust/blob/e7e1dc158c3de232750b568163f6941a184ee8be/src/librustdoc/html/render/write_shared.rs#L381. I think that's because it re-uses the search index for all crates documented at the same time. I guess we could special-case it in docs.rs?

@Nemo157
Copy link
Member

Nemo157 commented Mar 13, 2021

I wonder whether rustdoc shouldn't be applying the suffix to it? What is the purpose of putting a per-toolchain suffix on a dynamically generated file?

@jyn514
Copy link
Member Author

jyn514 commented Mar 13, 2021

I wonder whether rustdoc shouldn't be applying the suffix to it? What is the purpose of putting a per-toolchain suffix on a dynamically generated file?

@GuillaumeGomez do you know why rustdoc uses a resource suffix for the search index?

@GuillaumeGomez
Copy link
Member

Because it allows you to have multiple search-index files in the same location I assume? I remember that it was done for docs.rs, but I don't remember why at all...

@jyn514
Copy link
Member Author

jyn514 commented Mar 13, 2021

Well, docs.rs isn't using it, so if that's the only reason I think we should remove the suffix.

@GuillaumeGomez
Copy link
Member

Let's make sure no one is actually using it first. I'll ask that on twitter and users.rust-lang.org.

@GuillaumeGomez
Copy link
Member

I think that's because it re-uses the search index for all crates documented at the same time.

I didn't understand what you meant with this. rustdoc uses only one search-index.js, one source-files.js and one crates.js and update them if they already exists and always has been. Is it an issue for docs.rs?

@Nemo157
Copy link
Member

Nemo157 commented Mar 13, 2021

The issue here is that the existence of the suffix is used to detect the "essential files" that are shared between all builds, so because it generates search-index-20210312-1.52.0-nightly-b3e19a221.js that file is picked up from the initial empty-library build and stored for the toolchain. When the browser then does a request for https://docs.rs/<crate>/<version>/search-index-20210312-1.52.0-nightly-b3e19a221.js to get the search index for that doc-build, the "essential file" from empty-library is then served instead (it detects the presence of what looks like a rustdoc version suffix on the url and then serves the essential file, ignoring the subdirectory that file is in, before this PR we don't store the search-index so that is a 404 and it then fallsback to serving the correct build-specific file).

@Nemo157
Copy link
Member

Nemo157 commented Mar 13, 2021

Big alternative: Can we just stop storing the essential files? It appears that we're saving the per-build versions of all these files anyway, so if we stop passing --static-root-path etc. it seems like it would Just Work:tm:. EDIT: Nope, I just hit the fact that the files are available in any directory when testing this, we don't actually store them all for each build, just a somewhat random subset of recently added ones.

The main downside I see is that would probably affect browser caching, but given the likelihood of different docs you're looking at being built from the same nightly that doesn't seem like a big downside. Also if rustdoc started emitting subresource integrity hashes those would match cross-build for the static files and might be used by browsers as a cache key.

@Nemo157
Copy link
Member

Nemo157 commented Mar 13, 2021

// Avoid copying common files
let dup_regex = Regex::new(
r"(\.lock|\.txt|\.woff|\.svg|\.css|main-.*\.css|main-.*\.js|normalize-.*\.js|rustdoc-.*\.css|storage-.*\.js|theme-.*\.js)$")
.unwrap();

This also seems like it should be updated to take into account the known shared static files, instead of being a hardcoded regex (e.g. it's copying the .woff2 fonts, and the .png favicons).

@jyn514
Copy link
Member Author

jyn514 commented Mar 14, 2021

Big alternative: Can we just stop storing the essential files?

I'd be fine with this.

@GuillaumeGomez
Copy link
Member

That would work too I guess. Just a question: if the file already exists, wouldn't it cost money to copy it again? I guess we can prevent it by checking if the file already exists?

@jyn514
Copy link
Member Author

jyn514 commented Mar 18, 2021

Big alternative: Can we just stop storing the essential files? It appears that we're saving the per-build versions of all these files anyway, so if we stop passing --static-root-path etc. it seems like it would Just Worktm. EDIT: Nope, I just hit the fact that the files are available in any directory when testing this, we don't actually store them all for each build, just a somewhat random subset of recently added ones.

Can we hold off on this for now and just fix the essential files to be tracked properly? I think the search-index thing could be fixed by special casing that file not to be copied.

@syphar
Copy link
Member

syphar commented Mar 21, 2021

The main downside I see is that would probably affect browser caching, but given the likelihood of different docs you're looking at being built from the same nightly that doesn't seem like a big downside.

Not only browser-caching, but also CDN caching for us europeans :) But your point stands: the cache is only by nightly version.

Also if rustdoc started emitting subresource integrity hashes those would match cross-build for the static files and might be used by browsers as a cache key.

That would IMHO be the best way forward. Then we could really cache the static files across a longer timespan in the CDN. In the end, we do host many docs, and they do share static files to an extend. So I would keep the shared-resource logic and try to improve it.

@jyn514
Copy link
Member Author

jyn514 commented Mar 21, 2021

These are the files that are different between the ones we hard-coded and the ones rustdoc adds a resource-suffix to:

> diff hard-coded.txt rustdoc-generated.txt 
2a3
> crates.js
4a6,7
> favicon-16x16.png
> favicon-32x32.png
11a15
> search-index.js
13a18
> source-files.js

The new favicon files are correct; merging this will fix #1240. The crates.js and search-index.js files are not - I think it's a rustdoc bug that these files have a resource-suffix appended, because they can't be shared across crates. Here's source-files.js for empty_library:

var N = null;var sourcesIndex = {};
sourcesIndex["empty_library"] = {"name":"","files":["lib.rs"]};
createSourceSidebar();

so I expect it can't be shared across crates either. I'll see about fixing that in rustdoc.

@jyn514
Copy link
Member Author

jyn514 commented Mar 21, 2021

I think it's a rustdoc bug that these files have a resource-suffix appended, because they can't be shared across crates.

Yeah, rust-lang/rust#57011 explicitly says

I made sure to only change links for the static files, those that don't change between crates. Files like the search index, aliases, the source files listing, etc, are still linked with relative links.

@jyn514
Copy link
Member Author

jyn514 commented Mar 21, 2021

I think rust-lang/rust#59776 was incorrect, it means that files with a resource-suffix will be shared across crates even if they're incompatible. That links to #316 (comment) which says

That PR suggests to start adding the resource suffix to the search index, which will sidestep the issue by making newer builds request a different URL for it. That will solve the Rocket issue, and this specific problem of main.js and search-index.js using different versions of the index structure.
However, now that i'm typing this up here, i see that [using redirects on the docs.rs side] doesn't fix the Rocket problem, so maybe the complete solution is to do both this proposal and that PR.

I don't know what 'the rocket problem' was by reading that - I would expect it to be the same as whatever's going on with tokio in that issue? @QuietMisdreavus sorry to bother you - do you still remember what you meant back then?

I think either way it shouldn't matter because docs.rs never added those files to ESSENTIAL_FILES_VERSIONED, so whatever the rustdoc change did, it didn't have an impact until now.

@jyn514
Copy link
Member Author

jyn514 commented Mar 21, 2021

Also, it seems really strange to me that rustdoc would emit a file with a version suffix, but load it from root_path instead of static_root_path - I don't know why that would ever be useful, the version suffix is explicitly so you can share it across different crates. https://github.com/rust-lang/rust/blob/ed75d0686e1ab674636e784268b4b7f5e03c53de/src/librustdoc/html/layout.rs#L61

@Nemo157
Copy link
Member

Nemo157 commented Mar 21, 2021

My reading of the "rocket problem" is that it is essentially an issue of inconsistent caching. Imagine that we build the documentation for a crate with one version of rustdoc, that loads /main-<rustdoc version 1>.js and /<crate>/<version>/search-index.js. If we then rebuild those docs with a later nightly that will start loading /main-<rustdoc version 2>.js and /<crate>/<version>/search-index.js. If someone had a cached copy of the original search-index.js then they would be loading that with a newer version of main.js, which could fail because the format has changed.

(I think the original issue was with semver-constraint-paths rather than rebuilt documentation, but I think that's no longer an issue because we never serve any files under semver-constraint-paths, just redirect to the current matching version).

@jyn514
Copy link
Member Author

jyn514 commented Mar 21, 2021

If someone had a cached copy of the original search-index.js then they would be loading that with a newer version of main.js, which could fail because the format has changed.

Hmm, when would search-index.js be cached but not main.js? Aren't they all loaded at the same time?

jyn514 added a commit to jyn514/rust that referenced this pull request Mar 21, 2021
@Nemo157
Copy link
Member

Nemo157 commented Mar 21, 2021

If we didn't have resource suffixes on search-index.js the load dependency tree would be:

/<crate>/<version>/index.html
├── /main-<suffix>.js
└── /<crate>/<version>/search-index.js

and after a rebuild:

/<crate>/<version>/index.html
├── /main-<suffix 2>.js
└── /<crate>/<version>/search-index.js

The browser when loading /<crate>/<version>/index.html would (almost) always do a conditional request with the ETag to check that the cached copy is still valid, the condition would fail and it would be sent the new version of the html which referenced the new path for the main.js and the same path for search-index.js. Since the path for main.js changed it would then have to do a new request for that path, while search-index.js appears to be the same file and it could simply use the cached copy (assuming that we have cache-directives on it).

@jyn514
Copy link
Member Author

jyn514 commented Mar 21, 2021

I think another possibility rather than messing with this in rustdoc is to invalidate the CloudFront cache whenever we rebuild a crate - that seems like good practice anyway since otherwise you'll see older versions of docs, and we rebuild infrequently enough it shouldn't be too expensive.

@Nemo157
Copy link
Member

Nemo157 commented Mar 21, 2021

Invalidating the CDN wouldn't be enough, when navigating to a doc page I see the search-index.js served directly from my browser cache without any conditional request.

One idea would be to have a different suffix that we could tell rustdoc to stick on dynamic subresources, we could then fill that suffix with something that is per-build so that the search-index.js for each build would actually be different paths and cached separately (that would also mean we could mark the subresources as truly immutable for caching purposes). (Wouldn't fix issues with caching of doc-pages as a whole, but would ensure each page is self-consistent).

@jyn514
Copy link
Member Author

jyn514 commented Mar 21, 2021

I mean, at that point I don't know if it's worth the effort - I would rather just store the extra 3 files per toolchain, it's trivial compared to our other costs (or we could special-case those three if @pietroalbini thinks the savings are worth it).

The immediate problem should be solved by #1324 - even if we do copy files that shouldn't be shared, rustdoc will still load them per-crate, which should avoid any errors.

@jyn514
Copy link
Member Author

jyn514 commented Mar 21, 2021

So I don't forget: this is waiting on #1312 (comment).

@jyn514
Copy link
Member Author

jyn514 commented Mar 21, 2021

This also seems like it should be updated to take into account the known shared static files, instead of being a hardcoded regex (e.g. it's copying the .woff2 fonts, and the .png favicons).

Done - I don't know a great way to test it, but it doesn't have any fewer tests than the rest of docbuilder. Notably, this also fixes #294.

This avoids having to constantly update docs.rs when new files are
added.
This avoids unnecessary upload costs for S3.
@jyn514 jyn514 merged commit f552d45 into rust-lang:master Mar 21, 2021
@jyn514 jyn514 deleted the rustdoc-files branch March 21, 2021 20:17
jyn514 added a commit to jyn514/rust that referenced this pull request Mar 26, 2021
The intended use case is for docs.rs, which can now copy exactly the
files it cares about, rather than having to guess based on whether they
have a resource suffix or not. In particular, some files have a resource
suffix but cannot be shared between crates: rust-lang/docs.rs#1312 (comment)

The end goal is to fix rust-lang/docs.rs#1327
by reverting rust-lang/docs.rs#1324.

This obsoletes `--print=unversioned-files`, which I plan to remove as
soon as docs.rs stops using it.
Dylan-DPC-zz pushed a commit to Dylan-DPC-zz/rust that referenced this pull request Apr 2, 2021
…imulacrum

rustdoc: Add unstable option to only emit shared/crate-specific files

The intended use case is for docs.rs, which can now copy exactly the
files it cares about, rather than having to guess based on whether they
have a resource suffix or not. In particular, some files have a resource
suffix but cannot be shared between crates: rust-lang/docs.rs#1312 (comment)

The end goal is to fix rust-lang/docs.rs#1327 by reverting rust-lang/docs.rs#1324.

This obsoletes `--print=unversioned-files`, which I plan to remove as
soon as docs.rs stops using it.

I recommend reviewing this one commit at a time.

r? `@GuillaumeGomez` cc `@Nemo157` `@pietroalbini`
Dylan-DPC-zz pushed a commit to Dylan-DPC-zz/rust that referenced this pull request Apr 2, 2021
…imulacrum

rustdoc: Add unstable option to only emit shared/crate-specific files

The intended use case is for docs.rs, which can now copy exactly the
files it cares about, rather than having to guess based on whether they
have a resource suffix or not. In particular, some files have a resource
suffix but cannot be shared between crates: rust-lang/docs.rs#1312 (comment)

The end goal is to fix rust-lang/docs.rs#1327 by reverting rust-lang/docs.rs#1324.

This obsoletes `--print=unversioned-files`, which I plan to remove as
soon as docs.rs stops using it.

I recommend reviewing this one commit at a time.

r? ``@GuillaumeGomez`` cc ``@Nemo157`` ``@pietroalbini``
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-builds Area: Building the documentation for a crate S-waiting-on-review Status: This pull request has been implemented and needs to be reviewed
Projects
None yet
4 participants