Host each crate on its own subdomain and allow user JS #1853

Open
jsha opened this issue Sep 20, 2022 · 6 comments
Labels
E-hard Effort: This will require a lot of work S-needs-design Status: There's a problem here, but no obvious solution; or the solution raises other questions

Comments

@jsha
Contributor

jsha commented Sep 20, 2022

In #167 there is some discussion of what to do about crates providing their own JS. There are two risks suggested: cryptocurrency mining via JS and installing a ServiceWorker that could serve incorrect documentation for other crates. As far as I know the current plan of record is to prevent this using the Content-Security-Policy header to allowlist certain JS, and initial steps were taken in #1333, which implements CSP for crate pages only, with rustdoc pages left as a future exercise.

I propose that we abandon as impractical the plan to implement CSP for rustdoc pages. Instead, we should explicitly allow bring-your-own-JS, and we should plan on a separate subdomain per crate. This aligns docs.rs security boundaries (crates' documentation should not be able to affect each other) with the web's natural security boundaries. Specifically, the Same-origin Policy is the foundation of web security and states that scripts on foo.example.com cannot affect bar.example.com (without specific opt-in from bar.example.com, other caveats apply, etc etc).
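
To make the boundary concrete, here is a minimal sketch of the origin comparison that the Same-origin Policy rests on: two URLs share an origin only if scheme, host, and port all match. The parser is deliberately tiny and is an illustration, not how browsers implement it; it assumes lower-case `http`/`https` schemes, no userinfo, and no IPv6 literals. The `serde.docs.rs` and `tokio.docs.rs` hostnames are hypothetical examples of the proposed layout.

```rust
/// Origin as a (scheme, host, port) triple, the unit the Same-origin Policy compares.
#[derive(Debug, PartialEq, Eq)]
struct Origin {
    scheme: String,
    host: String,
    port: u16,
}

/// Tiny parser for `scheme://host[:port]/...` URLs.
/// Assumptions: lower-case http/https only, no userinfo, no IPv6 literals.
fn origin_of(url: &str) -> Option<Origin> {
    let (scheme, rest) = url.split_once("://")?;
    let default_port: u16 = match scheme {
        "http" => 80,
        "https" => 443,
        _ => return None,
    };
    // The authority ends at the first '/', '?' or '#'.
    let authority = rest
        .split(|c: char| c == '/' || c == '?' || c == '#')
        .next()?;
    let (host, port) = match authority.split_once(':') {
        Some((h, p)) => (h, p.parse::<u16>().ok()?),
        None => (authority, default_port),
    };
    if host.is_empty() {
        return None;
    }
    Some(Origin {
        scheme: scheme.to_string(),
        host: host.to_ascii_lowercase(),
        port,
    })
}

/// True only when both URLs parse and their origins are identical.
fn same_origin(a: &str, b: &str) -> bool {
    match (origin_of(a), origin_of(b)) {
        (Some(x), Some(y)) => x == y,
        _ => false,
    }
}

fn main() {
    // Today: every crate lives under one origin and differs only by path.
    println!(
        "{}",
        same_origin("https://docs.rs/serde/latest/", "https://docs.rs/tokio/latest/")
    );
    // Proposed (hypothetical hostnames): each crate gets its own origin.
    println!(
        "{}",
        same_origin("https://serde.docs.rs/", "https://tokio.docs.rs/")
    );
}
```

Under the current path-based layout the first comparison is same-origin, which is why one crate's script can reach another crate's pages; under the subdomain layout the second comparison is cross-origin.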

Allowing KaTeX and other useful libraries

With each crate on its own subdomain, we can unreservedly allow crates to include whatever JS they want. This resolves a long-standing uncertainty about what is/will be allowed on docs.rs, particularly as regards the popular KaTeX library used to render LaTeX inline on web pages.

Aligning docs.rs with rustdoc

Allowing crates to bring their own JS (and styles, and even fonts) aligns docs.rs with rustdoc's philosophy: rustdoc has a variety of flags that allow adding arbitrary HTML (including script tags). Also, rustdoc implements Markdown, which is defined to allow arbitrary HTML. Since docs.rs relies so heavily on rustdoc, it would be challenging to enforce a security boundary that rustdoc does not participate in enforcing. To further underscore that: rustdoc has no systematic XSS defense in its HTML generation.

Also, since docs.rs hosts all historic versions of a crate as they were documented at the time, docs.rs needs to deal with output from many historical versions of rustdoc. So even to the extent rustdoc is updated to participate in enforcing this security boundary, we would face the problem of what to do with old versions, and what to do with modern versions that were emitted by a buggy version of rustdoc.

Crates control their own execution environment

With build.rs, crates can do a lot to modify their environment at build time. For instance the xss-probe build.rs takes the simple expedient of writing a .html and a .js file into the docs/ directory before rustdoc runs. Even if we blocked that behavior (for instance, by clearing the doc directory at some strategic moment), there are potential tricks: overwriting the rustdoc binary, setting PATH or LD_LIBRARY_PATH, or other unknown shenanigans. To make a defensible security boundary of "thou shalt not write unauthorized files during doc builds," we would have to invent and enforce a lot of other security boundaries that are not even currently considered boundaries in the Rust ecosystem.

Script-nonce won't work for rustdoc output

For templated output from the docs.rs web server, we can use script-nonce, and inject the nonce at the known places where we are generating an inline script or a <script src=...> tag. But we can't inject nonces into rustdoc HTML because we don't know the known-good places. We could parse the HTML and inject the nonce on all script tags, but of course that would defeat the purpose since we would also inject the nonce on malicious script tags.
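
A rough sketch of why this fails, assuming the policy shape mentioned later in this issue (`default-src 'none'` plus a nonce-based `script-src`). All function names here are hypothetical, and the nonce is passed in as a parameter; a real server would generate a fresh random value per response. The last function is the naive "stamp every script tag" rewrite, which ends up blessing a malicious tag along with the legitimate one.

```rust
/// Build the CSP header value for one response.
/// Assumption: `nonce` is a fresh, unguessable value generated per response.
fn csp_header(nonce: &str) -> String {
    format!("default-src 'none'; script-src 'nonce-{}'", nonce)
}

/// What a docs.rs template can do: it knows this tag is trusted,
/// so it attaches the nonce at a known-good place.
fn trusted_script_tag(nonce: &str, src: &str) -> String {
    format!("<script nonce=\"{}\" src=\"{}\"></script>", nonce, src)
}

/// Naive rewrite for rustdoc output: add the nonce to every `<script` tag.
/// It cannot tell trusted tags from attacker-controlled ones.
fn inject_nonce_everywhere(html: &str, nonce: &str) -> String {
    html.replace("<script", &format!("<script nonce=\"{}\"", nonce))
}

fn main() {
    let nonce = "XYZabc123"; // placeholder value for illustration only
    println!("Content-Security-Policy: {}", csp_header(nonce));
    println!("{}", trusted_script_tag(nonce, "/-/static/menu.js"));

    // Rustdoc HTML that contains one expected script and one injected by the crate.
    let rustdoc_html = "<script src=\"main.js\"></script><script>evil()</script>";
    println!("{}", inject_nonce_everywhere(rustdoc_html, nonce));
}
```

After the rewrite both tags carry the valid nonce, so the browser would run `evil()` just as readily as `main.js`.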

Allowlisting scripts also won't work for rustdoc output

We could allowlist the shared files (mainXXX.js, storageXXX.js), but crate-specific JS is a problem. As one example, each crate has a source-filesXXX.js that lists all the files for the source view sidebar (e.g. https://docs.rs/ureq/latest/source-files-20220709-1.64.0-nightly-6dba4ed21.js). That file is under control of the crate author (see "Crates control their own execution environment" above). So allowlisting it would pierce the security boundary we are trying to defend.

DNS and TLS wildcards

Having a separate subdomain for each crate does not require that we configure separate DNS and certificates for 75k+ crates. Instead, we should set up a wildcard DNS entry (*.docs.rs) that points all subdomains to the same set of IP addresses. And we can get a wildcard certificate to match. Then routing requests in docs.rs would just require looking at the hostname as well as the path.
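
The hostname half of that routing could look roughly like the sketch below. The function name and validation rules are assumptions for illustration, not existing docs.rs code. It accepts exactly one label in front of `docs.rs`, which matches what a `*.docs.rs` wildcard certificate covers. Crate names may contain `_`, which is not valid in a hostname label, so some mapping would be needed; the sketch leaves that question open and simply rejects such hosts.

```rust
/// Base domain under which per-crate subdomains would live (from the proposal).
const BASE_DOMAIN: &str = "docs.rs";

/// Extract the crate label from a `Host` header value such as `serde.docs.rs`.
/// Returns `None` for the bare domain, foreign hosts, and nested subdomains.
fn crate_from_host(host: &str) -> Option<String> {
    // Drop an optional `:port` suffix and normalise case.
    let host = host.split(':').next()?.to_ascii_lowercase();
    let suffix = format!(".{}", BASE_DOMAIN);
    let label = host.strip_suffix(suffix.as_str())?;
    // One DNS label only: letters, digits and inner hyphens.
    let valid = !label.is_empty()
        && label.chars().all(|c| c.is_ascii_alphanumeric() || c == '-')
        && !label.starts_with('-')
        && !label.ends_with('-');
    if valid {
        Some(label.to_string())
    } else {
        None
    }
}

fn main() {
    for host in ["serde.docs.rs", "Tokio.docs.rs:443", "docs.rs", "a.b.docs.rs"] {
        println!("{:?} -> {:?}", host, crate_from_host(host));
    }
}
```

A request handler would then combine the returned crate name with the request path, and fall back to the existing path-only routing when the result is `None`.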

We could also continue doing nothing for a while

In general it's always a good idea to compartmentalize different users' content from each other. For instance, GitHub Pages uses *.github.io, readthedocs uses *.readthedocs.io. However, since there is no authentication on docs.rs and no cookies, the issues we're facing are not particularly serious and we can continue to postpone a systematic fix.

We can disable ServiceWorkers via the CSP worker-src directive, without blocking scripts in general.

Cryptocurrency mining via JS is annoying, but its yields are so tiny that it takes a massive amount of visitor traffic to be worthwhile. I don't know what the current state of the problem is, but I suspect you would need to distribute your JS either via an ad network or via a large number of compromised websites to make it pay. It's also pretty noisy: if someone started using the documentation of a popular crate to mine cryptocurrency, it would be spotted quickly, and the docs.rs team could take it down and take any necessary follow-up actions. This seems like a purely hypothetical problem at this point.

Even if we don't decide to move forward with per-crate subdomains, I think it's very worthwhile to make the decision now that crates are allowed to embed JavaScript, and they will continue to be allowed to do so. The status quo creates unnecessary uncertainty for crate authors, and stumbling blocks for docs.rs developers.

Why the existing approach causes problems

Some issue threads where CSP came up as causing trouble (presumably the combination of default-src 'none'; and script-src 'nonce-XYZabc123'):

#1387
#302
#1552
#1255
#568

One last cute thing

If each crate has its own subdomain, each crate can have its own favicon logo, so you can better identify different crates' docs in your tabs! ❤️

@Nemo157
Member

Nemo157 commented Sep 20, 2022

Script-nonce won't work for rustdoc output
Allowlisting scripts also won't work for rustdoc output

These two at least are solvable. Instead of using a nonce we would use precalculated hashes of the essential files; and instead of having JS files we can have rustdoc emit JSON files for things like sidebar-items that get loaded by the pre-validated essential files (opt-in, to still allow file: hosting, which IIRC can't fetch JSON files).

@Nemo157
Member

Nemo157 commented Sep 20, 2022

If each crate has its own subdomain, each crate can have its own favicon logo, so you can better identify different crates' docs in your tabs! ❤️

Oh yeah, this is technically possible currently, e.g. https://docs.rs/tide/latest/tide/ (though, blocked by uBlock for me because third-party favicons are a privacy issue).

I would like it if we could block third-party URLs for these (and the sidebar logo) and instead have rustdoc embed a provided file into the output bundle that is served from docs.rs.

@jsha
Contributor Author

jsha commented Sep 20, 2022

Script-nonce won't work for rustdoc output
Allowlisting scripts also won't work for rustdoc output

These two at least are solvable.

This is true, although it requires a bunch of little things:

  • implementation in rustdoc to generate both .json and .js versions of these files
  • implementation in rustdoc to load the appropriate type of file based on current URL scheme
  • test harness changes for rustdoc to run a local web server for testing the .json variant
  • ongoing maintenance for two paths
  • an API commitment from rustdoc that on http: and https: URLs, no JS is ever loaded other than from the files from --emit=unversioned,toolchain-specific
  • an API commitment from rustdoc never to use inline script

Additionally on the docs.rs side it would require keeping track of the mapping from each release to its list of essential files. And it would break the search and sidebar functionality for all historic docs.

None of it is insurmountable, but it winds up being a bunch of work chasing fixes for problems introduced by fixes for problems, etc. :-) And meanwhile there's a nice elegant solution that is already proven out by other hosting sites and aligns with the web security model.

favicon logo

Oh yeah, this is technically possible currently

Oh right, I forgot about this! Tide is using:

#![doc(html_favicon_url = "https://yoshuawuyts.com/assets/http-rs/favicon.ico")]
#![doc(html_logo_url = "https://yoshuawuyts.com/assets/http-rs/logo-rounded.png")]

Presumably we could modify rustdoc so it would fetch these URLs and generate HTML that points to a local image file. We could also try Content-Security-Policy: img-src 'self' but it would break old docs.

@Mark-Simulacrum
Member

I don't think rustdoc fetching URLs is a good model, since it means you need to give unbounded network access to the crate build - which brings a lot of pain, e.g., you need to be much more careful about things like the ec2 metadata service / instance roles, and in general the network environment of the builder.

@jsha
Contributor Author

jsha commented Sep 20, 2022 via email

@jsha
Contributor Author

jsha commented Oct 4, 2022

By the way, the potential for crate A installing ServiceWorkers to mess with crate B's docs is, I think, not present. According to https://w3c.github.io/ServiceWorker/#service-worker-script-response, a script at https://example.com/foo/sw.js can only install ServiceWorkers that will handle requests at https://example.com/foo/ or below.
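
As a rough illustration of that rule: the default maximum scope is the directory containing the worker script, and a requested scope is rejected unless its path starts with that directory. The sketch below works on already-normalized URL paths and uses hypothetical function names; it ignores the `Service-Worker-Allowed` response header, which can widen the scope but is set by the server rather than by the script.

```rust
/// Default maximum scope for a worker script: the directory that contains it.
/// Assumption: `script_path` is an already-normalized absolute URL path.
fn max_scope(script_path: &str) -> &str {
    match script_path.rfind('/') {
        Some(i) => &script_path[..=i],
        None => "/",
    }
}

/// A requested scope is acceptable only if it sits at or below the max scope.
fn scope_allowed(script_path: &str, requested_scope: &str) -> bool {
    requested_scope.starts_with(max_scope(script_path))
}

fn main() {
    let script = "/foo/sw.js";
    println!("max scope: {}", max_scope(script));
    for scope in ["/foo/", "/foo/bar/", "/", "/bar/"] {
        println!("{} -> {}", scope, scope_allowed(script, scope));
    }
}
```

Applied to docs.rs paths, a worker script served from one crate's directory could claim that directory and anything below it, but not the site root or another crate's directory.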

@syphar added the S-needs-design (Status: There's a problem here, but no obvious solution; or the solution raises other questions) and E-hard (Effort: This will require a lot of work) labels on Aug 2, 2023