-
Notifications
You must be signed in to change notification settings - Fork 1.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Bundle local images in rustdoc output #3397
base: master
Are you sure you want to change the base?
Bundle local images in rustdoc output #3397
Conversation
|
||
Since the local images are all put in the same folder, if the same is imported from different crates, the content won't be duplicated since they have the same name and the same hash. | ||
|
||
If the path isn't referring to a file, a warning will be emitted and rustdoc will left the path unchanged in the generated documentation. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will this result in relative links to rustdoc pages that aren't using intra-doc links warning? Warning for these is fine, but should ideally be able to say "this looks like an intra-doc link, use one instead."
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry I don't understand what you mean. Intra-doc links is a feature used on links whereas this feature is only applied on image links. For example:
/// ![image](../relative-link-to-image)
///
/// [intra_doc_link_type]
Additional prior art: the library Unrelated to the above, it would be neat if there was a way to embed images that were generated by the crate's own code. For crates where such generation is at all within their scope, this would both allow for “guaranteed fresh” sample output and charts (much like doc-tests guarantee that examples work), and remove the need for documentation images to be included in the I imagine that this is implausible for now because it would require Cargo and rustdoc to cooperate on a multi-step process that makes |
@kpreid: Do you mean images that are generated by the
If If it's not possible for any reason (please explain why), are you suggesting to add an option to rustdoc which would allow to copy local files to the rustdoc output folder? |
No, I mean images generated by code that is dependent on the code in the library or binary crate being documented (in the same way that integration tests are dependent on said code) — so it would be well after the build script stage. I doubt this feature I am describing should be part of this RFC; I am just mentioning it as a future possibility that would be powerful enough that, if there is any consideration here that might make that more or less feasible, perhaps it should be considered now. |
Sure. I'm still not sure to fully understand what you explained but I don't mind adding this as part of a future extension. How would you word it? |
I don't have any specific changes in mind — I just wanted to raise the idea to get people thinking about it for the future. On re-reading, the only specific thing I would suggest in this RFC is to make it clear that given
Also, I think the first part of my original comment may have been missed; I intended to suggest that the prior art section should discuss the library embed-doc-image which currently accomplishes the same goal, by means of embedding the images in data: URLs in the doc markdown, so that they need not be picked up by rustdoc itself. The disadvantages of that library which this RFC would improve on are that all image data has to pass through rustc and a proc macro (inefficiency), and |
I don't get this part: it doesn't understand the concept of rust item, only URLs and local paths (either relative or absolute). If the path provided isn't found, it will warn and keep it as is.
It is written explicitly in the RFC:
I'll add it then. |
The Motivation section discusses things other than images: specifically, scripts. But then the actual proposed new feature only works for images and has a lot of image-specific special casing. Can the RFC text please discuss how to handle scripts, or other resources that aren't images? There's some discussion of this in "Possible Extensions" but I think that approach is unlikely to happen soon. IMO ideally there would be a way to tell rustdoc to "just copy this file into a known place in the output, preserving its leafname". |
It's mentioned in the motivations section because it's something that we could want into the future. However, the approach for handling non-image files would be very different since there is no standard markdown syntax for it, which would force us to first define this syntax. So instead of going all at once in here, I think it's much better to first land this so we can have the code ready in case we want to add support for other files than images. But I prefer to warn you beforehand: including scripts comes with security concerns and that will be very complicated to go forward with it. I think it'll very likely never be approved. Anyway, discussion for another day. :) |
There weren't many comments so I guess we can start the last part of the process. @rfcbot merge |
Team member @GuillaumeGomez has proposed to merge this. The next step is review by the rest of the tagged team members: Concerns: Once a majority of reviewers approve (and at most 2 approvals are outstanding), this will enter its final comment period. If you spot a major issue that hasn't been raised at any point in this process, please speak up! See this document for info about what commands tagged team members can give me. |
Thanks for working on this. I really support the goals - consistent reliable images and scripts across local, self-hosted, and docs.rs. But I'm dissatisfied with the implementation plan for docs.rs: embedding images in the .crate file uploaded to crates.io. I think it unnecessarily bloats .crate files, which wastes bandwidth for crates.io, for end-users, and for CI platforms. Docs.rs downloads each crate once; the various other users of crates download them millions of times. We shouldn't include bytes that are only useful for documentation. Can we find a different solution? When we first started talking about this issue I was not worried about crate size because I figured this would be mostly used for project icons, and they would be ~5kB of SVG. But thinking about it more I realize some crates will want larger images to illustrate complex concepts. Most doc authors are not experts in minimizing image files, and rustdoc won't have any way to automatically minimize them. And I'd really like to have a rigorous way to use scripts, too; and useful scripts like KaTeX are in the ~200kB range. IIRC crates.io's limit on crate size is 50MB. In some sense that's good since it means there's plenty of room for doc resources. But on the other hand it's bad since it would allow making very big crate files filled up with doc resources. Of course it is currently allowed to make very big crates anyhow, but this feature would create more incentive to do so. @rfcbot concern Crate size The main constraints that are challenging:
For the first problem, docs.rs would have to introduce another input. We could make that input predictable, and pre-fetch it to the filesystem before running the build rather than having the doc build process fetch it. I don't love having another input in docs.rs but I think it would be better than making packages bigger for all non-doc uses. We could reduce the operational problems by only fetching that alternate input from one specific host - like crates.io, or another host created for the purpose. For the second problem, there would have to be a well-known place to fetch additional resources for a given .crate file. For user experience, we would want to avoid doc authors needing to create separate credentials for uploading doc resources. That argues in favor of having a way to upload additional files to crates.io, specifically linked to a crate version and using the same credentials as the crate itself. I know this makes for a much more complex project. I'd love to find a better way but am having trouble thinking of one! |
Very interesting take! I think we'll need to have opinions from @rust-lang/crates-io and eventually @rust-lang/cargo to move forward on this. I think having two different kind of resources for |
|
||
A new rustdoc pass will be added which would go through all documentation to gather local resources into a map. | ||
|
||
Then in HTML documentation generation, the local resources pathes will be replaced by their equivalent linking to the output directory instead. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Then in HTML documentation generation, the local resources pathes will be replaced by their equivalent linking to the output directory instead. | |
Then in HTML documentation generation, the local resources paths will be replaced by their equivalent linking to the output directory instead. |
* reduces the number of duplicated images that `docs.rs` has to store | ||
* doesn't require the source code for the source crate when inlining | ||
* only requires storing the hash of the file in the `.rmeta`, not the whole image | ||
* requires rustc to look at the doc comments and hash the image(s) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
While doing this, can it also produce a lint warning if the image is bigger than 50KiB or so?
Of course, any value chosen here will be arbitrary and contentious, but so does adding operational complexity to crates.io, docs.rs, and third parties that want to run rustdoc builds in sandboxes (how would hosting images separately fit into Bazel?)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Perhaps; or perhaps the lint should be something like "the total of all your non-code resources is >10% of the crate size." It's still not really satisfying though, since it doesn't fully solve the problem. If you want to include lots of resources, or large resources, the only solution is going to be to refer to them using absolute URLs.
Out of curiosity I spot-checked a couple of crates, regex and rayon. They're 248 kB and 169 kB respectively, so 10% would be 24.8 kB and 16.9 kB.
Also it's worth mentioning one counterpoint to my concern: source files that are bundled into .crate files are not minified and comments aren't removed. That's important specifically so docs can be generated from them (including the source file view within rustdoc). So in some sense published crates already do contain bytes that are only useful for documentation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So in some sense published crates already do contain bytes that are only useful for documentation.
Examples and tests are also arguably superfluous files — they benefit crater, and those who read the source code of the downloaded crate, but not the typical downloader.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Doing it with percentages seems unnecessarily complicated for something that’s just an arbitrary metric pulled out of a hat anyway.
- as long as you’re using ordinary markdown, you won’t be able to use responsive images, so mobile viewers on metered data plans reading docs.rs will be downloading full-sized images
- docs.rs isn’t a very good image CDN, since it doesn’t reencode to WebP or anything
- anyone building docs locally will be downloading the image even if they don’t look at it
In other words, it’s not just about cargo build. Coping with large images is just a chore.
What I’m trying to do is gesture in the direction of “try to use small images.” Not just solve the problem for plain cargo builds.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Random musings on cargo's behavior
To protect against zip bombs, cargo had a 512 MB limit when extracting crates and people ran into this (rust-lang/cargo#11151) and we updated it to take compression size into account (rust-lang/cargo#11337) so we are less likely to hit the 512 MB limit unintentionally.
We also now report crate size on package/publish (rust-lang/cargo#11270). One future possibility mentioned in that PR is to warn when binary files are included (rust-lang/cargo#9058). In general, cargo has been hesitant about adding warnings that because we haven't had a way for people to disable them but with [lints]
(rust-lang/cargo#12115), we have the possibility to add [lints.cargo]
support.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should be cognizant of users. By including files that aren't as necessary
- Slower download of
.crate
file - Extra disk space taken up (currently will be twice, once for the
.crate
and once for the extracted directory) - Extra time to extract the
.crate
Some things that would help (and are currently being discussed)
- Garbage collection of
.crate
files and extracted.crate
files - Building directly from
.crate
files (this is a big more out there)
Dependent CI jobs will never use this content.
I wonder how many users are like me that almost exclusively use docs.rs rather than cargo doc
and any missing images wouldn't be a big deal. Or in other words, we should consider if the benefit to the community outweighs the cost to the community.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just answering:
I wonder how many users are like me that almost exclusively use docs.rs rather than
cargo doc
and any missing images wouldn't be a big deal. Or in other words, we should consider if the benefit to the community outweighs the cost to the community.
It'd be needed for docs.rs too unfortunately.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It'd be needed for docs.rs too unfortunately.
How so? I'm not seeing anything listed in the Motivation that benefits us docs.rs-exclusive users (speaking as someone who also has crates that include images).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unless you host the images on a website, you can't actually use them in the documentation generated by docs.rs. There are some ways to go around this limitation, for example: generating the base64 equivalent of the image and including it directly into your crate doc comment. But it's not great. Hence this RFC. But the problem is as you mentioned that it would impact the community by downloading more content that wouldn't actually be used.
A potential solution you mentioned (Building directly from .crate files (this is a big more out there)
) would very likely lessen the negative impact. But better take things slow and be sure we don't miss anything. :)
Yeah I agree with @jsha's concern Might be worth postponing the RFC, and the interested parties can go talk to the other teams until they have a new proposal (because I don't see this proposal being able to fit that concern without effectively being changed into something rather different) |
|
||
This would be done by allowing users to specify the path of a local resource file in doc comments. The resource file would be stored in the `doc.files` folder. The `doc.files` folder will be at the "top level" of the rustdoc output level (at the same level as the `static.files` or the `src` folders). | ||
|
||
The only local resources considered will be the ones in the markdown image syntax: `![resource title](path)`, where `<path>` is the path of the resource file relative to the source file. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
how does this solve the <script>
usecase?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It doesn't cover this case at all.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
hmm... then I guess it might be better not to mention it in the "Motivation" section, or explicitly mention that this RFC does not cover this case :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
On this line it mentions The only local resources considered will be the ones in the markdown image syntax
. But it's true that a bit above we have They would also like to include images (like logos and diagrams), and scripts
, which is confusing. I'll remove this part. But otherwise, where is it not clear enough so I can make it more obvious?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
maybe something like: "While using custom <script>
tags is a valid usecase, this RFC only focusses on the usecase of dealing with images" at the end of the motivation section
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍
|
||
The local resources files are not affected by the `--resource-suffix`. | ||
|
||
The impact on `docs.rs` would also be very minimal as the size of a published crate resources is limited to a few megabytes. The only thing needed would be to handle the new `doc.files` folder. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
for images this might be true, but I'm a bit worried about JS files. while docs.rs currently allows running arbitrary JS from 3rd-party servers, the situation might become even worse if docs.rs itself would serve those potentially malicious files.
or am I missing something?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
![]
syntax generates <img>
HTML tag and nothing else. So the idea behind this RFC is to bundle the resource (if local) into the generated rustdoc output.
If you put a .js
file in it, it'll be bundled inside a <img>
(so not displayed) and I suppose you'll then be able to call it with <script>
though. But the same can be done currently with any file extension and then load it as JS into a <script>
. So in this regard, there are no changes I suppose?
I generally support the goals of this RFC, but I agree with the previous commenters that we would need some way to discourage including huge resources in the crate tarballs. If the images end up being significantly larger than the source files then it's questionable whether it is worth including them. Regarding the "having two different kind of resources" idea: I would be strongly against that. It seems like a lot of complexity for a comparatively small benefit. |
I talked with @epage and he mentioned that in the future, |
# Objective This PR continues #8885 It aims to improve the `Mesh` documentation in the following ways: - Put everything at the "top level" instead of the "impl". - Explain better what is a Mesh, how it can be created, and that it can be edited. - Explain it can be used with a `Material`, and mention `StandardMaterial`, `PbrBundle`, `ColorMaterial`, and `ColorMesh2dBundle` since those cover most cases - Mention the glTF/Bevy vocabulary discrepancy for "Mesh" - Add an image for the example - Various nitpicky modifications ## Note - The image I added is 90.3ko which I think is small enough? - Since rustdoc doesn't allow cross-reference not in dependencies of a subcrate [yet](rust-lang/rust#74481), I have a lot of backtick references that are not links :( - Since rustdoc doesn't allow linking to code in the crate (?) I put link to github directly. - Since rustdoc doesn't allow embed images in doc [yet](rust-lang/rust#32104), maybe [soon](rust-lang/rfcs#3397), I had to put only a link to the image. I don't think it's worth adding [embed_doc_image](https://docs.rs/embed-doc-image/latest/embed_doc_image/) as a dependency for this.
# Objective This PR continues #8885 It aims to improve the `Mesh` documentation in the following ways: - Put everything at the "top level" instead of the "impl". - Explain better what is a Mesh, how it can be created, and that it can be edited. - Explain it can be used with a `Material`, and mention `StandardMaterial`, `PbrBundle`, `ColorMaterial`, and `ColorMesh2dBundle` since those cover most cases - Mention the glTF/Bevy vocabulary discrepancy for "Mesh" - Add an image for the example - Various nitpicky modifications ## Note - The image I added is 90.3ko which I think is small enough? - Since rustdoc doesn't allow cross-reference not in dependencies of a subcrate [yet](rust-lang/rust#74481), I have a lot of backtick references that are not links :( - Since rustdoc doesn't allow linking to code in the crate (?) I put link to github directly. - Since rustdoc doesn't allow embed images in doc [yet](rust-lang/rust#32104), maybe [soon](rust-lang/rfcs#3397), I had to put only a link to the image. I don't think it's worth adding [embed_doc_image](https://docs.rs/embed-doc-image/latest/embed_doc_image/) as a dependency for this.
For context, Bevy would really love to have this for our docs to demonstrate various rendering and mathematical concepts. That however means shipping dozens to hundreds of shiny full resolution images (scaling them down will mean misleading artifacts, and SVGs simply don't apply). We would hit the 50 MB limit rather quickly if we were using this feature as aggressively as I would like from a purely docs/learning perspective. I definitely think we should avoid shipping all of the images to every user of the crate somehow. |
An "interesting idea" that crates could apply could be to use a CDN front around GitHub raw file access (e.g. Statically, no association) and release version tags to get the correct revision of the asset, e.g. |
Another idea would be to make image downloads opt in: |
It should be possible to ship images as a separate optional dependency with this RFC, assuming rustdoc tracks spans for relative path references # bevy/Cargo.toml
[features]
example-images = ["dep:bevy-example-images"]
[dependencies]
bevy-example-images = { optional = true, ... } // bevy/lib.rs
/// Some feature
#[cfg_attr(feature = "example-images", doc = bevy_example_images::some_feature!())]
pub fn some_feature() {} // bevy-example-images/lib.rs
#[macro_export]
macro_rules! some_feature {
() => ("![showcase of some feature](./some_feature.png)")
} (EDIT: Though, this relies on dependency source-code access which the RFC tries to avoid for re-export inlining.) |
# Objective This PR continues bevyengine/bevy#8885 It aims to improve the `Mesh` documentation in the following ways: - Put everything at the "top level" instead of the "impl". - Explain better what is a Mesh, how it can be created, and that it can be edited. - Explain it can be used with a `Material`, and mention `StandardMaterial`, `PbrBundle`, `ColorMaterial`, and `ColorMesh2dBundle` since those cover most cases - Mention the glTF/Bevy vocabulary discrepancy for "Mesh" - Add an image for the example - Various nitpicky modifications ## Note - The image I added is 90.3ko which I think is small enough? - Since rustdoc doesn't allow cross-reference not in dependencies of a subcrate [yet](rust-lang/rust#74481), I have a lot of backtick references that are not links :( - Since rustdoc doesn't allow linking to code in the crate (?) I put link to github directly. - Since rustdoc doesn't allow embed images in doc [yet](rust-lang/rust#32104), maybe [soon](rust-lang/rfcs#3397), I had to put only a link to the image. I don't think it's worth adding [embed_doc_image](https://docs.rs/embed-doc-image/latest/embed_doc_image/) as a dependency for this.
Why is I'd like to see a Cargo.toml metadata way of including bundled assets in addition to parsing the MD document. It doesn't add that much complexity and can be used down the road for JS and CSS as jsha mentions (KaTeX, common stuff in large docs, etc.). It's currently possible to include these things statically with |
Tracking issue
cc @rust-lang/rustdoc
Rendered