Document comment representation is inefficient #65714

nnethercote · 2019-10-23T04:06:13Z

Imagine a doc comment like this:

/// aaa
/// `bbb`
/// ccc

This gets treated like this:

#[doc=" aaa"]
#[doc=" `bbb`"]
#[doc=" ccc"]

And each line gets represented with a TokenStream containing two elements: a Eq token and a Lit token. These end up in metadata, and the decoding cost is quite high. Because doc comments can be long, spanning many lines, it would be nice to join them together -- much like rustdoc's collapse-span pass does, but earlier -- to reduce the metadata decoding cost.

I wrote an experimental, not-quite-right patch that condensed to the following in the parser:

#[doc="/// aaa\n/// `bbb`\n/// ccc"]

It failed lots of rustdoc tests, but the perf results were great:

ssue-46449-check
        avg: -16.1%     min: -16.1%     max: -16.1%
helloworld-check
        avg: -12.6%     min: -12.6%     max: -12.6%
unify-linearly-check
        avg: -10.9%     min: -10.9%     max: -10.9%
deeply-nested-check
        avg: -8.7%      min: -8.7%      max: -8.7%
await-call-tree-check
        avg: -8.1%      min: -8.1%      max: -8.1%
regression-31157-check
        avg: -4.0%      min: -4.0%      max: -4.0%
ripgrep-check
        avg: -2.7%      min: -2.7%      max: -2.7%
encoding-check
        avg: -2.4%      min: -2.4%      max: -2.4%
regex-check
        avg: -2.0%      min: -2.0%      max: -2.0%
syn-check
        avg: -1.6%      min: -1.6%      max: -1.6%
futures-check
        avg: -1.5%      min: -1.5%      max: -1.5%
clap-rs-check
        avg: -0.9%      min: -0.9%      max: -0.9%
piston-image-check
        avg: -0.9%      min: -0.9%      max: -0.9%
coercions-check
        avg: -0.8%?     min: -0.8%?     max: -0.8%?
webrender-check
        avg: -0.6%      min: -0.6%      max: -0.6%
token-stream-stress-check
        avg: -0.6%      min: -0.6%      max: -0.6%
html5ever-check
        avg: -0.6%      min: -0.6%      max: -0.6%
inflate-check
        avg: -0.4%      min: -0.4%      max: -0.4%
cargo-check
        avg: -0.4%      min: -0.4%      max: -0.4%
cranelift-codegen-check
        avg: -0.4%      min: -0.4%      max: -0.4%
serde-check
        avg: -0.3%      min: -0.3%      max: -0.3%
wg-grammar-check
        avg: -0.3%      min: -0.3%      max: -0.3%
deep-vector-check
        avg: -0.2%      min: -0.2%      max: -0.2%

I then modified it to be more correct, like this:

#[doc="/// aaa\n `bbb`\n ccc"]

But then I get build failures -- the doc string appears to be tokenized somewhere, and it complains about the backticks not being valid tokens ("error: unknown start of token: `") due to the lack of /// after the first newline. Anyone know where that tokenization might arise from?

So, there is definitely room for improvement in the representation of doc comments, but some care will be needed to keep things working. I'd love to hear suggestions on the right way to do this.

cc @rust-lang/wg-compiler-performance

The text was updated successfully, but these errors were encountered:

nnethercote · 2019-10-23T04:19:17Z

Hmm, it's slightly different to how I described it above. In the current code, the /// at the start of each line is actually preserved with the doc attribute, and is only stripped when it is converted into a DocFragment::SugaredDoc by rustdoc::clean::Attributes::from_ast().

So perhaps I can join the lines like I did, keeping the /// markers in there, and then change Attributes::from_ast() to strip /// markers from the middle of the string as well as the beginning.

nnethercote · 2019-10-23T04:33:50Z

I guess a relevant question is whether doc comments need to be put into metadata?

petrochenkov · 2019-10-23T05:06:30Z

Duplicate of #60936 and #60935.
cc #60930

nnethercote closed this as completed Oct 23, 2019

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Document comment representation is inefficient #65714

Document comment representation is inefficient #65714

nnethercote commented Oct 23, 2019 •

edited

Loading

nnethercote commented Oct 23, 2019

nnethercote commented Oct 23, 2019

petrochenkov commented Oct 23, 2019

Document comment representation is inefficient #65714

Document comment representation is inefficient #65714

Comments

nnethercote commented Oct 23, 2019 • edited Loading

nnethercote commented Oct 23, 2019

nnethercote commented Oct 23, 2019

petrochenkov commented Oct 23, 2019

nnethercote commented Oct 23, 2019 •

edited

Loading