Coalescing for line doc comments #60936

petrochenkov · 2019-05-18T11:44:02Z

Right now multiple line doc comments are considered separate tokens and separate attributes:

/// A
/// B
/// C
pub struct S;

is interpreted as

#[doc = "/// A"]
#[doc = "/// B"]
#[doc = "/// C"]
pub struct S;

To mitigate the perf impact from turning normal comments into doc comments (#60930) we may want to merge multi-line line comments into a single token and a single attribute:

#[doc = "/// A\n/// B\n/// C"]
pub struct S;

nagisa · 2019-05-19T19:55:30Z

is interpreted as

#[doc = "/// A"]
#[doc = "/// B"]
#[doc = "/// C"]
pub struct S;

It cannot be interpreted that way because /// are not stripped by rustdoc. It probably is interpreted as

#[doc = "A"]
#[doc = "B"]
#[doc = "C"]
pub struct S;

so we don’t need the /// in the concatenated string either.

petrochenkov · 2019-05-19T20:58:19Z

I've just checked, /// is there both in tokens and in AST.
It looks like they are occasionally (sigh) stripped, e.g. when doc comments are passed to macros.
Rustdoc strips them as well (see with_desugared_doc).

In other words, the details need to be figured out.
Ideally, tokens should be passed around losslessly/precisely without attempts to be "helpful".
Not sure how close to that we can get backward-compatibly.

petrochenkov · 2019-05-19T21:01:03Z

Another incompatibility is with code doing something like this

// Passed to a macro
/// Comment 1
/// Comment 2

// Macro expects exactly two attributes
#[$meta:meta]
#[$meta:meta]

(Again, we'll need crater to figure out whether we need to provide a compatibility layer or not.)

nagisa · 2019-05-19T21:11:17Z

I've just checked, /// is there both in tokens and in AST.

Interesting. I tried to run cargo doc on your desugared version and the markdown processor barfed at all the /// in it.

petrochenkov · 2019-05-20T17:27:39Z

A bit on representing tokens produced by lexer losslessly (TLDR: I think literals do it right).

Tokens need to contain their kind and data - { kind: StrLiteral, data: `"foo"` } (quotes in the data).
The kind needs to be readily available because parser uses it all the time.
So it makes sense to remove the kind bits from the data, so there's no duplication and they cannot get inconsistent by definition - { kind: StrLiteral, data: `foo` } (no quotes in the data).
The original token still can be losslessly recovered from this representation - { kind: StrLiteral, data: `foo` } => `"foo"` .
(This is what literals in the compiler currently do.)

The same logic is applicable to doc comments, which are basically raw string literals with unusual quotes (/** + */ and /// + \n).

/** line 1
    line 2
    line 3 */

/// line

=>

{
    kind: OuterBlockDocComment,
    data: ` line 1
    line 2
    line 3 `
}
{
    kind: OuterLineDocComment,
    data: ` line`
}

(the original token still can be losslessly recovered).
(This is not what literals in the compiler currently do, but they probably should.)

If you look at line doc comments like that, it's kind of clear why each line is a separate token.

Keeping them separate at token level, but then concatenating later kind of defeats the purpose of optimization (it may be more work, and more interned strings).
So, the optimization requires redefining what the "closing quote" of a line doc comment is (start of the next non-line-doc-comment token?).

    /// line 1
    /// line 2
    /// line 3
    struct S;

=>

{
    kind: OuterLineDocComment,
    data: ` line 1
    /// line 2
    /// line 3
    `
}

(the original token still can be losslessly recovered).

Pretty weird.

nnethercote · 2019-10-23T08:54:27Z

So, the optimization requires redefining what the "closing quote" of a line doc comment is (start of the next non-line-doc-comment token?).
    /// line 1
    /// line 2
    /// line 3
    struct S;

=>

{
    kind: OuterLineDocComment,
    data: ` line 1
    /// line 2
    /// line 3
    `
}
(the original token still can be losslessly recovered).

I would have thought it would be this:

{
    kind: OuterLineDocComment,
    data: "line 1\nline 2\nline 3"
}

Here data is the processed, useful form, and we can losslessly recover the original form by adding "/// " at the start and after each newline, and appending a newline at the end. (Well, I'm not sure about the newline at the end, but it definitely feels to me like the internal /// markers should be removed.)

nnethercote · 2019-10-23T19:31:03Z

Another incompatibility is with code doing something like this
// Passed to a macro
/// Comment 1
/// Comment 2

// Macro expects exactly two attributes
#[$meta:meta]
#[$meta:meta]

That's very unfortunate, if it means that /// multi-line doc comments and /** multi-line doc comments are distinguishable at the language level. Which definitely complicates things.

It's worth noting that if we converted every multi-line /// doc comment in liballoc/libcore/libstd to a /** ... */ doc comment, we'd likely get a big chunk of the potential improvement.

Mark-Simulacrum · 2019-10-23T19:39:17Z

It seems like one avenue is to have some crate-level flag or so (maybe just a feature) that collapses doc comments -- at least for now, it'd be basically internal-use-only but if that makes core/std smaller and faster to decode that's already a pretty good win. Eventually we could presumably edition-gate this change in some way (though the interaction with macros might make that a bit hard).

nnethercote · 2019-10-24T03:35:31Z

#65750 implements the idea from #60935, which is an alternative to doc comment coalescing that doesn't have any backwards compatibility concerns. After #65750 lands, the potential wins from coalescing would be much smaller, and IMO not worth pursuing.

@petrochenkov

Cheaper doc comments This PR implements the idea from #60935: represent doc comments more cheaply, rather than converting them into `#[doc="..."]` attribute form. Unlike #60936 (which is about coalescing doc comments to reduce their number), this approach does not have any backwards compatibility concerns, and it eliminates about 80-90% of the current cost of doc comments (as estimated using the numbers in #60930, which eliminated the cost of doc comments entirely by treating them as normal comments). r? @petrochenkov

petrochenkov mentioned this issue May 18, 2019

[experiment] How expensive are doc comments? #60930

Closed

Centril added I-slow Issue: Problems and improvements with respect to performance of generated code. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. C-enhancement Category: An issue proposing an enhancement or a PR with one. labels May 18, 2019

jonas-schievink added I-compiletime Issue: Problems and improvements with respect to compile times. and removed I-slow Issue: Problems and improvements with respect to performance of generated code. labels May 18, 2019

petrochenkov mentioned this issue Jul 16, 2019

The essence of lexer #59706

Merged

petrochenkov mentioned this issue Oct 23, 2019

Document comment representation is inefficient #65714

Closed

nnethercote mentioned this issue Oct 24, 2019

Cheaper doc comments #65750

Merged

nnethercote closed this as completed Oct 24, 2019

Mark-Simulacrum mentioned this issue Apr 6, 2020

doc attribute removes whitespace #70732

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Coalescing for line doc comments #60936

Coalescing for line doc comments #60936

petrochenkov commented May 18, 2019

nagisa commented May 19, 2019

petrochenkov commented May 19, 2019

petrochenkov commented May 19, 2019

nagisa commented May 19, 2019 •

edited

Loading

petrochenkov commented May 20, 2019

nnethercote commented Oct 23, 2019

nnethercote commented Oct 23, 2019

Mark-Simulacrum commented Oct 23, 2019

nnethercote commented Oct 24, 2019

Coalescing for line doc comments #60936

Coalescing for line doc comments #60936

Comments

petrochenkov commented May 18, 2019

nagisa commented May 19, 2019

petrochenkov commented May 19, 2019

petrochenkov commented May 19, 2019

nagisa commented May 19, 2019 • edited Loading

petrochenkov commented May 20, 2019

nnethercote commented Oct 23, 2019

nnethercote commented Oct 23, 2019

Mark-Simulacrum commented Oct 23, 2019

nnethercote commented Oct 24, 2019

nagisa commented May 19, 2019 •

edited

Loading