Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Allow lazy HTML escaping #137274

Closed
wants to merge 4 commits into from

Conversation

yotamofek
Copy link
Contributor

@yotamofek yotamofek commented Feb 19, 2025

Inspired by this comment by @aDotInTheVoid .
Makes Escape and EscapeBodyText accept any impl fmt::Display, instead of a &str, which allows us to avoid a few interim String allocations.
This opens up room for more lazifying, but I'll keep those for a separate PR.

Unfortunately, I think there might be a hit to performance because of the double vtable-indirection caused by wrapping a fmt::Formatter in another one, but I think that I should be able to gain that perf back by doing more lazy printing (either the small things improvements I made in this PR, or in later ones)

Probably better to review each commit individually.

@rustbot
Copy link
Collaborator

rustbot commented Feb 19, 2025

r? @notriddle

rustbot has assigned @notriddle.
They will have a look at your PR within the next two weeks and either review your PR or reassign to another reviewer.

Use r? to explicitly pick a reviewer

@rustbot rustbot added S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. T-rustdoc Relevant to the rustdoc team, which will review and decide on the PR/issue. T-rustdoc-frontend Relevant to the rustdoc-frontend team, which will review and decide on the web UI/UX output. labels Feb 19, 2025
@jieyouxu
Copy link
Member

@bors try @rust-timer queue

@rust-timer

This comment has been minimized.

@rustbot rustbot added the S-waiting-on-perf Status: Waiting on a perf run to be completed. label Feb 19, 2025
bors added a commit to rust-lang-ci/rust that referenced this pull request Feb 19, 2025
…<try>

Allow lazy HTML escaping

Inspired by [this comment](rust-lang#136828 (comment)) by `@aDotInTheVoid` .
Makes `Escape` and `EscapeBodyText` accept any `impl fmt::Display`, instead of a `&str`, which allows us to avoid a few interim `String` allocations.
This opens up room for more lazifying, but I'll keep those for a separate PR.

Unfortunately, I think there might be a hit to performance because of the double vtable-indirection caused by wrapping a `fmt::Formatter` in another one, but I think that I should be able to gain that perf back by doing more lazy printing (either the small things improvements I made in this PR, or in later ones)

Probably better to review each commit individually.
@bors
Copy link
Contributor

bors commented Feb 19, 2025

⌛ Trying commit c7f0154 with merge bfce716...

@rust-log-analyzer

This comment has been minimized.

@jieyouxu jieyouxu added S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Feb 19, 2025
@yotamofek yotamofek force-pushed the pr/rustdoc/lazy-escape branch from c7f0154 to 1dd2f33 Compare February 19, 2025 15:13
@notriddle
Copy link
Contributor

I've got an idea that seems simpler and better.

pulldown-cmark, the Markdown parser that we already use, makes its HTML escaper available as a library (and a transitive dependency of rustdoc):

It already writes its escaped output into a Writer. Could rustdoc use that?

@yotamofek
Copy link
Contributor Author

Didn't know that crate, thanks for pointing that out!
I think it might be a good building block for own escape logic, though it does seem to escape more characters than we do (at first glance, didn't dive into their code).

Alas, they don't support the "lazy" formatting I've implemented here, their API is used by calling escape_href, which takes a &str, and not an impl fmt::Display as I'm doing here.

So yeah, maybe we can replace our escape function with their escape_href, if functionality-wise they're the same, but we'll still need our Escape construct, along with the changes I made here, in order to be able to escape stuff without having to allocate it into a String first.

@yotamofek
Copy link
Contributor Author

Their API is kinda opposite of ours: they have a function that allows writing a string into a writer, we have a wrapper for a string so that it can be escaped when written into a fmt::Write.

@jieyouxu
Copy link
Member

@bors try @rust-timer queue

@rust-timer

This comment has been minimized.

@bors
Copy link
Contributor

bors commented Feb 19, 2025

⌛ Trying commit 1dd2f33 with merge 69f611a...

bors added a commit to rust-lang-ci/rust that referenced this pull request Feb 19, 2025
…<try>

Allow lazy HTML escaping

Inspired by [this comment](rust-lang#136828 (comment)) by `@aDotInTheVoid` .
Makes `Escape` and `EscapeBodyText` accept any `impl fmt::Display`, instead of a `&str`, which allows us to avoid a few interim `String` allocations.
This opens up room for more lazifying, but I'll keep those for a separate PR.

Unfortunately, I think there might be a hit to performance because of the double vtable-indirection caused by wrapping a `fmt::Formatter` in another one, but I think that I should be able to gain that perf back by doing more lazy printing (either the small things improvements I made in this PR, or in later ones)

Probably better to review each commit individually.
@notriddle
Copy link
Contributor

So yeah, maybe we can replace our escape function with their escape_href, if functionality-wise they're the same, but we'll still need our Escape construct, along with the changes I made here, in order to be able to escape stuff without having to allocate it into a String first.

I see. That makes sense.

(Anyway, you'd use escape_html, not escape_href. The escape_href function does url percent-encoding.)

@yotamofek
Copy link
Contributor Author

(Anyway, you'd use escape_html, not escape_href. The escape_href function does url percent-encoding.)

That explains everything haha.
I'll try to replace our escape functions with theirs in a separate PR. The SIMD implementation looks very promising (though it's a shame it doesn't seem to cache the check for whether the CPU supports SSE3). And being able to delete our copy of the code is a clear win.

@bors
Copy link
Contributor

bors commented Feb 19, 2025

☀️ Try build successful - checks-actions
Build commit: 69f611a (69f611a46c06f1d237cdefe8257d35fc5cc99cf6)

@rust-timer

This comment has been minimized.

bors added a commit to rust-lang-ci/rust that referenced this pull request Feb 19, 2025
…ng, r=<try>

librustdoc: Use `pulldown-cmark-escape` for HTML escaping

Implementation of `@notriddle` 's [suggestion](rust-lang#137274 (comment)).
Somewhat related to rust-lang#137274 , but the two PRs should be complementary.

Local perf results look like a nice improvement! (so would love a perf run on the CI)
@rust-timer
Copy link
Collaborator

Finished benchmarking commit (69f611a): comparison URL.

Overall result: ❌ regressions - please read the text below

Benchmarking this pull request likely means that it is perf-sensitive, so we're automatically marking it as not fit for rolling up. While you can manually mark this PR as fit for rollup, we strongly recommend not doing so since this PR may lead to changes in compiler perf.

Next Steps: If you can justify the regressions found in this try perf run, please indicate this with @rustbot label: +perf-regression-triaged along with sufficient written justification. If you cannot justify the regressions please fix the regressions and do another perf run. If the next run shows neutral or positive results, the label will be automatically removed.

@bors rollup=never
@rustbot label: -S-waiting-on-perf +perf-regression

Instruction count

This is the most reliable metric that we have; it was used to determine the overall result at the top of this comment. However, even this metric can sometimes exhibit noise.

mean range count
Regressions ❌
(primary)
0.4% [0.2%, 1.1%] 15
Regressions ❌
(secondary)
0.7% [0.2%, 1.1%] 8
Improvements ✅
(primary)
- - 0
Improvements ✅
(secondary)
- - 0
All ❌✅ (primary) 0.4% [0.2%, 1.1%] 15

Max RSS (memory usage)

Results (primary 2.4%)

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

mean range count
Regressions ❌
(primary)
2.4% [2.3%, 2.6%] 2
Regressions ❌
(secondary)
- - 0
Improvements ✅
(primary)
- - 0
Improvements ✅
(secondary)
- - 0
All ❌✅ (primary) 2.4% [2.3%, 2.6%] 2

Cycles

Results (primary -4.3%)

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

mean range count
Regressions ❌
(primary)
- - 0
Regressions ❌
(secondary)
- - 0
Improvements ✅
(primary)
-4.3% [-6.1%, -1.8%] 6
Improvements ✅
(secondary)
- - 0
All ❌✅ (primary) -4.3% [-6.1%, -1.8%] 6

Binary size

This benchmark run did not return any relevant results for this metric.

Bootstrap: 773.063s -> 773.308s (0.03%)
Artifact size: 362.34 MiB -> 362.33 MiB (-0.00%)

@rustbot rustbot added perf-regression Performance regression. and removed S-waiting-on-perf Status: Waiting on a perf run to be completed. labels Feb 19, 2025
@yotamofek
Copy link
Contributor Author

Perf regression is unfortunate, but not surprising. I'm hoping it's acceptable for now, I think I can win it back with #137285 and future PRs.

@notriddle
Copy link
Contributor

I'd like to see some of the planned improvements added to this PR.

Then we can do another perf run and see if they make up the difference.

@yotamofek
Copy link
Contributor Author

Spent quite a while trying to mitigate the perf regression, to no avail.
Gonna close this for now and hopefully come back to it sometime in the future.

@yotamofek yotamofek closed this Feb 22, 2025
bors added a commit to rust-lang-ci/rust that referenced this pull request Feb 24, 2025
…ng, r=GuillaumeGomez

librustdoc: Use `pulldown-cmark-escape` for HTML escaping

Implementation of `@notriddle` 's [suggestion](rust-lang#137274 (comment)).
Somewhat related to rust-lang#137274 , but the two PRs should be complementary.

Local perf results look like a nice improvement! (so would love a perf run on the CI)
RalfJung pushed a commit to RalfJung/miri that referenced this pull request Feb 25, 2025
…llaumeGomez

librustdoc: Use `pulldown-cmark-escape` for HTML escaping

Implementation of `@notriddle` 's [suggestion](rust-lang/rust#137274 (comment)).
Somewhat related to #137274 , but the two PRs should be complementary.

Local perf results look like a nice improvement! (so would love a perf run on the CI)
BoxyUwU pushed a commit to BoxyUwU/rustc-dev-guide that referenced this pull request Feb 25, 2025
…llaumeGomez

librustdoc: Use `pulldown-cmark-escape` for HTML escaping

Implementation of `@notriddle` 's [suggestion](rust-lang/rust#137274 (comment)).
Somewhat related to #137274 , but the two PRs should be complementary.

Local perf results look like a nice improvement! (so would love a perf run on the CI)
lnicola pushed a commit to lnicola/rust-analyzer that referenced this pull request Mar 3, 2025
…llaumeGomez

librustdoc: Use `pulldown-cmark-escape` for HTML escaping

Implementation of `@notriddle` 's [suggestion](rust-lang/rust#137274 (comment)).
Somewhat related to #137274 , but the two PRs should be complementary.

Local perf results look like a nice improvement! (so would love a perf run on the CI)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
perf-regression Performance regression. S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. T-rustdoc Relevant to the rustdoc team, which will review and decide on the PR/issue. T-rustdoc-frontend Relevant to the rustdoc-frontend team, which will review and decide on the web UI/UX output.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants