Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Optimize <SourceFile as Decodable>::decode #95981

Merged
merged 2 commits into from
Apr 14, 2022

Conversation

martingms
Copy link
Contributor

It showed up as a hot-ish function in a callgrind profile of the await-call-tree benchmark crate.

Provides some moderate speedups to compilation of some of the smaller benchmarks:

Primary benchmarks

Benchmark Profile Scenario % Change Significance Factor?
helloworld check full -1.81% 9.03x
helloworld check incr-unchanged -1.80% 8.99x
helloworld check incr-full -1.59% 7.97x
helloworld check incr-patched: println -1.57% 7.86x

Secondary benchmarks

Benchmark Profile Scenario % Change Significance Factor?
unify-linearly check incr-unchanged -1.55% 7.74x
unify-linearly check incr-patched: dummy fn -1.42% 7.08x
await-call-tree check incr-unchanged -1.27% 6.35x
await-call-tree debug incr-unchanged -1.19% 5.95x
await-call-tree opt incr-unchanged -1.19% 5.94x
issue-46449 check incr-unchanged -1.08% 5.39x
issue-46449 check incr-patched: u8 3072 -1.00% 5.00x
structopt-0.3.26 check incr-unchanged -0.94% 4.72x
structopt-0.3.26 opt incr-unchanged -0.92% 4.60x
structopt-0.3.26 debug incr-unchanged -0.92% 4.59x
issue-46449 check full -0.89% 4.46x
structopt-0.3.26 check full -0.83% 4.17x
structopt-0.3.26 debug full -0.78% 3.88x
structopt-0.3.26 opt full -0.76% 3.81x
unify-linearly check full -0.75% 3.74x
projection-caching check incr-unchanged -0.74% 3.70x
issue-46449 check incr-patched: u32 3072 -0.70% 3.50x
issue-46449 check incr-patched: empty 3072 -0.68% 3.41x
structopt-0.3.26 check incr-full -0.68% 3.40x
wf-projection-stress-65510 check incr-unchanged -0.68% 3.39x
issue-46449 check incr-patched: static str 6144 -0.67% 3.37x
wf-projection-stress-65510 debug incr-unchanged -0.67% 3.33x
wf-projection-stress-65510 opt incr-unchanged -0.66% 3.31x
issue-46449 check incr-patched: io error 6144 -0.66% 3.29x
unify-linearly check incr-full -0.65% 3.26x
issue-46449 check incr-full -0.65% 3.25x
structopt-0.3.26 debug incr-full -0.64% 3.18x
structopt-0.3.26 opt incr-full -0.63% 3.17x
issue-46449 debug incr-unchanged -0.61% 3.06x
issue-46449 opt incr-unchanged -0.61% 3.03x
await-call-tree check full -0.60% 2.99x
issue-88862 check incr-unchanged -0.58% 2.91x
deep-vector debug full 0.57% 2.83x
await-call-tree check incr-full -0.52% 2.59x
tt-muncher opt full -0.52% 2.58x
issue-58319 check incr-unchanged -0.50% 2.52x
await-call-tree debug full -0.50% 2.49x
await-call-tree opt full -0.49% 2.45x
deep-vector debug incr-patched: println 0.47% 2.37x
await-call-tree debug incr-full -0.45% 2.26x
await-call-tree opt incr-full -0.44% 2.18x
issue-88862 check full -0.41% 2.06x
mockall-0.11.0 check incr-unchanged -0.38% 1.90x
regression-31157 check incr-unchanged -0.37% 1.86x
wf-projection-stress-65510 opt full -0.36% 1.80x
deunicode-1.3.1 check incr-unchanged -0.35% 1.76x
unify-linearly debug incr-patched: dummy fn -0.35% 1.74x
mockall-0.11.0 check full -0.35% 1.73x
unify-linearly debug incr-unchanged -0.34% 1.69x
deunicode-1.3.1 check full -0.33% 1.63x
token-stream-stress check full -0.32% 1.62x
token-stream-stress check incr-full -0.32% 1.59x
token-stream-stress check incr-unchanged -0.32% 1.59x
regression-31157 check incr-patched: println -0.31% 1.57x
wf-projection-stress-65510 check full -0.31% 1.54x
deeply-nested-multi check incr-unchanged -0.31% 1.53x
mockall-0.11.0 opt incr-unchanged -0.30% 1.50x

r? @nnethercote

By inverting parsing loop, avoiding continually re-checking bytes_per_diff.
@rustbot rustbot added the T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. label Apr 12, 2022
@rust-highfive rust-highfive added the S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. label Apr 12, 2022
1 => {
for _ in 1..num_lines {
line_start = line_start + BytePos(d.read_u8() as u32);
lines.push(line_start);

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What about using extend with an iterator instead of push?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You mean like this?

lines.extend((1..num_lines).map(|_| {
    line_start += BytePos(d.read_u8());
    line_start
}));

It's harder to read but could be faster if it avoids bounds checks. @martingms , want to try it out?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, like that 👍

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I decided against it for readability, but I didn't benchmark it to see if it's faster, will do that now.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pushed the commit here, if you wanted to do a CI perf run @nnethercote

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks promising locally, so worth a run! Thanks for the suggestion @Dandandan, I've still got a lot to learn about convincing the compiler to elide bounds checks.

match bytes_per_diff {
1 => {
for _ in 1..num_lines {
line_start = line_start + BytePos(d.read_u8() as u32);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Use +=, here and below.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

BytePos doesn't implement AddAssign, I guess I could add that but seemed a bit out of scope, as the x = x + y form was already used here in the old code.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh right, yeah, don't worry about it.

@nnethercote
Copy link
Contributor

This is a nice micro-optimization. Let's do a CI perf run, and then if you try out the extend idea from above we can do another run. Note that local runs sometimes give somewhat different results because they don't have PGO, so it's probably worth trying the extend thing on CI even if the local results seem unpromising.

@bors try @rust-timer queue

@rust-timer
Copy link
Collaborator

Awaiting bors try build completion.

@rustbot label: +S-waiting-on-perf

@rustbot rustbot added the S-waiting-on-perf Status: Waiting on a perf run to be completed. label Apr 13, 2022
@bors
Copy link
Contributor

bors commented Apr 13, 2022

⌛ Trying commit 2b14529 with merge 366197b0f5f748175b0da2b052977587b09357e2...

@bors
Copy link
Contributor

bors commented Apr 13, 2022

☀️ Try build successful - checks-actions
Build commit: 366197b0f5f748175b0da2b052977587b09357e2 (366197b0f5f748175b0da2b052977587b09357e2)

@rust-timer
Copy link
Collaborator

Queued 366197b0f5f748175b0da2b052977587b09357e2 with parent 52ca603, future comparison URL.

@rust-timer
Copy link
Collaborator

Finished benchmarking commit (366197b0f5f748175b0da2b052977587b09357e2): comparison url.

Summary:

  • Primary benchmarks: 🎉 relevant improvements found
  • Secondary benchmarks: 🎉 relevant improvements found
Regressions 😿
(primary)
Regressions 😿
(secondary)
Improvements 🎉
(primary)
Improvements 🎉
(secondary)
All 😿 🎉
(primary)
count1 0 0 7 54 7
mean2 N/A N/A -2.1% -1.3% -2.1%
max N/A N/A -3.5% -3.2% -3.5%

If you disagree with this performance assessment, please file an issue in rust-lang/rustc-perf.

Benchmarking this pull request likely means that it is perf-sensitive, so we're automatically marking it as not fit for rolling up. While you can manually mark this PR as fit for rollup, we strongly recommend not doing so since this PR may lead to changes in compiler perf.

@bors rollup=never
@rustbot label: +S-waiting-on-review -S-waiting-on-perf -perf-regression

Footnotes

  1. number of relevant changes

  2. the arithmetic mean of the percent change

@rustbot rustbot removed the S-waiting-on-perf Status: Waiting on a perf run to be completed. label Apr 13, 2022
A bit less readable but more compact, and maybe faster? We'll see.
@nnethercote
Copy link
Contributor

Perf run for the extend version:

@bors try @rust-timer queue

@rust-timer
Copy link
Collaborator

Awaiting bors try build completion.

@rustbot label: +S-waiting-on-perf

@rustbot rustbot added the S-waiting-on-perf Status: Waiting on a perf run to be completed. label Apr 13, 2022
@bors
Copy link
Contributor

bors commented Apr 13, 2022

⌛ Trying commit 5f2c6b9 with merge 4f8be8df41f2b23982956d0a6b4d1b07d7139f0f...

@bors
Copy link
Contributor

bors commented Apr 13, 2022

☀️ Try build successful - checks-actions
Build commit: 4f8be8df41f2b23982956d0a6b4d1b07d7139f0f (4f8be8df41f2b23982956d0a6b4d1b07d7139f0f)

@rust-timer
Copy link
Collaborator

Queued 4f8be8df41f2b23982956d0a6b4d1b07d7139f0f with parent 1491e5c, future comparison URL.

@rust-timer
Copy link
Collaborator

Finished benchmarking commit (4f8be8df41f2b23982956d0a6b4d1b07d7139f0f): comparison url.

Summary:

  • Primary benchmarks: 🎉 relevant improvements found
  • Secondary benchmarks: 🎉 relevant improvements found
Regressions 😿
(primary)
Regressions 😿
(secondary)
Improvements 🎉
(primary)
Improvements 🎉
(secondary)
All 😿 🎉
(primary)
count1 0 6 26 68 26
mean2 N/A 0.5% -1.0% -1.4% -1.0%
max N/A 0.8% -4.5% -4.1% -4.5%

If you disagree with this performance assessment, please file an issue in rust-lang/rustc-perf.

Benchmarking this pull request likely means that it is perf-sensitive, so we're automatically marking it as not fit for rolling up. While you can manually mark this PR as fit for rollup, we strongly recommend not doing so since this PR may lead to changes in compiler perf.

@bors rollup=never
@rustbot label: +S-waiting-on-review -S-waiting-on-perf -perf-regression

Footnotes

  1. number of relevant changes

  2. the arithmetic mean of the percent change

@rustbot rustbot removed the S-waiting-on-perf Status: Waiting on a perf run to be completed. label Apr 13, 2022
@nnethercote
Copy link
Contributor

The extend version is clearly better. @Dandandan: nice suggestion. @martingms: great PR! Some very nice wins for such a small change.

@bors r+

@bors
Copy link
Contributor

bors commented Apr 13, 2022

📌 Commit 5f2c6b9 has been approved by nnethercote

@bors bors added S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Apr 13, 2022
@bors
Copy link
Contributor

bors commented Apr 13, 2022

⌛ Testing commit 5f2c6b9 with merge f387c93...

@bors
Copy link
Contributor

bors commented Apr 14, 2022

☀️ Test successful - checks-actions
Approved by: nnethercote
Pushing f387c93 to master...

@bors bors added the merged-by-bors This PR was explicitly merged by bors. label Apr 14, 2022
@bors bors merged commit f387c93 into rust-lang:master Apr 14, 2022
@rustbot rustbot added this to the 1.62.0 milestone Apr 14, 2022
@rust-timer
Copy link
Collaborator

Finished benchmarking commit (f387c93): comparison url.

Summary:

  • Primary benchmarks: 🎉 relevant improvements found
  • Secondary benchmarks: 🎉 relevant improvements found
Regressions 😿
(primary)
Regressions 😿
(secondary)
Improvements 🎉
(primary)
Improvements 🎉
(secondary)
All 😿 🎉
(primary)
count1 0 1 16 76 16
mean2 N/A 0.9% -1.5% -1.5% -1.5%
max N/A 0.9% -5.1% -4.5% -5.1%

If you disagree with this performance assessment, please file an issue in rust-lang/rustc-perf.

@rustbot label: -perf-regression

Footnotes

  1. number of relevant changes

  2. the arithmetic mean of the percent change

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
merged-by-bors This PR was explicitly merged by bors. S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants