Optimize <SourceFile as Decodable>::decode
#95981
Conversation
By inverting the parsing loop, avoiding continually re-checking `bytes_per_diff`.
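A minimal sketch of the loop-inversion idea, with hypothetical names (`decode_line_starts`, `width`) standing in for the actual rustc code: instead of dispatching on the diff width inside the loop on every iteration, dispatch once and run a specialized loop per width.

```rust
// Hypothetical simplified decoder, not the actual rustc implementation.
// Line starts are stored as deltas; `width` says how many bytes each
// delta occupies. Matching on `width` once, outside the loop, avoids
// re-checking it per iteration.
fn decode_line_starts(diffs: &[u8], width: usize, mut start: u32) -> Vec<u32> {
    let mut lines = vec![start];
    match width {
        1 => {
            for &d in diffs {
                start += d as u32;
                lines.push(start);
            }
        }
        // 2- and 4-byte diff widths elided in this sketch.
        _ => unimplemented!("wider diffs elided"),
    }
    lines
}
```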
compiler/rustc_span/src/lib.rs
Outdated
```rust
1 => {
    for _ in 1..num_lines {
        line_start = line_start + BytePos(d.read_u8() as u32);
        lines.push(line_start);
```
What about using `extend` with an iterator instead of `push`?
You mean like this?
```rust
lines.extend((1..num_lines).map(|_| {
    line_start += BytePos(d.read_u8());
    line_start
}));
```
It's harder to read but could be faster if it avoids bounds checks. @martingms , want to try it out?
Yes, like that 👍
I decided against it for readability, but I didn't benchmark it to see if it's faster; I'll do that now.
Pushed the commit here, in case you want to do a CI perf run, @nnethercote.
Looks promising locally, so worth a run! Thanks for the suggestion @Dandandan, I've still got a lot to learn about convincing the compiler to elide bounds checks.
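A sketch of why the `extend` form can help, using a hypothetical helper (`cumulative`) rather than the actual rustc code: when the iterator reports an exact size, `Vec::extend` can reserve the needed capacity once up front, whereas a plain `push` loop re-checks capacity on every iteration.

```rust
// Hypothetical simplified version of the pattern under discussion.
// The iterator from `diffs.iter().map(...)` has an exact size_hint,
// so `extend` reserves once instead of growing incrementally.
fn cumulative(diffs: &[u8], mut start: u32) -> Vec<u32> {
    let mut out = Vec::with_capacity(diffs.len());
    out.extend(diffs.iter().map(|&d| {
        start += d as u32;
        start
    }));
    out
}
```

Whether the bounds/capacity checks are actually elided depends on what the optimizer does with the concrete iterator type, which is why benchmarking it (as done in this thread) is the right call.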
compiler/rustc_span/src/lib.rs
Outdated
```rust
match bytes_per_diff {
    1 => {
        for _ in 1..num_lines {
            line_start = line_start + BytePos(d.read_u8() as u32);
```
Use `+=`, here and below.
`BytePos` doesn't implement `AddAssign`. I guess I could add that, but it seemed a bit out of scope, as the `x = x + y` form was already used here in the old code.
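For illustration, implementing `AddAssign` for a position newtype is a one-liner; `Pos` here is a hypothetical stand-in for `rustc_span`'s `BytePos`, which (per this thread) did not implement the trait at the time.

```rust
use std::ops::AddAssign;

// Hypothetical newtype mirroring BytePos; not the actual rustc type.
#[derive(Copy, Clone, Debug, PartialEq)]
struct Pos(u32);

impl AddAssign for Pos {
    fn add_assign(&mut self, rhs: Pos) {
        self.0 += rhs.0;
    }
}
```

With this in place, `pos += Pos(delta)` works in place of `pos = pos + Pos(delta)`.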
Oh right, yeah, don't worry about it.
This is a nice micro-optimization. Let's do a CI perf run. @bors try @rust-timer queue
Awaiting bors try build completion. @rustbot label: +S-waiting-on-perf
⌛ Trying commit 2b14529 with merge 366197b0f5f748175b0da2b052977587b09357e2...
☀️ Try build successful - checks-actions
Queued 366197b0f5f748175b0da2b052977587b09357e2 with parent 52ca603, future comparison URL.
Finished benchmarking commit (366197b0f5f748175b0da2b052977587b09357e2): comparison url. Summary:
If you disagree with this performance assessment, please file an issue in rust-lang/rustc-perf. Benchmarking this pull request likely means that it is perf-sensitive, so we're automatically marking it as not fit for rolling up. While you can manually mark this PR as fit for rollup, we strongly recommend not doing so since this PR may lead to changes in compiler perf. @bors rollup=never
A bit less readable but more compact, and maybe faster? We'll see.
Perf run: @bors try @rust-timer queue
Awaiting bors try build completion. @rustbot label: +S-waiting-on-perf
⌛ Trying commit 5f2c6b9 with merge 4f8be8df41f2b23982956d0a6b4d1b07d7139f0f...
☀️ Try build successful - checks-actions
Queued 4f8be8df41f2b23982956d0a6b4d1b07d7139f0f with parent 1491e5c, future comparison URL.
Finished benchmarking commit (4f8be8df41f2b23982956d0a6b4d1b07d7139f0f): comparison url. Summary:
If you disagree with this performance assessment, please file an issue in rust-lang/rustc-perf. Benchmarking this pull request likely means that it is perf-sensitive, so we're automatically marking it as not fit for rolling up. While you can manually mark this PR as fit for rollup, we strongly recommend not doing so since this PR may lead to changes in compiler perf. @bors rollup=never
@bors r+
📌 Commit 5f2c6b9 has been approved by |
☀️ Test successful - checks-actions |
Finished benchmarking commit (f387c93): comparison url. Summary:
If you disagree with this performance assessment, please file an issue in rust-lang/rustc-perf. @rustbot label: -perf-regression
It showed up as a hot-ish function in a callgrind profile of the `await-call-tree` benchmark crate. Provides some moderate speedups to compilation of some of the smaller benchmarks:
Primary benchmarks
Secondary benchmarks
r? @nnethercote