Optimize <SourceFile as Decodable>::decode
#95981
Conversation
By inverting the parsing loop, avoiding continually re-checking `bytes_per_diff`.
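A minimal sketch of the loop-inversion idea, with hypothetical names (`decode_line_starts`, `width`) standing in for the actual rustc code: instead of dispatching on the diff width inside the loop on every iteration, dispatch once and run a specialized loop per width.

```rust
// Hypothetical simplified decoder, not the actual rustc implementation.
// Line starts are stored as deltas; `width` says how many bytes each
// delta occupies. Matching on `width` once, outside the loop, avoids
// re-checking it per iteration.
fn decode_line_starts(diffs: &[u8], width: usize, mut start: u32) -> Vec<u32> {
    let mut lines = vec![start];
    match width {
        1 => {
            for &d in diffs {
                start += d as u32;
                lines.push(start);
            }
        }
        // 2- and 4-byte diff widths elided in this sketch.
        _ => unimplemented!("wider diffs elided"),
    }
    lines
}
```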
compiler/rustc_span/src/lib.rs
Outdated
```rust
1 => {
    for _ in 1..num_lines {
        line_start = line_start + BytePos(d.read_u8() as u32);
        lines.push(line_start);
```
What about using `extend` with an iterator instead of `push`?
You mean like this?
```rust
lines.extend((1..num_lines).map(|_| {
    line_start += BytePos(d.read_u8());
    line_start
}));
```
It's harder to read but could be faster if it avoids bounds checks. @martingms , want to try it out?
Yes, like that 👍
I decided against it for readability, but I didn't benchmark it to see if it's faster; I'll do that now.
Pushed the commit here, in case you want to do a CI perf run, @nnethercote.
Looks promising locally, so worth a run! Thanks for the suggestion @Dandandan, I've still got a lot to learn about convincing the compiler to elide bounds checks.
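A sketch of why the `extend` form can help, using a hypothetical helper (`cumulative`) rather than the actual rustc code: when the iterator reports an exact size, `Vec::extend` can reserve the needed capacity once up front, whereas a plain `push` loop re-checks capacity on every iteration.

```rust
// Hypothetical simplified version of the pattern under discussion.
// The iterator from `diffs.iter().map(...)` has an exact size_hint,
// so `extend` reserves once instead of growing incrementally.
fn cumulative(diffs: &[u8], mut start: u32) -> Vec<u32> {
    let mut out = Vec::with_capacity(diffs.len());
    out.extend(diffs.iter().map(|&d| {
        start += d as u32;
        start
    }));
    out
}
```

Whether the bounds/capacity checks are actually elided depends on what the optimizer does with the concrete iterator type, which is why benchmarking it (as done in this thread) is the right call.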
compiler/rustc_span/src/lib.rs
Outdated
```rust
match bytes_per_diff {
    1 => {
        for _ in 1..num_lines {
            line_start = line_start + BytePos(d.read_u8() as u32);
```
Use `+=`, here and below.
`BytePos` doesn't implement `AddAssign`. I guess I could add that, but it seemed a bit out of scope, as the `x = x + y` form was already used here in the old code.
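For illustration, implementing `AddAssign` for a position newtype is a one-liner; `Pos` here is a hypothetical stand-in for `rustc_span`'s `BytePos`, which (per this thread) did not implement the trait at the time.

```rust
use std::ops::AddAssign;

// Hypothetical newtype mirroring BytePos; not the actual rustc type.
#[derive(Copy, Clone, Debug, PartialEq)]
struct Pos(u32);

impl AddAssign for Pos {
    fn add_assign(&mut self, rhs: Pos) {
        self.0 += rhs.0;
    }
}
```

With this in place, `pos += Pos(delta)` works in place of `pos = pos + Pos(delta)`.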
Oh right, yeah, don't worry about it.
This is a nice micro-optimization. Let's do a CI perf run. @bors try @rust-timer queue
Awaiting bors try build completion. @rustbot label: +S-waiting-on-perf
⌛ Trying commit 2b14529 with merge 366197b0f5f748175b0da2b052977587b09357e2...
☀️ Try build successful - checks-actions
Queued 366197b0f5f748175b0da2b052977587b09357e2 with parent 52ca603, future comparison URL.
Finished benchmarking commit (366197b0f5f748175b0da2b052977587b09357e2): comparison url. Summary:
If you disagree with this performance assessment, please file an issue in rust-lang/rustc-perf. Benchmarking this pull request likely means that it is perf-sensitive, so we're automatically marking it as not fit for rolling up. While you can manually mark this PR as fit for rollup, we strongly recommend not doing so since this PR may lead to changes in compiler perf. @bors rollup=never
A bit less readable but more compact, and maybe faster? We'll see.
Perf run: @bors try @rust-timer queue
Awaiting bors try build completion. @rustbot label: +S-waiting-on-perf
⌛ Trying commit 5f2c6b9 with merge 4f8be8df41f2b23982956d0a6b4d1b07d7139f0f...
☀️ Try build successful - checks-actions
Queued 4f8be8df41f2b23982956d0a6b4d1b07d7139f0f with parent 1491e5c, future comparison URL.
Finished benchmarking commit (4f8be8df41f2b23982956d0a6b4d1b07d7139f0f): comparison url. Summary:
If you disagree with this performance assessment, please file an issue in rust-lang/rustc-perf. Benchmarking this pull request likely means that it is perf-sensitive, so we're automatically marking it as not fit for rolling up. While you can manually mark this PR as fit for rollup, we strongly recommend not doing so since this PR may lead to changes in compiler perf. @bors rollup=never
@bors r+
📌 Commit 5f2c6b9 has been approved by |
☀️ Test successful - checks-actions |
Finished benchmarking commit (f387c93): comparison url. Summary:
If you disagree with this performance assessment, please file an issue in rust-lang/rustc-perf. @rustbot label: -perf-regression
It showed up as a hot-ish function in a callgrind profile of the `await-call-tree` benchmark crate. Provides some moderate speedups to compilation of some of the smaller benchmarks:
Primary benchmarks
Secondary benchmarks
r? @nnethercote