-
-
Notifications
You must be signed in to change notification settings - Fork 21.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Core] Optimize String::join
#92550
[Core] Optimize String::join
#92550
Conversation
|
||
bool first = true; | ||
for (const String &part : parts) { | ||
if (first) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does compiler optimize this out? If not, taking it out of the loop might be an option for an ounce more (have to profile though). 🙂
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I assume it'll unroll it
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah what is it they say about assumption.. 😉
It's only a minor thing anyway but ime optimizers don't always make the decisions you expect, as they don't have knowledge of the data, they can't assume you are feeding a list of zero items, 1 or 2, or 10000. Good practice to look at the assembly or look in godbolt (or just profile and see what is going on).
Having said that it should be an improvement so far anyway so happy to approve.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Tested a simple case with gcc and it unrolls a loop like this one, by jumping past the check and then looping back around, like this for a trivial case:
cmpq %rbx, %rsi
jne .L6
jmp .L1
.p2align 4,,10
.p2align 3
.L8:
movl (%rbx), %ecx
call _Z11call_me_tooi
.L6:
movl (%rbx), %ecx
addq $4, %rbx
call _Z7call_mei
cmpq %rbx, %rsi
jne .L8
.L1:
addq $40, %rsp
popq %rbx
popq %rsi
ret
This is from this code:
void do_this(const std::vector<int> &p_arg) {
bool first = true;
for (const int &i : p_arg) {
if (first) {
first = false;
} else {
call_me_too(i);
}
call_me(i);
}
}
So it will jump to L6 which is the call_me
line, and then next time it loops it'll go to call_me_too
For the extra code added, how much faster is the method in this PR in comparison? |
Haven't done explicit checking with this one as benchmarking it is hard, but it should pretty non-trivial since it skips a lot of allocations for long code Our current benchmarking tools doesn't cover this and doing separate benchmarking with non-library code wouldn't be reliable as it can't cover the COW stuff easily |
485441c
to
819a725
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Tested locally (rebased on top of master
c73ac74), it works as expected.
Testing project: test_pr_92550.zip
Benchmark
PC specifications
- CPU: Intel Core i9-13900K
- GPU: NVIDIA GeForce RTX 4090
- RAM: 64 GB (2×32 GB DDR5-5800 C30)
- SSD: Solidigm P44 Pro 2 TB
- OS: Linux (Fedora 40)
Using an optimized editor build (optimize=speed lto=full
).
Before
join(2 strings) took 3155335 usecs
join(10 strings) took 7880363 usecs
After
join(2 strings) took 2511176 usecs
join(10 strings) took 5855268 usecs
Avoid reallocation by pre-computing size
819a725
to
e211d08
Compare
Thanks! String ops go brrr 🚗 |
Thank you! |
Avoid reallocation by pre-computing size
See also:
String::replace
methods #92546String
methods #92548