Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Core] Optimize String::join #92550

Merged
merged 1 commit into from
Aug 16, 2024
Merged

Conversation

AThousandShips
Copy link
Member

@AThousandShips AThousandShips commented May 30, 2024


bool first = true;
for (const String &part : parts) {
if (first) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does compiler optimize this out? If not, taking it out of the loop might be an option for an ounce more (have to profile though). 🙂

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I assume it'll unroll it

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah what is it they say about assumption.. 😉

It's only a minor thing anyway but ime optimizers don't always make the decisions you expect, as they don't have knowledge of the data, they can't assume you are feeding a list of zero items, 1 or 2, or 10000. Good practice to look at the assembly or look in godbolt (or just profile and see what is going on).

Having said that it should be an improvement so far anyway so happy to approve.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tested a simple case with gcc and it unrolls a loop like this one, by jumping past the check and then looping back around, like this for a trivial case:

        cmpq    %rbx, %rsi
        jne     .L6
        jmp     .L1
        .p2align 4,,10
        .p2align 3
.L8:
        movl    (%rbx), %ecx
        call    _Z11call_me_tooi
.L6:
        movl    (%rbx), %ecx
        addq    $4, %rbx
        call    _Z7call_mei
        cmpq    %rbx, %rsi
        jne     .L8
.L1:
        addq    $40, %rsp
        popq    %rbx
        popq    %rsi
        ret

This is from this code:

void do_this(const std::vector<int> &p_arg) {
    bool first = true;
    for (const int &i : p_arg) {
        if (first) {
            first = false;
        } else {
            call_me_too(i);
        }
        call_me(i);
    }
}

So it will jump to L6 which is the call_me line, and then next time it loops it'll go to call_me_too

@Mickeon
Copy link
Contributor

Mickeon commented Jun 23, 2024

For the extra code added, how much faster is the method in this PR in comparison?

@AThousandShips
Copy link
Member Author

AThousandShips commented Jun 23, 2024

Haven't done explicit checking with this one as benchmarking it is hard, but it should pretty non-trivial since it skips a lot of allocations for long code

Our current benchmarking tools doesn't cover this and doing separate benchmarking with non-library code wouldn't be reliable as it can't cover the COW stuff easily

Copy link
Member

@Calinou Calinou left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tested locally (rebased on top of master c73ac74), it works as expected.

Testing project: test_pr_92550.zip

Benchmark

PC specifications
  • CPU: Intel Core i9-13900K
  • GPU: NVIDIA GeForce RTX 4090
  • RAM: 64 GB (2×32 GB DDR5-5800 C30)
  • SSD: Solidigm P44 Pro 2 TB
  • OS: Linux (Fedora 40)

Using an optimized editor build (optimize=speed lto=full).

Before

join(2 strings) took 3155335 usecs
join(10 strings) took 7880363 usecs

After

join(2 strings) took 2511176 usecs
join(10 strings) took 5855268 usecs

@clayjohn clayjohn modified the milestones: 4.x, 4.4 Aug 9, 2024
Avoid reallocation by pre-computing size
@akien-mga akien-mga merged commit e057c49 into godotengine:master Aug 16, 2024
18 checks passed
@AThousandShips AThousandShips deleted the join_improve branch August 16, 2024 08:48
@akien-mga
Copy link
Member

Thanks! String ops go brrr 🚗

@AThousandShips
Copy link
Member Author

Thank you!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants