Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add const char * overloads to String class #84379

Merged
merged 1 commit into from
May 7, 2024

Conversation

Rubonnek
Copy link
Member

@Rubonnek Rubonnek commented Nov 2, 2023

The added function overloads, namely:

int _count(const char *p_string, int p_from, int p_to, bool p_case_insensitive) const;
int rfind(const char *p_str, int p_from = -1) const;
int rfindn(const char *p_str, int p_from = -1) const;
String replace_first(const char *p_key, const char *p_with) const;
String replacen(const char *p_key, const char *p_with) const;
int count(const char *p_string, int p_from = 0, int p_to = 0) const;
int countn(const char *p_string, int p_from = 0, int p_to = 0) const;

help avoid creating a ton of String objects and the underlying dynamic memory allocations they require by using the C-strings directly when possible.

I personally find that this patch makes the Editor feel snappier, but I have not run any benchmarks yet.

Edit: Added the following overloads:

int String::findn(const char *p_str, int p_from)
bool ends_with(const char *p_string) const;
String trim_prefix(const char *p_prefix) const;
String trim_suffix(const char *p_suffix) const;
int get_slice_count(const char *p_splitter) const;
String get_slice(const char *p_splitter, int p_slice) const;
Vector<String> split(const char *p_splitter = "", bool p_allow_empty = true, int p_maxsplit = 0) const;
Vector<String> rsplit(const char *p_splitter = "", bool p_allow_empty = true, int p_maxsplit = 0) const;

Edit: In #84379 (comment) @miv391 pointed out that finding an empty string ("") through String::find(const char *p_str, int p_from) returns 0 which is inconsistent when doing the same through GDScript. I've updated this PR to also fix that pre existing bug since it should return -1.

@Rubonnek Rubonnek requested a review from a team as a code owner November 2, 2023 19:45
@AThousandShips AThousandShips added this to the 4.x milestone Nov 2, 2023
core/string/ustring.cpp Outdated Show resolved Hide resolved
core/string/ustring.cpp Show resolved Hide resolved
@Rubonnek Rubonnek force-pushed the add-string-overloads branch 3 times, most recently from 94d4199 to 5247f0b Compare November 3, 2023 14:31
@Rubonnek
Copy link
Member Author

Rubonnek commented Nov 3, 2023

I noticed I missed another overload I wanted and took the liberty to add int String::findn(const char *p_str, int p_from) as well, which is the last one I'll add for this PR.

@AThousandShips
Copy link
Member

AThousandShips commented Nov 3, 2023

Let's add some unit tests for these to ensure correctness, do they work with unicode? If so then ensure it works with non-utf8 input

Edit: or rather, you need to rework the tests to test for the String and c-string versions, because the current tests use string literals that are now going through a different method

@Rubonnek Rubonnek marked this pull request as draft November 3, 2023 14:56
@AThousandShips
Copy link
Member

It currently does break current checks, and unsure what happens with various cases using non-ascii/utf8 input

@Rubonnek Rubonnek force-pushed the add-string-overloads branch from 5247f0b to e89d1ff Compare November 3, 2023 15:14
@Rubonnek
Copy link
Member Author

Rubonnek commented Nov 3, 2023

Edit: or rather, you need to rework the tests to test for the String and c-string versions, because the current tests use string literals that are now going through a different method

Indeed. I was about to ask about this -- I'm not sure if we want to duplicate the tests like this:

	CHECK(s.findn("WHA") == 7);
	CHECK(s.findn(String("WHA")) == 7);

Or maybe add a macro to test both under the same line and keep it DRY.

I don't know if there's a cleaner way to do this. There doesn't seem to be.

@Rubonnek Rubonnek force-pushed the add-string-overloads branch from e89d1ff to 764eb07 Compare November 3, 2023 16:47
@Rubonnek Rubonnek marked this pull request as ready for review November 3, 2023 16:49
@Rubonnek Rubonnek requested a review from a team as a code owner November 3, 2023 16:49
@Rubonnek Rubonnek force-pushed the add-string-overloads branch from 764eb07 to 0c2e9b6 Compare November 3, 2023 16:50
@Rubonnek Rubonnek force-pushed the add-string-overloads branch from 5938333 to 0c2e9b6 Compare November 3, 2023 17:02
@Rubonnek Rubonnek force-pushed the add-string-overloads branch from 0c2e9b6 to 5c18746 Compare November 3, 2023 18:03
Copy link
Member

@AThousandShips AThousandShips left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Haven't directly tested, and there is the possibility of code breaking with non-ascii strings passed as literals, but this applies to the already added methods like find, so not a new issue, would be good to confirm that it doesn't break things like: str.findn(U"...")

Otherwise it looks good to me, but TIWAGOS

@Rubonnek Rubonnek changed the title Add C-string overloads to String class Add const char * overloads to String class Nov 3, 2023
@Rubonnek
Copy link
Member Author

Rubonnek commented Nov 3, 2023

would be good to confirm that it doesn't break things like: str.findn(U"...")

I should clarify I intended to keep this PR unrelated to U"" and L"" C-strings. Those calls won't break due to this PR because they are on an unrelated code path. The call str.findn(U"") yields a function signature similar to findn(const char32_t*) for which there is no implementation (this PR only implements findn(const char *)) and as a result the compiler will look for other possible function candidates it can call given the findn(const char32_t*) signature. Currently, there's only one other candidate which is str.findn(String) which the compiler knows how to call because it can convert the const char32_t* parameter to a String through the String copy constructor String::String(const char32_t*). The same happens with a str.findn(L"") function call and, again, this PR never touches either of those code paths.

Having said that, we are indeed missing some tests for U"" and L"" C-strings. I'll add the tests for the functions I've overloaded if they pass.

@AThousandShips
Copy link
Member

AThousandShips commented Nov 3, 2023

Agreed, just wanted to make sure the automatic overload resolution works correctly as C++ can be so messy with that some times

Tests would be great to ensure it (and future changes)

@Rubonnek Rubonnek force-pushed the add-string-overloads branch from 5c18746 to ce145da Compare November 3, 2023 22:05
@Rubonnek
Copy link
Member Author

Rubonnek commented Nov 3, 2023

Tests passed for U"" and L"" C-strings on my end so I've added them.

@Rubonnek Rubonnek force-pushed the add-string-overloads branch 3 times, most recently from 9834c09 to 8542abf Compare November 19, 2023 18:37
Copy link
Member

@AThousandShips AThousandShips left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Realized some missed details

Also needs some fixed for casting

core/string/ustring.cpp Outdated Show resolved Hide resolved
core/string/ustring.cpp Outdated Show resolved Hide resolved
core/string/ustring.cpp Outdated Show resolved Hide resolved
@Rubonnek Rubonnek force-pushed the add-string-overloads branch 2 times, most recently from 164bf62 to f70d92b Compare November 19, 2023 19:26
@akien-mga
Copy link
Member

I personally find that this patch makes the Editor feel snappier, but I have not run any benchmarks yet.

Would be nice to do some benchmarking on an optimized build to see if there is actually a significant difference. Performance optimization can have surprising results only reasoned about but not tested.

CC @Calinou

@Rubonnek Rubonnek force-pushed the add-string-overloads branch from f70d92b to 6ebdf70 Compare January 4, 2024 20:36
@akien-mga akien-mga modified the milestones: 4.x, 4.3 Jan 5, 2024
@Calinou
Copy link
Member

Calinou commented Jan 6, 2024

Would be nice to do some benchmarking on an optimized build to see if there is actually a significant difference. Performance optimization can have surprising results only reasoned about but not tested.

Here are benchmarks of editor startup + shutdown time (I ran these twice and results were consistent):

❯ hyperfine -iw1 "bin/godot.linuxbsd.editor.x86_64.master --path /tmp/4 -e --quit" "bin/godot.linuxbsd.editor.x86_64 --path /tmp/4 -e --quit"
Benchmark 1: bin/godot.linuxbsd.editor.x86_64.master --path /tmp/4 -e --quit
  Time (mean ± σ):      3.027 s ±  0.058 s    [User: 2.075 s, System: 0.361 s]
  Range (min … max):    2.911 s …  3.132 s    10 runs
 
Benchmark 2: bin/godot.linuxbsd.editor.x86_64 --path /tmp/4 -e --quit           <--- This PR
  Time (mean ± σ):      2.982 s ±  0.290 s    [User: 2.074 s, System: 0.366 s]
  Range (min … max):    2.523 s …  3.537 s    10 runs
 
Summary
  bin/godot.linuxbsd.editor.x86_64 --path /tmp/4 -e --quit ran
    1.02 ± 0.10 times faster than bin/godot.linuxbsd.editor.x86_64.master --path /tmp/4 -e --quit

The difference is small, but it's an improvement either way.

As for binary size:

Editor

master: 105,335,696 bytes
    PR: 105,270,128 bytes (-65 KB)

Release export template

master: 62,385,544 bytes
    PR: 62,344,616 bytes (-41 KB)

I think the improvement is worth the added code complexity, as getting binary size down is important.

@akien-mga akien-mga requested a review from bruvzg January 11, 2024 16:17
@Rubonnek Rubonnek force-pushed the add-string-overloads branch from 6ebdf70 to 8e2356c Compare March 11, 2024 15:03
@Rubonnek
Copy link
Member Author

Rebased to incorporate test changes from #89194.

Comment on lines 353 to 389
MULTICHECK_STRING_EQ(s, find, "tty", 3);
MULTICHECK_STRING_INT_EQ(s, find, "Wo", 9, 13);
MULTICHECK_STRING_EQ(s, find, "Revenge of the Monster Truck", -1);
MULTICHECK_STRING_EQ(s, rfind, "man", 15);
}

TEST_CASE("[String] Find no case") {
String s = "Pretty Whale Whale";
CHECK(s.findn("WHA") == 7);
CHECK(s.findn("WHA", 9) == 13);
CHECK(s.findn("Revenge of the Monster SawFish") == -1);
CHECK(s.rfindn("WHA") == 13);
MULTICHECK_STRING_EQ(s, findn, "WHA", 7);
MULTICHECK_STRING_INT_EQ(s, findn, "WHA", 9, 13);
MULTICHECK_STRING_EQ(s, findn, "Revenge of the Monster SawFish", -1);
MULTICHECK_STRING_EQ(s, rfindn, "WHA", 13);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about some more tests for finds. I would also suggest separating find and rfind test to separate test cases (also findn and rfindn).

These tests are still in the old format.

	CHECK(s.find("") == 0);
	CHECK(s.find("", 9) == 9);
	CHECK(s.find("Pretty Woman Woman") == 0);
	CHECK(s.find("WOMAN") == -1);

	CHECK(s.rfind("") == 18); // fails (returns -1)
	CHECK(s.rfind("", 15) == 15); // fails (returns -1)
	CHECK(s.rfind("foo") == -1);
	CHECK(s.rfind("Pretty Woman Woman") == 0);
	CHECK(s.rfind("man", 13) == 9);
	CHECK(s.rfind("WOMAN") == -1);

	CHECK(s.findn("") == 0); // fails (returns -1)
	CHECK(s.findn("", 3) == 3); // fails (returns -1)
	CHECK(s.findn("wha") == 7);
	CHECK(s.findn("Wha") == 7);

	CHECK(s.rfindn("") == 15); // fails (returns -1)
	CHECK(s.rfindn("", 13) == 13); // fails (returns -1)
	CHECK(s.rfindn("wha") == 13);
	CHECK(s.rfindn("Wha") == 13);

There are some some inconsistencies how things work in master and thus some of these tests fail. Finding with "" seems to be a somewhat undefined case. The basic find returns 0 when "" is searched, which to me seems to be the correct answer. There is a "" at every index. But all other functions (rfind, findn, rfindn) return -1 when "" is searched. In my opinion that is wrong, when "" is searched, I would expect the first index checked to be returned.

However, in GDScript both find and rfind return -1 when "" is searched. Although I personally would prefer not returning -1, consistency is more important. If it is decided that "" is never found (-1), current C++ find implementation should be changed.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The basic find returns 0 when "" is searched, which to me seems to be the correct answer. There is a "" at every index. But all other functions (rfind, findn, rfindn) return -1 when "" is searched.

That's a bug in the overload String::find(const char *p_str, int p_from). It's documented it should return -1, and the same applies to rfind, findn, and rfindn. The way I see it, "" is by definition an empty string and therefore searching for it implies nothing will be found and therefore the function should return -1.

I've patched this bug in this PR -- find("") now returns -1 to match the behavior of the rest. Thanks for pointing that out.

separate test cases

That's a style code change better suited for a different PR. I consider that change to be out of the scope of this PR.

@Rubonnek Rubonnek force-pushed the add-string-overloads branch from 8e2356c to 72f0d38 Compare March 12, 2024 15:40
@akien-mga
Copy link
Member

Seems good to me. Could you rebase to make sure it still passes CI?

@Rubonnek Rubonnek force-pushed the add-string-overloads branch from 72f0d38 to d4154db Compare May 7, 2024 14:53
@Rubonnek
Copy link
Member Author

Rubonnek commented May 7, 2024

CI checks passed after rebasing.

@akien-mga akien-mga merged commit ed0dbc3 into godotengine:master May 7, 2024
16 checks passed
@akien-mga
Copy link
Member

Thanks!

@Rubonnek Rubonnek deleted the add-string-overloads branch May 7, 2024 21:53
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants