
Use simpler faster Rabin-Karp-like search for short needle #13820

Open: wants to merge 8 commits into master from string_simple_rabin_karp
Conversation

funny-falcon
Contributor

No description provided.

@straight-shoota
Member

I see some overlap with the implementations of #index(Char)/#byte_index(Char) which both already handle bytesizes smaller than 4. I figure those use cases could be merged with these optimized algorithms?

@HertzDevil
Contributor

HertzDevil commented Sep 23, 2023

This and #13819 could definitely share the same internal method, now that that PR is merged. The only caveat is that here the 4 bytes might represent more than one character.

@funny-falcon
Contributor Author

I doubt it is feasible to refactor the "fast RK" into a separate method:

  • the character position is counted in index(String), but it does not need to be counted in byte_index(String)
  • index(Char) already uses per-char iteration, which doesn't use Rabin-Karp. Is it worth changing it to RK?
  • at most, byte_index(Char) and byte_index(String) could be unified for short strings, I believe.

@funny-falcon funny-falcon force-pushed the string_simple_rabin_karp branch from d20dbfa to 94f7bc5 Compare October 1, 2023 11:22
@straight-shoota
Member

I suppose it's best to merge this as is then. We can try to refactor and deduplicate once it's in.

@straight-shoota
Member

Do you have any benchmarks to indicate the performance impact of this change?

Why did you choose UInt32 for the short needle? I figure we could use UInt64 (or UInt128) for search strings with up to 8 (16) bytes? There should be no performance degradation, at least for 64-bit arithmetic on 64-bit architectures.

Bigger search strings would need SIMD support though (ref #3057).
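For readers following along, here is a rough sketch of the "Rabin-Karp-like" trick under discussion, written in Ruby as a stand-in for the Crystal implementation (names and structure are illustrative, not the PR's actual code). With a multiplier of 256, the rolling hash is just a byte shift, so a needle of up to 8 bytes fits entirely in a 64-bit accumulator and the hash comparison becomes an exact comparison of the masked window, with no false positives:

```ruby
# Sketch only: simplified Rabin-Karp with multiplier == 256.
# The needle is packed into one integer; the haystack window rolls
# by shifting a byte in and masking the stale bytes out.
def index_short(haystack, needle)
  hbytes = haystack.bytes
  nbytes = needle.bytes
  n = nbytes.size
  return nil if n == 0 || n > 8 || hbytes.size < n

  mask   = (1 << (8 * n)) - 1                       # keeps only the window's bytes
  target = nbytes.reduce(0) { |h, b| (h << 8) | b }
  hash   = hbytes[0, n].reduce(0) { |h, b| (h << 8) | b }

  offset = 0
  loop do
    return offset if (hash & mask) == target
    offset += 1
    nxt = hbytes[offset + n - 1]
    return nil if nxt.nil?                          # ran past the end: no match
    hash = ((hash << 8) | nxt) & mask               # "multiply by 256" is just a shift
  end
end

index_short("hello world", "world")  # => 6
```

Because the masked window equals the needle bytes exactly, there is no verification step after a "hash" hit, unlike general Rabin-Karp.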

@funny-falcon
Contributor Author

@straight-shoota good question about integer size.

If 32-bit platforms are not first-class targets for Crystal, then I'd certainly like to use UInt64.

UInt128 would probably not be good, since it is not native.

Regarding benchmarks: I have benchmarked RK in C before and saw the impact of the double multiplication. But OK, I will benchmark this PR.

@straight-shoota
Member

32-bit targets are certainly getting less important. This is an optimization, and I expect that even with a 64-bit hash, performance would still be reasonable on 32-bit architectures. So I think there's no reason not to go for 64 bits.

@funny-falcon
Contributor Author

I've pushed a refactoring commit and benchmarked.

I use UInt32 for chars and UInt64 for strings in the "fast Rabin-Karp".

It is not the fastest variant: using UInt32 for small (2-4 byte) strings and UInt64 for 5-8 byte strings was faster, but then the performance for 2-4 byte strings and 5-8 byte strings was the same. I couldn't explain it, so I just dropped UInt32 for 2-4 byte strings.

Honestly, the benchmarking was a nightmare: simple changes could affect results by up to 16% without any meaningful reason 😢

Benchmark results on a Ryzen 7 5825U at 2 GHz with these scripts:

size    len   was idx    now idx    diff idx (%)   was bidx   now bidx   diff bidx (%)
small    1    0.068663   0.061373   -10.62         0.071392   0.008054   -88.72
medium   1    0.503822   0.334602   -33.59         0.545443   0.010590   -98.06
big      1    4.697361   2.865704   -38.99         5.058999   0.027877   -99.45
small    2    0.069861   0.053189   -23.86         0.073254   0.034132   -53.41
medium   2    0.509033   0.336184   -33.96         0.548318   0.235357   -57.08
big      2    4.730817   3.117069   -34.11         5.062775   2.049732   -59.51
small    4    0.077225   0.058108   -24.75         0.079227   0.036833   -53.51
medium   4    0.510799   0.340637   -33.31         0.554923   0.242836   -56.24
big      4    4.702089   3.096297   -34.15         5.069510   2.052939   -59.50
small    6    0.078827   0.057426   -27.15         0.080145   0.038325   -52.18
medium   6    0.511529   0.339824   -33.57         0.559667   0.241253   -56.89
big      6    4.700362   3.093604   -34.18         5.076374   2.053248   -59.55
small   10    0.088149   0.077928   -11.60         0.084642   0.070408   -16.82
medium  10    0.517613   0.511492    -1.18         0.568815   0.555601    -2.32
big     10    4.699969   4.695120    -0.10         5.085927   5.074980    -0.22
small   14    0.089366   0.080371   -10.07         0.089447   0.072904   -18.49
medium  14    0.523864   0.511200    -2.42         0.575362   0.556926    -3.20
big     14    4.706731   4.701650    -0.11         5.092217   5.071227    -0.41
small   18    0.095913   0.082964   -13.50         0.098844   0.078447   -20.64
medium  18    0.549657   0.531382    -3.32         0.585257   0.561600    -4.04
big     18    4.744925   4.725939    -0.40         5.100282   5.076517    -0.47

@funny-falcon
Contributor Author

UInt128 didn't show a difference from full Rabin-Karp, so I didn't include it in the final variant.

src/string.cr (outdated), review comment on lines 3358 to 3392:
private macro gen_index_short(int_class, by_char)
  # simplified Rabin-Karp version with multiplier == 256
  search_hash = {{int_class}}.new(0)
  hash = {{int_class}}.new(0)
  mask = {{int_class}}.new(0)

  search.each_byte do |b|
    search_hash = (search_hash << 8) | b
    hash = (hash << 8) | pointer.value
    mask = (mask << 8) | 0xff
    pointer += 1
  end
  {% if by_char %}
    search_bytesize = search.bytesize
  {% end %}

  while true
    return offset if (hash & mask) == search_hash

    {% if by_char %}
      char_bytesize = String.char_bytesize_at(pointer - search_bytesize)
    {% else %}
      char_bytesize = 1
    {% end %}
    return if pointer + char_bytesize > end_pointer
    case char_bytesize
    when 1 then update_simplehash 1
    when 2 then update_simplehash 2
    when 3 then update_simplehash 3
    else        update_simplehash 4
    end

    offset &+= 1
  end
end
Member

Does this actually need to be a macro?
I'm pretty sure it could be implemented as a def which yields for the current char bytesize (char_bytesize = yield pointer). int_class can be a regular def parameter.
Additional parameters for search, offset, pointer and end_pointer are also needed, of course.
The result should be pretty much identical, but the source would be a bit more straightforward and easier to reason about.

Call sites would look like this:

search_bytesize = search.bytesize
index_short(UInt64) { |pointer| String.char_bytesize_at(pointer - search_bytesize) }
index_short(UInt32) { 1 }
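To illustrate the suggested refactor, here is a hedged Ruby sketch (again standing in for Crystal; the method and helper names are made up): a single generic search loop yields at each step so the call site decides how far the window advances, always 1 byte for byte_index-style search, or the character's byte size for the codepoint-counting index:

```ruby
# Sketch only: one generic loop replaces the two macro expansions.
# The block reports how many bytes the window should advance; the
# returned offset counts iterations (bytes or characters accordingly).
def index_short_generic(hay_bytes, needle_bytes)
  n = needle_bytes.size
  return nil if n == 0 || n > 8 || hay_bytes.size < n
  mask   = (1 << (8 * n)) - 1
  target = needle_bytes.reduce(0) { |h, b| (h << 8) | b }
  hash   = hay_bytes[0, n].reduce(0) { |h, b| (h << 8) | b }
  ptr    = n                              # byte position just past the window
  offset = 0
  loop do
    return offset if (hash & mask) == target
    step = yield hay_bytes, ptr - n       # 1, or the char's byte size
    return nil if ptr + step > hay_bytes.size
    step.times { hash = ((hash << 8) | hay_bytes[ptr]) & mask; ptr += 1 }
    offset += 1
  end
end

# Byte size of a UTF-8 character from its leading byte (helper for the sketch).
utf8_len = ->(b) { b < 0x80 ? 1 : b < 0xE0 ? 2 : b < 0xF0 ? 3 : 4 }

# byte_index-style call site: always advance one byte ("ï" is 2 bytes)
index_short_generic("naïve test".bytes, "test".bytes) { 1 }                      # => 7
# index-style call site: advance by whole characters, count characters
index_short_generic("naïve test".bytes, "test".bytes) { |h, i| utf8_len.(h[i]) } # => 6
```

The two call sites differ only in the block, mirroring the proposed `index_short(UInt64) { ... }` versus `index_short(UInt32) { 1 }` shape.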

Contributor Author

I'll try.

Contributor Author

Done.

It is a bit slower in byte_index.
But since I can't work out why, I think it is OK.
(I tried to look at objdump -dS, but the code produced by --release is poorly matched to the source.)
Let the compiler and future contributors improve it.

@funny-falcon
Copy link
Contributor Author

May I rebase and squash commits?

@HertzDevil
Copy link
Contributor

No, merge as usual

@funny-falcon
Copy link
Contributor Author

@HertzDevil it's a pity: a lot of small commits without much sense. Personally I prefer a cleaner history, so I tend to rearrange commits after review, before merging. That's how it goes at work, and in FOSS projects that live outside of GitHub.

But OK, I'll let it go.

@funny-falcon
Copy link
Contributor Author

Sorry for the long delay.

Merged master into the branch. Added a commit with the review fixes.

5 participants