
Use simpler faster Rabin-Karp-like search for short needle #13820

Open: wants to merge 8 commits into master from string_simple_rabin_karp
Conversation

funny-falcon
Contributor

No description provided.

@straight-shoota
Member

I see some overlap with the implementations of #index(Char)/#byte_index(Char) which both already handle bytesizes smaller than 4. I figure those use cases could be merged with these optimized algorithms?

@HertzDevil
Contributor

HertzDevil commented Sep 23, 2023

This and #13819 could definitely share the same internal method, now that that PR is merged. The only caveat is that here the 4 bytes might represent more than one character.

@funny-falcon
Contributor Author

I doubt it is feasible to refactor the "fast RK" into a separate method:

  • the character position is counted in index(String), but it does not need to be counted in byte_index(String)
  • index(Char) already uses per-char iteration, which doesn't use Rabin-Karp. Is it worth changing it to RK?
  • at most, byte_index(Char) and byte_index(String) could be unified for short strings, I believe.

@funny-falcon funny-falcon force-pushed the string_simple_rabin_karp branch from d20dbfa to 94f7bc5 Compare October 1, 2023 11:22
@straight-shoota
Member

I suppose it's best to merge this as is then. We can try to refactor and deduplicate once it's in.

@straight-shoota
Member

Do you have any benchmarks to indicate the performance impact of this change?

Why did you choose UInt32 for the short needle? I figure we could use UInt64 (or UInt128) for search strings with up to 8 (16) bytes? There should be no performance degradation, at least for 64-bit arithmetic on 64-bit architectures.

Bigger search strings would need SIMD support though (ref #3057).
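For readers following along, here is a rough sketch of the "Rabin-Karp-like" trick under discussion, written in Ruby as a stand-in for the Crystal implementation (names and structure are illustrative, not the PR's actual code). With a multiplier of 256, the rolling hash is just a byte shift, so a needle of up to 8 bytes fits entirely in a 64-bit accumulator and the hash comparison becomes an exact comparison of the masked window, with no false positives:

```ruby
# Sketch only: simplified Rabin-Karp with multiplier == 256.
# The needle is packed into one integer; the haystack window rolls
# by shifting a byte in and masking the stale bytes out.
def index_short(haystack, needle)
  hbytes = haystack.bytes
  nbytes = needle.bytes
  n = nbytes.size
  return nil if n == 0 || n > 8 || hbytes.size < n

  mask   = (1 << (8 * n)) - 1                       # keeps only the window's bytes
  target = nbytes.reduce(0) { |h, b| (h << 8) | b }
  hash   = hbytes[0, n].reduce(0) { |h, b| (h << 8) | b }

  offset = 0
  loop do
    return offset if (hash & mask) == target
    offset += 1
    nxt = hbytes[offset + n - 1]
    return nil if nxt.nil?                          # ran past the end: no match
    hash = ((hash << 8) | nxt) & mask               # "multiply by 256" is just a shift
  end
end

index_short("hello world", "world")  # => 6
```

Because the masked window equals the needle bytes exactly, there is no verification step after a "hash" hit, unlike general Rabin-Karp.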

@funny-falcon
Contributor Author

@straight-shoota good question about integer size.

If 32-bit platforms are not first-class targets for Crystal, then I'd certainly like to use UInt64.

UInt128 would probably not be good, since it is not native.

Regarding benchmarks: I have benchmarked RK in C before and saw the impact of the double multiplication. But OK, I will benchmark this PR.

@straight-shoota
Member

32-bit targets are certainly getting less important. This is an optimization, and I expect that even with a 64-bit hash, performance would still be reasonable on 32-bit architectures. So I think there's no reason not to go for 64 bits.

@funny-falcon
Contributor Author

I've pushed a refactoring commit and benchmarked.

I use UInt32 for chars and UInt64 for strings in the "fast Rabin-Karp".

It is not the fastest variant: using UInt32 for small (2-4 byte) strings and UInt64 for 5-8 byte strings was faster, but then the performance for 2-4 byte strings and 5-8 byte strings was the same. I couldn't explain it, so I just dropped UInt32 for 2-4 byte strings.

Honestly, the benchmarking was a nightmare: simple changes could affect results by up to 16% without any meaningful reason 😢

Benchmark results on a Ryzen 7 5825U at 2 GHz with these scripts:

size    len   was idx    now idx    diff idx (%)   was bidx   now bidx   diff bidx (%)
small    1    0.068663   0.061373   -10.62         0.071392   0.008054   -88.72
medium   1    0.503822   0.334602   -33.59         0.545443   0.010590   -98.06
big      1    4.697361   2.865704   -38.99         5.058999   0.027877   -99.45
small    2    0.069861   0.053189   -23.86         0.073254   0.034132   -53.41
medium   2    0.509033   0.336184   -33.96         0.548318   0.235357   -57.08
big      2    4.730817   3.117069   -34.11         5.062775   2.049732   -59.51
small    4    0.077225   0.058108   -24.75         0.079227   0.036833   -53.51
medium   4    0.510799   0.340637   -33.31         0.554923   0.242836   -56.24
big      4    4.702089   3.096297   -34.15         5.069510   2.052939   -59.50
small    6    0.078827   0.057426   -27.15         0.080145   0.038325   -52.18
medium   6    0.511529   0.339824   -33.57         0.559667   0.241253   -56.89
big      6    4.700362   3.093604   -34.18         5.076374   2.053248   -59.55
small   10    0.088149   0.077928   -11.60         0.084642   0.070408   -16.82
medium  10    0.517613   0.511492    -1.18         0.568815   0.555601    -2.32
big     10    4.699969   4.695120    -0.10         5.085927   5.074980    -0.22
small   14    0.089366   0.080371   -10.07         0.089447   0.072904   -18.49
medium  14    0.523864   0.511200    -2.42         0.575362   0.556926    -3.20
big     14    4.706731   4.701650    -0.11         5.092217   5.071227    -0.41
small   18    0.095913   0.082964   -13.50         0.098844   0.078447   -20.64
medium  18    0.549657   0.531382    -3.32         0.585257   0.561600    -4.04
big     18    4.744925   4.725939    -0.40         5.100282   5.076517    -0.47

@funny-falcon
Contributor Author

UInt128 didn't show a difference from full Rabin-Karp, so I didn't include it in the final variant.

src/string.cr (outdated), review comment on lines 3358 to 3392:
private macro gen_index_short(int_class, by_char)
  # simplified Rabin-Karp version with multiplier == 256
  search_hash = {{int_class}}.new(0)
  hash = {{int_class}}.new(0)
  mask = {{int_class}}.new(0)

  search.each_byte do |b|
    search_hash = (search_hash << 8) | b
    hash = (hash << 8) | pointer.value
    mask = (mask << 8) | 0xff
    pointer += 1
  end
  {% if by_char %}
    search_bytesize = search.bytesize
  {% end %}

  while true
    return offset if (hash & mask) == search_hash

    {% if by_char %}
      char_bytesize = String.char_bytesize_at(pointer - search_bytesize)
    {% else %}
      char_bytesize = 1
    {% end %}
    return if pointer + char_bytesize > end_pointer
    case char_bytesize
    when 1 then update_simplehash 1
    when 2 then update_simplehash 2
    when 3 then update_simplehash 3
    else        update_simplehash 4
    end

    offset &+= 1
  end
end
Member

Does this actually need to be a macro?
I'm pretty sure it could be implemented as a def which yields for the current char bytesize (char_bytesize = yield pointer). int_class can be a regular def parameter.
Additional parameters for search, offset, pointer and end_pointer are also needed, of course.
The result should be pretty much identical, but the source would be a bit more straightforward and easier to reason about.

Call sites would look like this:

search_bytesize = search.bytesize
index_short(UInt64) { |pointer| String.char_bytesize_at(pointer - search_bytesize) }
index_short(UInt32) { 1 }
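To illustrate the suggested refactor, here is a hedged Ruby sketch (again standing in for Crystal; the method and helper names are made up): a single generic search loop yields at each step so the call site decides how far the window advances, always 1 byte for byte_index-style search, or the character's byte size for the codepoint-counting index:

```ruby
# Sketch only: one generic loop replaces the two macro expansions.
# The block reports how many bytes the window should advance; the
# returned offset counts iterations (bytes or characters accordingly).
def index_short_generic(hay_bytes, needle_bytes)
  n = needle_bytes.size
  return nil if n == 0 || n > 8 || hay_bytes.size < n
  mask   = (1 << (8 * n)) - 1
  target = needle_bytes.reduce(0) { |h, b| (h << 8) | b }
  hash   = hay_bytes[0, n].reduce(0) { |h, b| (h << 8) | b }
  ptr    = n                              # byte position just past the window
  offset = 0
  loop do
    return offset if (hash & mask) == target
    step = yield hay_bytes, ptr - n       # 1, or the char's byte size
    return nil if ptr + step > hay_bytes.size
    step.times { hash = ((hash << 8) | hay_bytes[ptr]) & mask; ptr += 1 }
    offset += 1
  end
end

# Byte size of a UTF-8 character from its leading byte (helper for the sketch).
utf8_len = ->(b) { b < 0x80 ? 1 : b < 0xE0 ? 2 : b < 0xF0 ? 3 : 4 }

# byte_index-style call site: always advance one byte ("ï" is 2 bytes)
index_short_generic("naïve test".bytes, "test".bytes) { 1 }                      # => 7
# index-style call site: advance by whole characters, count characters
index_short_generic("naïve test".bytes, "test".bytes) { |h, i| utf8_len.(h[i]) } # => 6
```

The two call sites differ only in the block, mirroring the proposed `index_short(UInt64) { ... }` versus `index_short(UInt32) { 1 }` shape.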

Contributor Author

I'll try.

Contributor Author

Done.

It is a bit slower in byte_index.
But since I can't work out why, I think it is OK.
(I tried to look at objdump -dS, but the code produced by --release is poorly matched to the source.)
Let the compiler and future contributors improve it.

@funny-falcon
Copy link
Contributor Author

May I rebase and squash commits?

@HertzDevil
Copy link
Contributor

No, merge as usual

@funny-falcon
Copy link
Contributor Author

@HertzDevil it's a pity: a lot of small commits without much sense. Personally I prefer a cleaner history, so I tend to rearrange commits after review, before merging. That's how it goes at work, and in FOSS projects that live outside of GitHub.

But OK, I'll let it go.

@funny-falcon
Copy link
Contributor Author

Sorry for the long delay.

Merged master into the branch. Added a commit with the review fixes.

5 participants