Use simpler faster Rabin-Karp-like search for short needle #13820
Force-pushed from 86e168c to d20dbfa
I see some overlap with the implementations of
This and #13819 could definitely share the same internal method, now that that PR is merged. The only caveat is that here the 4 bytes might represent more than 1 character.
I doubt it is feasible to refactor "fast RK" into a separate method:
Force-pushed from d20dbfa to 94f7bc5
I suppose it's best to merge this as is then. We can try to refactor and deduplicate once it's in.
Do you have any benchmarks to indicate the performance impact of this change? Why did you choose …? Bigger search strings would need SIMD support though (ref #3057).
@straight-shoota good question about integer size. If 32-bit platforms are not first-class targets for Crystal, then I'd certainly like to use UInt64. UInt128 would hardly be a good choice because it is not native. As for benchmarks: I used to benchmark RK in C, and saw the impact of the double multiplication.
32-bit targets are certainly getting less important. This is an optimization and I expect even with a 64-bit hash performance would still be reasonable on 32-bit architectures. So I think there's no reason not to go for 64 bits.
I've pushed a refactoring commit and benchmarked it. I use UInt32 for chars and UInt64 for strings in "fast Rabin-Karp". It is not the fastest variant: using UInt32 for small strings and UInt64 for 5-8 character strings was faster, but then performance for 2-4 byte strings and 5-8 byte strings was the same. I couldn't explain it, so I just dropped UInt32 for 2-4 byte strings. Really, benchmarking was a nightmare: simple changes could affect results by up to 16% without any meaningful reason 😢 Benchmark results on Ryzen 7 5825U at 2GHz with these scripts
UInt128 didn't show a difference from full Rabin-Karp, so I didn't include it in the final variant.
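For readers following the integer-size discussion, here is a minimal sketch of the idea, not the code from this PR. With a multiplier of 256 the "hash" is simply the last few bytes packed into an integer, so a UInt64 can hold a needle of up to 8 bytes exactly and a match needs no verification step. The method name short_byte_index and its Bytes-based signature are assumptions made for illustration.

```crystal
# Hypothetical helper, not part of the PR: byte index of `needle` in
# `haystack` for needles of at most 8 bytes, using a UInt64 as a
# shift-based rolling window (Rabin-Karp with multiplier 256).
def short_byte_index(haystack : Bytes, needle : Bytes) : Int32?
  n = needle.size
  return 0 if n == 0
  # Longer needles do not fit into 64 bits; a real implementation
  # would fall back to another algorithm here.
  return nil if n > 8 || n > haystack.size

  needle_hash = 0_u64
  hash = 0_u64
  mask = 0_u64
  n.times do |i|
    needle_hash = (needle_hash << 8) | needle[i]
    hash = (hash << 8) | haystack[i]
    mask = (mask << 8) | 0xff_u64
  end

  offset = 0
  loop do
    # The masked window equals the needle bytes exactly, so no
    # second comparison is needed.
    return offset if (hash & mask) == needle_hash
    next_pos = offset + n
    return nil if next_pos >= haystack.size
    # Shift the next byte in; the oldest byte falls outside the mask.
    hash = (hash << 8) | haystack[next_pos]
    offset += 1
  end
end

p short_byte_index("hello world".to_slice, "wor".to_slice) # => 6
p short_byte_index("hello world".to_slice, "xyz".to_slice) # => nil
```

Replacing 0_u64 with 0_u32 would limit the same sketch to 4-byte needles, which is the UInt32 versus UInt64 trade-off discussed above.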
src/string.cr
Outdated
private macro gen_index_short(int_class, by_char)
  # simplified Rabin-Karp version with multiplier == 256
  search_hash = {{int_class}}.new(0)
  hash = {{int_class}}.new(0)
  mask = {{int_class}}.new(0)

  search.each_byte do |b|
    search_hash = (search_hash << 8) | b
    hash = (hash << 8) | pointer.value
    mask = (mask << 8) | 0xff
    pointer += 1
  end
  {% if by_char %}
    search_bytesize = search.bytesize
  {% end %}

  while true
    return offset if (hash & mask) == search_hash

    {% if by_char %}
      char_bytesize = String.char_bytesize_at(pointer - search_bytesize)
    {% else %}
      char_bytesize = 1
    {% end %}
    return if pointer + char_bytesize > end_pointer
    case char_bytesize
    when 1 then update_simplehash 1
    when 2 then update_simplehash 2
    when 3 then update_simplehash 3
    else        update_simplehash 4
    end

    offset &+= 1
  end
end
Does this actually need to be a macro?
I'm pretty sure it could be implemented as a def which yields for the current char bytesize (char_bytesize = yield pointer). int_class can be a regular def parameter.
Additional parameters for search, offset, pointer and end_pointer are also needed of course.
The result should be pretty much identical, but the source is a bit more straightforward and easier to reason about.
Call sites would look like this:
search_bytesize = search.bytesize
index_short(UInt64) { |pointer| String.char_bytesize_at(pointer - search_bytesize) }
index_short(UInt32) { 1 }
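A rough, self-contained sketch of what such a def could look like, under stated assumptions: it works on Bytes and indices rather than the pointers used in src/string.cr, the names index_short_sketch and utf8_len are hypothetical, and utf8_len stands in for String.char_bytesize_at while assuming valid UTF-8. The block receives the index of the window's first byte and returns how many bytes to advance.

```crystal
# Hypothetical stand-in for String.char_bytesize_at: byte length of a
# UTF-8 character given its leading byte (assumes valid UTF-8).
def utf8_len(first_byte : UInt8) : Int32
  if first_byte < 0x80
    1
  elsif first_byte < 0xe0
    2
  elsif first_byte < 0xf0
    3
  else
    4
  end
end

# Hypothetical sketch of the suggested def: the integer class is a
# regular parameter and the block decides the step size in bytes.
def index_short_sketch(int_class : T.class, haystack : Bytes, needle : Bytes, &) : Int32? forall T
  n = needle.size
  # Simplification: empty and oversized needles are not handled here.
  return nil if n == 0 || n > sizeof(T) || n > haystack.size

  needle_hash = T.new(0)
  hash = T.new(0)
  mask = T.new(0)
  n.times do |i|
    needle_hash = (needle_hash << 8) | needle[i]
    hash = (hash << 8) | haystack[i]
    mask = (mask << 8) | 0xff
  end

  pos = n    # index of the next byte to shift into the window
  offset = 0 # counted in steps: bytes or characters, depending on the block
  loop do
    return offset if (hash & mask) == needle_hash
    step = yield pos - n
    return nil if pos + step > haystack.size
    step.times do
      hash = (hash << 8) | haystack[pos]
      pos += 1
    end
    offset += 1
  end
end

bytes = "héllo wörld".to_slice
needle = "wö".to_slice
p index_short_sketch(UInt64, bytes, needle) { |i| utf8_len(bytes[i]) } # => 6 (character offset)
p index_short_sketch(UInt32, bytes, needle) { 1 }                      # => 7 (byte offset)
```

Because methods that yield are inlined at each call site, the constant block { 1 } gives the compiler the same information the by_char macro flag did.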
I'll try.
Done.
It is a bit slower in byte_index.
But since I can't work out why, I think it is OK.
(I tried to look at objdump -dS, but the code produced by --release is poorly matched to the source.)
Let the compiler and future contributors improve it.
May I rebase and squash commits?
No, merge as usual
@HertzDevil it's a pity: a lot of small commits without much sense. Personally I prefer a cleaner history, so I tend to rearrange commits after review, before merge. That is how it goes at work, and in FOSS projects which live outside of GitHub. But OK, I'll calm down.
Sorry for the long delay. Merged master into the branch. Added a commit with review fixes.