-
-
Notifications
You must be signed in to change notification settings - Fork 1.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
use Crystal::Hasher and openaddressing in StringPool #5000
use Crystal::Hasher and openaddressing in StringPool #5000
Conversation
Why not just change I probably missed the original discussion where everything else had to change. |
src/string_pool.cr
Outdated
@values = Pointer(String).malloc(@capacity, "") | ||
@size = 0 | ||
|
||
0.upto(old_capacity - 1) do |i| |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
old_capacity.times do |i|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
follows you.
34286ae
to
605032c
Compare
@asterite Just simple bench: require "benchmark"
require "./new_string_pool"
require "./old_string_pool"
class Pools
class_property new_pool = NewStringPool.new
class_property old_pool = OldStringPool.new
end
Benchmark.ips do |x|
x.report("insert new string pool") { test_with(Pools.new_pool) }
x.report("insert old string pool") { test_with(Pools.old_pool) }
end
Benchmark.ips do |x|
x.report("get new string pool") { test_with(Pools.new_pool) }
x.report("get old string pool") { test_with(Pools.old_pool) }
end
def test_with(pool)
1_000_000.times do |num|
pool.get("hello #{num}")
end
end ➜ bench_string_pool.cr git:(master) ✗ ../crystal/bin/crystal src/bench.cr --release
Using compiled compiler at `.build/crystal'
insert new string pool 4.96 (201.65ms) (± 4.36%) fastest
insert old string pool 2.98 (335.58ms) (± 2.34%) 1.66× slower
get new string pool 4.93 (202.81ms) (± 4.25%) fastest
get old string pool 2.97 (336.19ms) (± 2.50%) 1.66× slower Take a note that OldStringPool uses private def hash(str, len)
hasher = Crystal::Hasher.new
hasher = str.to_slice(len).hash(hasher)
hasher.result
end
|
src/string_pool.cr
Outdated
if entry | ||
return entry | ||
mask = (@capacity - 1).to_u32 | ||
index, d = hash & mask, 1 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you split this into two assignments? Otherwise it looks a bit cryptic.
Also, could you explain here why & mask
is needed? (I know, because later we see the comment about this, but maybe the algorithm can be explained somewhere?)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, will split.
Mask is just one of methods to get index to probe between 0 and capacity-1. It's fast and OK.
I like this but I don't quite understand the algorithm. Is it using Quadratic probing? We can merge this, but a detailed explanation of the algorithm must be attached somewhere in the file. I'm pretty impressed by the performance! |
f9ed2a2
to
817a42a
Compare
@asterite it's just linear probing afair (was renamed some variables to show it). @funny-falcon is author, can describe it better, but it's simple. I don't know what to describe. It just choose index and again and again :) |
:-D But why incrementing |
I understood you and will describe this (later, i'm going asleep) :) In short - specified part of hash table always empty, see rehash etc. So linear probing will always take empty element (it is not +1 everytime (but +N), but it is guaranteed). |
For now, will replace with trivial linear probing :) |
It is called "quadratic probing"
It suffers less from secondary clustering (compared to linear probing). And still provides good cache locality, because first several probes are nearby. Google Dense Hash uses same quadratic probing. @akzhan why did you change to linear probing? Do you doubt my sanity? |
I was wrong (will return this) |
817a42a
to
be20e47
Compare
Was rolled back and described quadratic probing implementation details. |
30e7bb6
to
2f26d2e
Compare
Good catch on size recalculation on rehash! I wanted to argue about 64bit hash, but decided not to. |
Crystal::Hasher returns UInt64 hash values. Pointer.[] also operates with 64bit now. |
@asterite ready to review :) |
I'm still looking for a proof that quadratic probing where the capacity if a power of two hits all buckets. Anyone can find one? Maybe a proof can be done by induction, though I haven't done that in a long time now... |
@asterite something like Learn Hash Table the Hard Way -- Part 1: Probe Distributions and others on the series such as Writing a Damn Fast Hash Table With Tiny Memory Footprints? Also Robin Hood Hashing should be your default Hash Table implementation is a great piece. |
http://goog-sparsehash.sourceforge.net/doc/implementation.html
https://fgiesen.wordpress.com/2015/02/22/triangular-numbers-mod-2n/
|
2f26d2e
to
8e717b9
Compare
Explanation related to SparseHash has been added. |
I wouldn't link to SparseHash because this is not SparseHash :-) Maybe link to https://fgiesen.wordpress.com/2015/02/22/triangular-numbers-mod-2n/ But all these comments are private to the implementation, not public documentation. Nobody needs to know how |
…ingPool cause StringPool is used in json decoding, it is important to have it safe.
8e717b9
to
bf016e7
Compare
@asterite Done |
Glad to see this part merged, and will step further! Thanks to @funny-falcon for its great work! |
Indeed, thank you @funny-falcon and @akzhan 🙇 |
Nice 5k get, BTW! 👏 |
cause StringPool is used in json decoding, it is important to have it safe.
I suppose that
Crystal::Hasher
should be safe somedays.Openaddressing also may increase performance a bit (#4675 (comment)).
Extracted from #4675.
/cc @RX14