-
Notifications
You must be signed in to change notification settings - Fork 153
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
HPPC-186: Change the iteration order of HashSet and HashMap. Remove hash order mixing. #10
Conversation
Hi @bruno-roustant ! |
Hi @vsonnier ! |
Hello both (hi @vsonnier , long time no see!). Yes, it would be great to resign from the seed mix. I admit I can't remember whether this was considered at the time - it's been a while. I only remember I also didn't anticipate the original problem and I went the really defensive way of trying to absolutely minimize the chances of this happening in a production system. We do copy/ manipulate large associative arrays and if this every happens it's effectively a live deadlock... If we can somehow prove the iteration order (or skip) prevents this with a really low-likelihood of happening then it'd be a great improvement. Remember those maps can be copied over and over between instances. Separately from theoretical aspects there is also the practical aspect: seed mix adds overhead but it's fairly local and should play well with CPU caches, and even hotspot optimizations (in tight loops). The same with linear iteration - it is really cache-friendly. Adding skips will affect this property. |
I'll take some time to think about how to prove the really low-likehood of collision avalanche. About the seed mix overhead, I saw a consistent and visible perf impact during the numerous benchmarks we did when evaluating worm hashing. For all benchmarks, mixed hppc had visible overhead compared to scatter, due to this slightly more complex hash function, on multiple machines and with two different benchmark implementations (both based on JMH). So I'm convinced that the hotspot optimization does not have a good effect there. At least for jdk 11 and below. Maybe that would be different with C/C++. For the cache-friendliness of the iteration, well I suppose the JMH benchmark will be the arbitrator. Based on the CollisionAvalanche I ran with the sample code, the small skip seems to work well since it provides 33% speed improvement for scatter hppc set compared to standard hppc set. We'll need to run more benchmarks for more sizes to confirm this. Ok, this makes me confident I can continue to work on the subject. That's what I wanted to discuss first. |
I agree the "scatter" version is slightly faster - it is! That's why I left it in the code (for some applications it's useful). The question remains what happens when you don't have full control over how your collections are being used - this is what I'm concerned with. As for JVM/hotspot optimizations and cache influence: it's a really, really tough material to provide insight on since each and every architecture (not just the CPU but also memory controller, etc.) will have varying results... I really miss simple computers sometimes when you could just read clock ticks out of each instruction and just know for certain... |
I have tried a few things. First I quickly verified that the small prime increment (let’s call it skip-iteration-order) is indeed an iteration order: it retrieves all slots and only once. I ran brute force from 2^0 to 2^28. It works also for any starting slot (could be random instead of always 0). Then I instrumented KTypeHashSet to count each time we scan linearly the next slots after a hash collision. I called this collisionScanCount. I ran tests to copy Sets for many source and target sizes, focusing mainly on heavy loaded sets (as in CollisionAvalanche benchmark). Each time I measured the collisionRatio = collisionScanCount / targetSet.size() to estimate the penalty for collisions. For the case of heavy loaded set, I got collisionRatio=0.99 for KTypeHashSet and collisionRatio=0.75 for KTypeScatterSet. This is nearly 25% less collision slots scanned. And this corresponds to the perf gain I see for B004_HashSet_CollisionAvalanche: hppc_scatter is 28% faster. Then I modified B004_HashSet_CollisionAvalanche to have copy cycles: a source set is copied to an empty target, then the target becomes the source and is copied to another target and so on. Same perf gain 28%. Actually when a set is copied, the skip-iteration-order is used to read the source set but once copied the keys are in the target hash (mixPhi) order. So for any copy between two hppc sets the iteration order is always different than the target hash order. That’s why it avoids collisions. I don’t think we need to include randomness to ensure this property. What if a non-hppc collection is copied to a hppc set? The source collection could have an iteration order based on the mixPhi order. This is actually the case with Fastutil and Koloboke collections, they use the same mixPhi. In this case only the current keyMixer approach guarantees the protection (I could imagine a buffered-shuffling iterable wrapper to iterate the keys in modified order, but the caller would have to remind it). I don’t know how much you want to protect this case Dawid. |
Can you file a patch with this, Bruno? Doesn't have to be commitable, I'd just look and give it some thought but it'd be clearer for me to see what you want to do with those prime increments, etc. |
This PR already provides the code for protecting toArray(). |
Hi Bruno. bq. First I quickly verified that the small prime increment (let’s call it skip-iteration-order) is indeed an iteration order: it retrieves all slots and only once. I ran brute force from 2^0 to 2^28. It works also for any starting slot (could be random instead of always 0). I think this one is relatively easy to prove mathematically too - if the "skip" is a prime m and underlying table is of size 2^n then the two are relatively prime to each other so for any sequence of i=0..(2^n-1), i * m mod 2^n is guaranteed to fill any bucket within 0..2^n-1. This also applies if you add an initial "shift" (it merely rotates the initial sequence of i's. The same idea is used in open addressing with double hashing. As for practical repercussions - I think you may be right. I'll have to spend some time on this tinkering with actual possibilities but please do feel free to provide a patch that does implement this idea. |
I pushed the code. Now it is on KTypeHashSet directly instead of KTypeScatterSet. CollisionAvalanche benchmark before the change:B004_HashSet_CollisionAvalanche.run __ HPPC _____________ 0.024 ± 0.002 s/op CollisionAvalanche benchmark after the change:B004_HashSet_CollisionAvalanche.run __ HPPC _____________ 0.024 ± 0.001 s/op |
Looks good to me. I wonder if we should just drop the seed-mixed version entirely and just use this instead. The same would apply to maps too, I guess? I can fold this in and do this, let's see what happens. :) |
Yes, cool! To drop the seed-mixed version is the next step, and apply to map. |
Please do go ahead with this if you have spare cycles! |
…a 20% perf penalty.
Make hash iteration order vary for each KTypeHashSet iteration.
First step: I removed HashOrderMixingStrategy from KTypeHashSet (KTypeScatterSet to be removed). In the process I got a significant test failure with HashCollisionsClusteringTest.testHashSetClusteringAtFront2(). So I fixed this weakness by making the hash iteration order change between iterations. There is an iteration seed, which changes for each iteration start, that we use to determine the iteration slot start (not always 0) and the iteration increment (only needs to be an odd number). That fixed immediately the test (and the cool visualization showed a uniform distribution). CollisionAvalanche benchmark after the change:B004_HashSet_CollisionAvalanche.run __ HPPC _____________ 0.017 ± 0.001 s/op |
The HashCollisionsClusteringTest.testHashSetClusteringAtFront2() test is fixed but the problem is not completely fixed. |
Hi Bruno. Just wanted to say I do follow your updates but haven't had the time to look at the code so far. Good the tests are catching odd cases - I'm sure there are more of these lurking around the corner. :) |
The "format violations" can be automatically corrected - just run ./gradlew spotlessApply and it'll reformat the code. There is a Google format plugin for IntelliJ as well but they're not always in perfect sync (with respect to imports, for example). I am generally happy with the formatting done by google's formatter - it is very convenient with cross-code refactorings and generally leaves the code in much better shape over time. |
I removed ScatterSet and ScatterMap. Actually now HashSet and HashMap are like old scatter versions with a kind of shuffled iteration order. During this change I fixed ObjectIdentityHashSet.hashKey to use System.identityHashCode() (it didn't use it although I think the intent was to use it like in KTypeVTypeIdentityHashMap). |
@@ -46,7 +45,7 @@ public ObjectIdentityHashSet(ObjectContainer<? extends KType> container) { | |||
@Override | |||
protected int hashKey(KType key) { | |||
assert key != null; // Handled as a special case (empty slot marker). | |||
return BitMixer.mixPhi(key); | |||
return BitMixer.mixPhi(System.identityHashCode(key)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Uh. this looks like a bug in existing code too.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Corrected this on master, you may want to merge.
Yep, I saw that. Looks like a legitimate bug to me. |
Finally had a chance to take a closer look. Thanks for all the work, Bruno. This approach looks very good to me indeed. I do have some minor concerns about iteration order of elements from the same container changing from call to call -- whether this will bite somebody is probably not a question of 'if' but 'when... :) This said, for anybody who did use those collections the iteration order between runs would have changed anyway so I don't think it's a major issue. Do you think it's ready to be merged in? If anything pops up, it can be another PR. |
This was my last commit for this PR: to remove now unused HashOrderMixing and BitMixer's mix with seeds methods. |
I'll merge it asap. Thanks Bruno. |
There are 3 changes in hppc that impact OpenSearch: - ObjectIntScatterMap has been removed. I could have replaced it with ObjectIntHashMap, but a plain java.util.HashMap seems to solve the problem well so I used that instead. - `indexRemove(int index)` is a new method in two interfaces that we extend in a delegate pattern, so I have had to add that method as a pass-through in two places. - The seeded version of `BitMixer.mix(int, int)` [has been removed][1]. The implementation that was removed simply xor'd the value to the seed before calling the single-int variant of the method. I have changed the OpenSearch code to do the xor directly, so this should be equivalent to the previous implementation. [1]: carrotsearch/hppc#10 Signed-off-by: Andrew Ross <[email protected]>
There are 3 changes in hppc that impact OpenSearch: - ObjectIntScatterMap has been removed. I could have replaced it with ObjectIntHashMap, but a plain java.util.HashMap seems to solve the problem well so I used that instead. - `indexRemove(int index)` is a new method in two interfaces that we extend in a delegate pattern, so I have had to add that method as a pass-through in two places. - The seeded version of `BitMixer.mix(int, int)` [has been removed][1]. The implementation that was removed simply xor'd the value to the seed before calling the single-int variant of the method. I have changed the OpenSearch code to do the xor directly, so this should be equivalent to the previous implementation. [1]: carrotsearch/hppc#10 Signed-off-by: Andrew Ross <[email protected]>
There are 3 changes in hppc that impact OpenSearch: - ObjectIntScatterMap has been removed. I could have replaced it with ObjectIntHashMap, but a plain java.util.HashMap seems to solve the problem well so I used that instead. - `indexRemove(int index)` is a new method in two interfaces that we extend in a delegate pattern, so I have had to add that method as a pass-through in two places. - The seeded version of `BitMixer.mix(int, int)` [has been removed][1]. The implementation that was removed simply xor'd the value to the seed before calling the single-int variant of the method. I have changed the OpenSearch code to do the xor directly, so this should be equivalent to the previous implementation. [1]: carrotsearch/hppc#10 Signed-off-by: Andrew Ross <[email protected]>
Sample code for https://issues.carrot2.org/browse/HPPC-186