Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

HPPC-186: Change the iteration order of HashSet and HashMap. Remove hash order mixing. #10

Merged
merged 10 commits into from
Aug 21, 2020

Conversation

bruno-roustant
Copy link
Collaborator

@vsonnier
Copy link

Hi @bruno-roustant !
Please note that HPPCRT-18 was sperseded by : vsonnier#38. In short, I still iterate "in reverse" in HPPC-RT but I also kept the "perturbation" policy in the end.

@bruno-roustant
Copy link
Collaborator Author

Hi @vsonnier !
Interesting, thanks for sharing.
Here the approach with a slot increment would pass. I think it is worth reviving this discussion again :)

@dweiss
Copy link
Member

dweiss commented Aug 14, 2020

Hello both (hi @vsonnier , long time no see!). Yes, it would be great to resign from the seed mix. I admit I can't remember whether this was considered at the time - it's been a while. I only remember I also didn't anticipate the original problem and I went the really defensive way of trying to absolutely minimize the chances of this happening in a production system. We do copy/ manipulate large associative arrays and if this every happens it's effectively a live deadlock...

If we can somehow prove the iteration order (or skip) prevents this with a really low-likelihood of happening then it'd be a great improvement. Remember those maps can be copied over and over between instances.

Separately from theoretical aspects there is also the practical aspect: seed mix adds overhead but it's fairly local and should play well with CPU caches, and even hotspot optimizations (in tight loops). The same with linear iteration - it is really cache-friendly. Adding skips will affect this property.

@bruno-roustant
Copy link
Collaborator Author

I'll take some time to think about how to prove the really low-likehood of collision avalanche.

About the seed mix overhead, I saw a consistent and visible perf impact during the numerous benchmarks we did when evaluating worm hashing. For all benchmarks, mixed hppc had visible overhead compared to scatter, due to this slightly more complex hash function, on multiple machines and with two different benchmark implementations (both based on JMH). So I'm convinced that the hotspot optimization does not have a good effect there. At least for jdk 11 and below. Maybe that would be different with C/C++.

For the cache-friendliness of the iteration, well I suppose the JMH benchmark will be the arbitrator. Based on the CollisionAvalanche I ran with the sample code, the small skip seems to work well since it provides 33% speed improvement for scatter hppc set compared to standard hppc set. We'll need to run more benchmarks for more sizes to confirm this.

Ok, this makes me confident I can continue to work on the subject. That's what I wanted to discuss first.

@dweiss
Copy link
Member

dweiss commented Aug 14, 2020

I agree the "scatter" version is slightly faster - it is! That's why I left it in the code (for some applications it's useful). The question remains what happens when you don't have full control over how your collections are being used - this is what I'm concerned with.

As for JVM/hotspot optimizations and cache influence: it's a really, really tough material to provide insight on since each and every architecture (not just the CPU but also memory controller, etc.) will have varying results... I really miss simple computers sometimes when you could just read clock ticks out of each instruction and just know for certain...

@bruno-roustant
Copy link
Collaborator Author

bruno-roustant commented Aug 17, 2020

I have tried a few things.

First I quickly verified that the small prime increment (let’s call it skip-iteration-order) is indeed an iteration order: it retrieves all slots and only once. I ran brute force from 2^0 to 2^28. It works also for any starting slot (could be random instead of always 0).

Then I instrumented KTypeHashSet to count each time we scan linearly the next slots after a hash collision. I called this collisionScanCount. I ran tests to copy Sets for many source and target sizes, focusing mainly on heavy loaded sets (as in CollisionAvalanche benchmark). Each time I measured the collisionRatio = collisionScanCount / targetSet.size() to estimate the penalty for collisions. For the case of heavy loaded set, I got collisionRatio=0.99 for KTypeHashSet and collisionRatio=0.75 for KTypeScatterSet. This is nearly 25% less collision slots scanned. And this corresponds to the perf gain I see for B004_HashSet_CollisionAvalanche: hppc_scatter is 28% faster.

Then I modified B004_HashSet_CollisionAvalanche to have copy cycles: a source set is copied to an empty target, then the target becomes the source and is copied to another target and so on. Same perf gain 28%.

Actually when a set is copied, the skip-iteration-order is used to read the source set but once copied the keys are in the target hash (mixPhi) order. So for any copy between two hppc sets the iteration order is always different than the target hash order. That’s why it avoids collisions. I don’t think we need to include randomness to ensure this property.
So this could protect forEach(), iterator(), toArray().

What if a non-hppc collection is copied to a hppc set? The source collection could have an iteration order based on the mixPhi order. This is actually the case with Fastutil and Koloboke collections, they use the same mixPhi. In this case only the current keyMixer approach guarantees the protection (I could imagine a buffered-shuffling iterable wrapper to iterate the keys in modified order, but the caller would have to remind it). I don’t know how much you want to protect this case Dawid.

@dweiss
Copy link
Member

dweiss commented Aug 17, 2020

Can you file a patch with this, Bruno? Doesn't have to be commitable, I'd just look and give it some thought but it'd be clearer for me to see what you want to do with those prime increments, etc.

@bruno-roustant
Copy link
Collaborator Author

bruno-roustant commented Aug 17, 2020

This PR already provides the code for protecting toArray().
I'll add the iteration protection for forEach() and iterator().

@dweiss
Copy link
Member

dweiss commented Aug 18, 2020

Hi Bruno.

bq. First I quickly verified that the small prime increment (let’s call it skip-iteration-order) is indeed an iteration order: it retrieves all slots and only once. I ran brute force from 2^0 to 2^28. It works also for any starting slot (could be random instead of always 0).

I think this one is relatively easy to prove mathematically too - if the "skip" is a prime m and underlying table is of size 2^n then the two are relatively prime to each other so for any sequence of i=0..(2^n-1), i * m mod 2^n is guaranteed to fill any bucket within 0..2^n-1. This also applies if you add an initial "shift" (it merely rotates the initial sequence of i's. The same idea is used in open addressing with double hashing.

As for practical repercussions - I think you may be right. I'll have to spend some time on this tinkering with actual possibilities but please do feel free to provide a patch that does implement this idea.

@bruno-roustant
Copy link
Collaborator Author

I pushed the code. Now it is on KTypeHashSet directly instead of KTypeScatterSet.
The idea is that each time we make a copy of a HashSet, the iteration order is kind of "shuffled" to ensure it is not the same as the hash order in the target set.

CollisionAvalanche benchmark before the change:

B004_HashSet_CollisionAvalanche.run __ HPPC _____________ 0.024 ± 0.002 s/op
B004_HashSet_CollisionAvalanche.run __ HPPC_SCATTER __ 1.062 ± 0.059 s/op
B004_HashSet_CollisionAvalanche.run __ FASTUTIL _________ 1.084 ± 0.087 s/op
B004_HashSet_CollisionAvalanche.run __ KOLOBOKE ________ 1.519 ± 0.036 s/op

CollisionAvalanche benchmark after the change:

B004_HashSet_CollisionAvalanche.run __ HPPC _____________ 0.024 ± 0.001 s/op
B004_HashSet_CollisionAvalanche.run __ HPPC_SCATTER __ 0.016 ± 0.001 s/op
B004_HashSet_CollisionAvalanche.run __ FASTUTIL ________ 1.058 ± 0.020 s/op
B004_HashSet_CollisionAvalanche.run __ KOLOBOKE _______ 1.529 ± 0.081 s/op

@dweiss
Copy link
Member

dweiss commented Aug 18, 2020

Looks good to me. I wonder if we should just drop the seed-mixed version entirely and just use this instead. The same would apply to maps too, I guess? I can fold this in and do this, let's see what happens. :)

@bruno-roustant
Copy link
Collaborator Author

Yes, cool! To drop the seed-mixed version is the next step, and apply to map.
I'm ready to help if you want. I could push the change for the map. I could also help to drop stuff.

@dweiss
Copy link
Member

dweiss commented Aug 18, 2020

Please do go ahead with this if you have spare cycles!

@bruno-roustant
Copy link
Collaborator Author

bruno-roustant commented Aug 19, 2020

First step: I removed HashOrderMixingStrategy from KTypeHashSet (KTypeScatterSet to be removed).

In the process I got a significant test failure with HashCollisionsClusteringTest.testHashSetClusteringAtFront2().
Awesome tests!
It caught a naughty weakness of the hash order with increment: when we repeatedly iterate the keys of sets A1,A2,...,An (in the test the top 5K keys) and add them to the same set B, we actually follow the same hash iteration order for the sets A, so the distribution in B is stacking at the beginning of the array. With the very nice visualizeKeyDistribution() in this test I could confirm that. It provoked collision avalanches.

So I fixed this weakness by making the hash iteration order change between iterations. There is an iteration seed, which changes for each iteration start, that we use to determine the iteration slot start (not always 0) and the iteration increment (only needs to be an odd number). That fixed immediately the test (and the cool visualization showed a uniform distribution).
This works with multi-threads without synchronization.

CollisionAvalanche benchmark after the change:

B004_HashSet_CollisionAvalanche.run __ HPPC _____________ 0.017 ± 0.001 s/op
B004_HashSet_CollisionAvalanche.run __ HPPC_SCATTER __ 0.017 ± 0.001 s/op

@bruno-roustant
Copy link
Collaborator Author

The HashCollisionsClusteringTest.testHashSetClusteringAtFront2() test is fixed but the problem is not completely fixed.
The test reused the same A by clearing it and refilling it. So the iteration seed changed.
But if the test was creating new A instances each time, it would currently have the same initial seed.
I'll fix that in the next commit.

@dweiss
Copy link
Member

dweiss commented Aug 19, 2020

Hi Bruno. Just wanted to say I do follow your updates but haven't had the time to look at the code so far. Good the tests are catching odd cases - I'm sure there are more of these lurking around the corner. :)

@dweiss
Copy link
Member

dweiss commented Aug 19, 2020

The "format violations" can be automatically corrected - just run ./gradlew spotlessApply and it'll reformat the code. There is a Google format plugin for IntelliJ as well but they're not always in perfect sync (with respect to imports, for example). I am generally happy with the formatting done by google's formatter - it is very convenient with cross-code refactorings and generally leaves the code in much better shape over time.

@bruno-roustant bruno-roustant changed the title HPPC-186: Sample to change the iteration order of KTypeScatterSet.toArray() HPPC-186: Change the iteration order of HashSet and HashMap. Remove hash order mixing. Aug 20, 2020
@bruno-roustant
Copy link
Collaborator Author

I removed ScatterSet and ScatterMap. Actually now HashSet and HashMap are like old scatter versions with a kind of shuffled iteration order.
Tests are passing and the speed is good.
Next step will be to remove HashOrderMixingStrategy and associated classes.

During this change I fixed ObjectIdentityHashSet.hashKey to use System.identityHashCode() (it didn't use it although I think the intent was to use it like in KTypeVTypeIdentityHashMap).

@@ -46,7 +45,7 @@ public ObjectIdentityHashSet(ObjectContainer<? extends KType> container) {
@Override
protected int hashKey(KType key) {
assert key != null; // Handled as a special case (empty slot marker).
return BitMixer.mixPhi(key);
return BitMixer.mixPhi(System.identityHashCode(key));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Uh. this looks like a bug in existing code too.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Corrected this on master, you may want to merge.

@dweiss
Copy link
Member

dweiss commented Aug 20, 2020

Yep, I saw that. Looks like a legitimate bug to me.

@dweiss
Copy link
Member

dweiss commented Aug 21, 2020

Finally had a chance to take a closer look. Thanks for all the work, Bruno. This approach looks very good to me indeed. I do have some minor concerns about iteration order of elements from the same container changing from call to call -- whether this will bite somebody is probably not a question of 'if' but 'when... :)

This said, for anybody who did use those collections the iteration order between runs would have changed anyway so I don't think it's a major issue.

Do you think it's ready to be merged in? If anything pops up, it can be another PR.

@bruno-roustant
Copy link
Collaborator Author

bruno-roustant commented Aug 21, 2020

This was my last commit for this PR: to remove now unused HashOrderMixing and BitMixer's mix with seeds methods.
I tried to follow the code style. But sure there must be stuff you want to adapt/modify, please do so.
Note that I removed an example class and a section in overview.html about hash vs scatter. Maybe you need to restore some text/example, or add some comment about the iteration order change.
On my side it's ready (and now I'll have less spare cycles :).

@dweiss
Copy link
Member

dweiss commented Aug 21, 2020

I'll merge it asap. Thanks Bruno.

@dweiss dweiss merged commit eaba889 into carrotsearch:master Aug 21, 2020
andrross added a commit to andrross/OpenSearch that referenced this pull request May 10, 2022
There are 3 changes in hppc that impact OpenSearch:

- ObjectIntScatterMap has been removed. I could have replaced it with
ObjectIntHashMap, but a plain java.util.HashMap seems to solve the
problem well so I used that instead.

- `indexRemove(int index)` is a new method in two interfaces that we
extend in a delegate pattern, so I have had to add that method as a
pass-through in two places.

- The seeded version of `BitMixer.mix(int, int)` [has been removed][1].
The implementation that was removed simply xor'd the value to the seed
before calling the single-int variant of the method. I have changed the
OpenSearch code to do the xor directly, so this should be equivalent to
the previous implementation.

[1]: carrotsearch/hppc#10

Signed-off-by: Andrew Ross <[email protected]>
andrross added a commit to opensearch-project/OpenSearch that referenced this pull request May 10, 2022
There are 3 changes in hppc that impact OpenSearch:

- ObjectIntScatterMap has been removed. I could have replaced it with
ObjectIntHashMap, but a plain java.util.HashMap seems to solve the
problem well so I used that instead.

- `indexRemove(int index)` is a new method in two interfaces that we
extend in a delegate pattern, so I have had to add that method as a
pass-through in two places.

- The seeded version of `BitMixer.mix(int, int)` [has been removed][1].
The implementation that was removed simply xor'd the value to the seed
before calling the single-int variant of the method. I have changed the
OpenSearch code to do the xor directly, so this should be equivalent to
the previous implementation.

[1]: carrotsearch/hppc#10

Signed-off-by: Andrew Ross <[email protected]>
andrross added a commit to opensearch-project/OpenSearch that referenced this pull request May 10, 2022
There are 3 changes in hppc that impact OpenSearch:

- ObjectIntScatterMap has been removed. I could have replaced it with
ObjectIntHashMap, but a plain java.util.HashMap seems to solve the
problem well so I used that instead.

- `indexRemove(int index)` is a new method in two interfaces that we
extend in a delegate pattern, so I have had to add that method as a
pass-through in two places.

- The seeded version of `BitMixer.mix(int, int)` [has been removed][1].
The implementation that was removed simply xor'd the value to the seed
before calling the single-int variant of the method. I have changed the
OpenSearch code to do the xor directly, so this should be equivalent to
the previous implementation.

[1]: carrotsearch/hppc#10

Signed-off-by: Andrew Ross <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants