Hash table design #1
Replies: 14 comments 65 replies
-
I'd say the performance gold standard is currently boost::unordered_flat_map.
-
Btw, I saw that you use my older robin_hood map in the benchmark. Could you switch to unordered_dense (https://github.com/martinus/unordered_dense) instead? It's newer, with faster iteration.
-
Hi Jackson, thanks for your effort and for your commitment to open innovation! Happy to be part of the discussion. I couldn't find the benchmark code anywhere; maybe you can publish it?
-
Just a quick addition: Reddit seems to be hiding the comment I made actually introducing the hash table in that thread. Most of the information in the hidden comment is repeated here or in the repo's README, but for the sake of completeness, I've recreated it on Pastebin.
-
Hi, looks great! I ran it against my hash map fuzzer and didn't find any problems. It's basically structured like your test against the STL, but it uses the fuzzer's input instead of rand() for "randomness".
-
Hi Jackson, and thanks for the invitation. I have included Verstable in the M*LIB benchmark.
-
Hi guys. Sorry for the belated follow-up. @martinus I have completely rewritten my benchmarks to rely on templates and C++20 concepts, rather than preprocessor macros, as the modularity/extensibility mechanism. I've created a private repo and invited everyone listed above to access it. I'll be gradually adding other C hash table libraries and refining the code and documentation. Eventually, I'll publicize the repo and publish a write-up. I also added khash (should I instead include khashl?). I'd also like to share several new results:
A few substantive changes to the benchmarks:
I’ve added a pull request to your benchmark to give Verstable the same minimal multiplicative hash function that unordered_dense applies internally by default to protect against weak user-supplied hash functions. As I mentioned earlier, this is necessary to activate its hash code “fragment” or “fingerprint” mechanism, without which it incurs a significant performance hit. I understand if you'd prefer not to make this change – it could be unfair to allow only one library to use a custom hash function. I’m considering adding this functionality to the library as an opt-out mechanism, as in Boost and unordered_dense, but I have mixed feelings about this approach (unlike in C++ maps that follow the std::unordered_map API, users wanting to use a custom hash function with Verstable will necessarily have to read the documentation, which explicitly states that the hash function should provide entropy in the high bits).

Also, note that effectively excluding the cost of the hash function from the benchmarks (by using an identity hash function) could have a perhaps unexpected influence on the results. Some tables – including unordered_dense and Verstable – sometimes re-hash existing keys, so they get those rehashes for free. This issue mainly affects simple linear-probing hash tables that use backshift deletion without storing hash codes or home bucket indices. At higher load factors, such tables must rehash many keys on deletion, so their deletion performance deteriorates rather spectacularly when the hash function is expensive. STC, in particular, suffers from this problem (@tylov).
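For readers unfamiliar with the technique, the kind of minimal multiplicative post-hash mixer described above can be sketched as follows. This is an illustrative stand-in, not the actual code of unordered_dense or Verstable; the constant is the common 64-bit Fibonacci-hashing multiplier:

```c
#include <stdint.h>

/* Hypothetical post-hash mixer. Multiplying by a large odd constant
   spreads entropy from the low bits of a weak hash (e.g. an identity
   hash over small integer keys) into the high bits, which is where a
   fingerprint/fragment mechanism typically samples. */
static inline uint64_t mix_hash( uint64_t h )
{
    return h * 0x9E3779B97F4A7C15ull;
}
```

With this wrapper applied on top of a user-supplied hash, even the identity hash over sequential keys produces well-distributed high bits.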
-
Thanks for this effort! I am aware of this problem with STC hmap when using high load factors, but it does store 7 bits of the hash. I have an old branch that uses Robin Hood in the STC hmap, which I will resurrect and test with. I didn't adopt it because it didn't perform better, but it will probably help with this issue, and also work better at higher load factors in general.
-
@attractivechaos Hello again everyone :) I’ve uploaded a draft of my article on my benchmarks here, if anyone would like to take a look. Suggestions are welcome and appreciated. In particular, read my description of your hash table in the Hash Tables Benchmarked section and my analysis of its results in the Analysis section, and check that they are accurate and fair.

Joaquín, I included the image you shared earlier showing the clustering of elements in boost::unordered_flat_map vs absl::flat_hash_map. Is that okay? Or should I recreate the images using my own data?

Tyge, you were talking earlier about potentially replacing STC’s hash-table implementation with a Robin Hood hash table. Is that a definitive plan? Should I make a note about that in the article?

Finally, one perhaps interesting change to the benchmarks themselves is the addition of a heat map that consolidates the graphs' data into one figure, inspired by what Joaquín did with the results of Martin’s benchmarks. Here’s what it looks like for the 20,000,000-key benchmarks: The red maxes out at 5.0 (five times slower than the best performer). I may add some little icons indicating which tables use tombstones or overflow bytes for erasure. As I mentioned earlier, Boost does even better in the low-key-count benchmarks (which I think show performance when the tables are hotter in the cache). I’ll eventually be sharing the article on Reddit and maybe some other social media.
-
Note you will find difficulties writing a wrapper "find" function for it without hacking the library (I don't think it is possible to transform the value pointer back into an iterator with this data structure, as some information is not given back to the user), whereas it is always possible to convert an iterator into a pointer to the value. If needed, I can provide a find method.
I agree with your points. Maybe the solution is to add a new entry if it is a different design.
Sure. GCC 13.2.0 on Linux/x86-64
This, and when you run the benchmark once again and compare the new result with the previous one. EDIT: I clicked on reply and GitHub posted a new comment instead of a reply :)
-
For information, since I finally got some free time, I have added "bulk" update operations to M*LIB DICT OA that make it possible to set/get multiple key/values in one operation (as a WIP feature, since I don't yet have enough perspective to know whether it's a good idea, and it isn't formally tested). It tries to prefetch data to minimize cache misses. I have updated my own bench code: I didn't see any gain at 1'000'000 entries, but I start seeing one at larger sizes (it depends heavily on the CPU, the memory speed, and the cache size), and I see a +30% gain at 10'000'000. @JacksonAllan: Unfortunately, I don't see how I can integrate it into your benchmark, as the API is incompatible (and I don't know whether it would even make sense). @attractivechaos: I have updated the udb3 bench with this bulk interface and get a nice gain with it. If you are interested, I can provide a PR.
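The prefetching bulk-get idea can be sketched roughly as follows. This is a toy linear-probing table, not M*LIB's actual DICT OA implementation or API; `bulk_get`, `home_index`, and the `bucket`/`table` types are hypothetical names for illustration, and the table is assumed to always contain at least one empty bucket:

```c
#include <stddef.h>
#include <stdint.h>

typedef struct { uint64_t key; uint64_t val; int used; } bucket;
typedef struct { bucket *buckets; size_t mask; /* capacity - 1 */ } table;

static inline size_t home_index( const table *t, uint64_t key )
{
    /* Toy multiplicative hash; any reasonable hash works here. */
    return ( ( key * 0x9E3779B97F4A7C15ull ) >> 32 ) & t->mask;
}

/* Look up n keys; writes found values into vals, returns the hit count.
   Keys are processed in batches: first hash every key in the batch and
   prefetch its home bucket, then probe. By the time the probe loop
   touches a bucket, its cache line is hopefully already in flight, so
   the cache misses overlap instead of being paid one by one. */
size_t bulk_get( const table *t, const uint64_t *keys, uint64_t *vals, size_t n )
{
    size_t idx[ 16 ];
    size_t hits = 0;
    for( size_t start = 0; start < n; start += 16 )
    {
        size_t batch = n - start < 16 ? n - start : 16;
        for( size_t i = 0; i < batch; ++i )
        {
            idx[ i ] = home_index( t, keys[ start + i ] );
            __builtin_prefetch( &t->buckets[ idx[ i ] ] ); /* GCC/Clang builtin */
        }
        for( size_t i = 0; i < batch; ++i )
        {
            /* Linear probe from the (prefetched) home bucket. */
            size_t j = idx[ i ];
            while( t->buckets[ j ].used )
            {
                if( t->buckets[ j ].key == keys[ start + i ] )
                {
                    vals[ start + i ] = t->buckets[ j ].val;
                    ++hits;
                    break;
                }
                j = ( j + 1 ) & t->mask;
            }
        }
    }
    return hits;
}
```

The gain described above appearing only at large table sizes is consistent with this mechanism: when the whole table fits in cache, there are no misses to hide.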
-
Hello again everyone :) I’ve updated the draft of the article. No need to read it all again, but I will try to summarize the most significant changes, especially those based on the earlier feedback:
@attractivechaos I’ve thanked you all (and included GitHub links) at the end of the article for taking the time to offer feedback on the draft. Please just let me know if you’d prefer not to be mentioned, or if you’d prefer to be mentioned in a different way (e.g. name vs GitHub username). Sorry I didn’t respond earlier to your response to my question about the current status of M*LIB’s DICT. If you want to release it now/soon as 0.7.3, that’s fine – I’ll update the description. Otherwise, I’m happy to refer to it as something like “DICT from M*LIB v0.7.3 (upcoming)”, perhaps with a note that it is already available on the master branch.
-
@JacksonAllan looks like you've finished your article. Planning on posting it on Reddit and other social media?
-
I was looking into why M*LIB was so slow on non-existing keys, and the explanation was simple: the load factor, which was too high for its design. Theoretically, a load factor of 0.875 should be around 2.4 times slower than 0.7: the design predicts 8 iterations of the search loop on average at 0.875, versus about 3.3 at 0.7. And I measure 8 iterations of the search loop on average with this benchmark. Sometimes I am just happy when the numbers match :)
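For reference, these numbers fall out of the classic uniform-probing approximation, under which an unsuccessful lookup in an open-addressing table at load factor α is expected to take 1/(1 − α) probes. This is an illustrative model, not a measurement of M*LIB itself:

```c
/* Expected probe count for an unsuccessful lookup under the
   uniform-probing approximation: E(alpha) = 1 / (1 - alpha).
   E(0.875) = 8, E(0.7) ~= 3.33, a ratio of roughly 2.4. */
static double expected_probes( double alpha )
{
    return 1.0 / ( 1.0 - alpha );
}
```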
-
Hello fellow hash-table developers :) Hope you don’t mind the unsolicited mention.
I’ve created this discussion for two reasons:
To direct your attention to this repository and the associated Reddit post, via which I’ve just published a new performance-oriented hash table in C. As I mention in the post, this table is the product of about a year of part-time research and experimentation with different hash table designs, primarily with variations of Robin Hood, SIMD, in-table chaining, and combinations thereof. Ultimately, I reached the same general conclusion as Malte Skarupke, namely that in-table chaining is a very good general-purpose solution. However, my design is not a carbon copy of his Bytell design. Most prominently, I trade an extra byte of overhead per bucket (i.e. two bytes instead of one) for a performance boost across a range of benchmarks.
To open a space for communication among hash table experts for the sake of sharing experience and ideas in future. I, for one, have a relatively comprehensive benchmark of C (and some C++) hash tables that I hope to open-source and publish within the next few months and that I think will interest the users I've mentioned.
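As a rough illustration of the two-bytes-per-bucket overhead mentioned in the first point, an in-table-chaining design might pack a hash fragment and a chain link into a single 16-bit word per bucket. This layout and these names are hypothetical, not the table's actual metadata format; the real design may apportion the bits differently:

```c
#include <stdint.h>

/* Hypothetical 16-bit per-bucket metadata for in-table chaining:
   the high byte holds a "fragment" of the key's hash code, letting
   most non-matching buckets be rejected without touching the key,
   while the low byte encodes the bucket's link within its chain. */
typedef uint16_t bucket_meta;

#define FRAG_MASK 0xFF00u /* high byte: hash fragment  */
#define LINK_MASK 0x00FFu /* low byte: chain link      */

static inline uint8_t meta_fragment( bucket_meta m )
{
    return (uint8_t)( ( m & FRAG_MASK ) >> 8 );
}

static inline uint8_t meta_link( bucket_meta m )
{
    return (uint8_t)( m & LINK_MASK );
}

static inline bucket_meta meta_make( uint8_t fragment, uint8_t link )
{
    return (bucket_meta)( ( (bucket_meta)fragment << 8 ) | link );
}
```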
Thanks for reading!
Mentioned users:
@martinus: Developer of two popular Robin Hood-based C++ tables and author of two comprehensive sets of C++ hash table benchmarks.
@Tessil: Developer of a popular Robin Hood-based C++ table, among other tables, and author of a set of hash table benchmarks.
@skarupke: Developer of Bytell and author of many posts and a presentation about developing fast hash tables in C++.
@attractivechaos: Developer of the extremely popular Klib/Khash library and author of many hash table-related blog posts.
@camel-cdr: Has experimented with SIMD-based tables in C.
@stclib: Developer of the popular STC C library, which includes an open-addressing hash table.
@P-p-H-d: Developer of the popular M*LIB C library, which includes an open-addressing hash table.
@fowles: One of the developers of Google’s Abseil/Swiss table and the author of two presentations about it.