-
Notifications
You must be signed in to change notification settings - Fork 559
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Reevaluate hash function used #504
Comments
We previously switched from SHA-1 to SHA-512 in #108. There was discussion in this PR about switching to RustCrypto regarding potentially switching to BLAKE2. I don't think we're married to a specific hash function, anything that's performant and has good collision resistance is probably fine. |
I of course did not want to imply anything like that. I will see whether I can quickly hack up a fork using the mentioned and implemention and see how that performs for my project. |
Another candidate: https://github.com/eldruin/wyhash-rs |
We've used BLAKE2 in the Rust compiler for a while and it was pretty performant. We later switched to a 128 bit SipHash because it's even faster, 128 bits are enough for our purposes, and we don't need a full blown cryptographic hash function. The fastest function in our tests was MetroHash but it's probably less tested than SipHash and supports 128 bit hashes at most. |
While a performant hash function is not irrelevant, testing shows runtime gets completely dominated by preprocessing overhead, to an extend that you would not even notice a super slow hash function. See #497 (comment) for further info. |
I would be surprised to find that using a faster hash function made a noticeable difference in any performance testing but if someone wanted to write a patch and gather some data that would be a good start. Changing the hash function should be trivial in practice, everything goes through the methods linked in the original comment on this issue. |
Yeah, we were just lead astray by the wrong logging of "generate_hash_key", which actually is the duration of the preprocessing expansion step. I'd actually recommend closing this issue as mostly irrelevant. |
Since the hash function got changed to BLAKE3, I think this issue can be closed? |
cc @sylvestre to help close |
As seen here SHA512 is used for file hashing. I wonder why, cryptographic quality of the hash function should be secondary, but instead the focus should be on performance.
I think the benchmark should be to offer similar performance to CCache which apparently uses xxhash.
I assume the choice for SHA512 was made due to implementation simplicity (its already there in the rust eco system) and robustness. Hashing on my laptop (i7-7700HQ CPU @ 2.80GHz) accounts for up to 60% of the same runtime as full compilation, which explains the somewhat bad speed up factor of ~2.3.
Apparently the rust eco system has an implementation for xxhash. Maybe that one could be used as a first step (should be easy to achieve), if one reluctantly wants to start researching the topic of "best hashing functions".
The text was updated successfully, but these errors were encountered: