Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Reevaluate hash function used #504

Closed
apriori opened this issue Aug 27, 2019 · 9 comments
Closed

Reevaluate hash function used #504

apriori opened this issue Aug 27, 2019 · 9 comments

Comments

@apriori
Copy link

apriori commented Aug 27, 2019

As seen here SHA512 is used for file hashing. I wonder why, cryptographic quality of the hash function should be secondary, but instead the focus should be on performance.

I think the benchmark should be to offer similar performance to CCache which apparently uses xxhash.

I assume the choice for SHA512 was made due to implementation simplicity (its already there in the rust eco system) and robustness. Hashing on my laptop (i7-7700HQ CPU @ 2.80GHz) accounts for up to 60% of the same runtime as full compilation, which explains the somewhat bad speed up factor of ~2.3.

Apparently the rust eco system has an implementation for xxhash. Maybe that one could be used as a first step (should be easy to achieve), if one reluctantly wants to start researching the topic of "best hashing functions".

@luser
Copy link
Contributor

luser commented Aug 27, 2019

We previously switched from SHA-1 to SHA-512 in #108. There was discussion in this PR about switching to RustCrypto regarding potentially switching to BLAKE2. I don't think we're married to a specific hash function, anything that's performant and has good collision resistance is probably fine.

@apriori
Copy link
Author

apriori commented Aug 27, 2019

I of course did not want to imply anything like that. I will see whether I can quickly hack up a fork using the mentioned and implemention and see how that performs for my project.

@apriori
Copy link
Author

apriori commented Aug 27, 2019

Another candidate: https://github.com/eldruin/wyhash-rs

@michaelwoerister
Copy link
Contributor

We've used BLAKE2 in the Rust compiler for a while and it was pretty performant. We later switched to a 128 bit SipHash because it's even faster, 128 bits are enough for our purposes, and we don't need a full blown cryptographic hash function. The fastest function in our tests was MetroHash but it's probably less tested than SipHash and supports 128 bit hashes at most.

@apriori
Copy link
Author

apriori commented Oct 21, 2019

While a performant hash function is not irrelevant, testing shows runtime gets completely dominated by preprocessing overhead, to an extend that you would not even notice a super slow hash function.

See #497 (comment) for further info.

@luser
Copy link
Contributor

luser commented Oct 21, 2019

I would be surprised to find that using a faster hash function made a noticeable difference in any performance testing but if someone wanted to write a patch and gather some data that would be a good start. Changing the hash function should be trivial in practice, everything goes through the methods linked in the original comment on this issue.

@apriori
Copy link
Author

apriori commented Oct 21, 2019

Yeah, we were just lead astray by the wrong logging of "generate_hash_key", which actually is the duration of the preprocessing expansion step. I'd actually recommend closing this issue as mostly irrelevant.

@newpavlov
Copy link

Since the hash function got changed to BLAKE3, I think this issue can be closed?

@Xuanwo
Copy link
Collaborator

Xuanwo commented Jan 2, 2023

Since the hash function got changed to BLAKE3, I think this issue can be closed?

cc @sylvestre to help close

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

6 participants