Fix passing seed parameter to MurmurHash3_32 in cudf::hash() function #12875
This looks good. Thank you for the fix, and apologies that this slipped past me when writing/reviewing this previously.
Can confirm that with this PR, I'm seeing a much larger variation, and the results are a lot more in line with what I needed for my use case.
Re-approving (previously draft).
/merge
Description
Fixes passing the seed parameter to `MurmurHash3_32` in the `cudf::hash()` function. The `MurmurHash3_32` algorithm takes a seed value, which helps provide variation in hash results. The seed parameter was not being passed to the algorithm through the `element_hasher`; it was only used in the `hash_combine` functions. This resulted in only small variations in the hash results. The following example illustrates:

The output for the 5 hashes with associated seed values:

A `MurmurHash3_32` algorithm would produce the following variations if given the seed values:

This PR passes the seed value through to the internal algorithm. Although the results do not exactly match the standard `MurmurHash3_32`, the variations are much improved:

This variation is important for some machine learning operations.
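For readers who want to see the difference in isolation, here is a minimal host-side sketch. It is not the cuDF code: `murmur3_32` is the standard MurmurHash3_x86_32 specialized to a single 4-byte key, `hash_combine` is assumed to be the common Boost-style 32-bit combiner, and the two wrapper functions (`seed_in_combine_only`, `seed_in_algorithm`) are hypothetical names that only model where the seed enters the computation before and after this change.

```cpp
#include <cassert>
#include <cstdint>

// Rotate a 32-bit value left by r bits (0 < r < 32).
inline std::uint32_t rotl32(std::uint32_t x, int r) { return (x << r) | (x >> (32 - r)); }

// MurmurHash3 32-bit finalizer (avalanche step).
inline std::uint32_t fmix32(std::uint32_t h)
{
  h ^= h >> 16;
  h *= 0x85ebca6bu;
  h ^= h >> 13;
  h *= 0xc2b2ae35u;
  h ^= h >> 16;
  return h;
}

// Standard MurmurHash3_x86_32 for exactly one 4-byte block.
inline std::uint32_t murmur3_32(std::uint32_t key, std::uint32_t seed)
{
  std::uint32_t k1 = key;
  k1 *= 0xcc9e2d51u;
  k1 = rotl32(k1, 15);
  k1 *= 0x1b873593u;

  std::uint32_t h1 = seed;  // the seed initializes the running hash state
  h1 ^= k1;
  h1 = rotl32(h1, 13);
  h1 = h1 * 5u + 0xe6546b64u;

  h1 ^= 4u;  // key length in bytes
  return fmix32(h1);
}

// Assumption: a Boost-style 32-bit combiner, used here only for illustration.
inline std::uint32_t hash_combine(std::uint32_t lhs, std::uint32_t rhs)
{
  return lhs ^ (rhs + 0x9e3779b9u + (lhs << 6) + (lhs >> 2));
}

// Hypothetical model of the old behavior: the element is hashed with a
// fixed seed of 0, and the user's seed only enters through hash_combine.
inline std::uint32_t seed_in_combine_only(std::uint32_t key, std::uint32_t seed)
{
  return hash_combine(seed, murmur3_32(key, 0u));
}

// Hypothetical model of the new behavior: the seed is also passed to the
// hash algorithm itself, so it goes through the full mixing steps.
inline std::uint32_t seed_in_algorithm(std::uint32_t key, std::uint32_t seed)
{
  return hash_combine(seed, murmur3_32(key, seed));
}
```

In the first wrapper the element hash is identical for every seed, so the seed influences the result only through the shifts, add, and XOR of the combiner. In the second, the seed also passes through the multiply/rotate rounds and the finalizer, which is where MurmurHash3 gets its avalanche behavior.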
Some gtests have hardcoded the hash values and were updated to match the new results.
Checklist