Expose seed argument to hash_values #12795
Seed value to use for the hash function.
Note - This only has effect for the following supported
hash functions:
* murmur3: MurmurHash3 hash function.
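To illustrate how a seed enters a MurmurHash3 computation, here is a minimal pure-Python sketch of the 32-bit x86 variant. It is a reference illustration only and is not the cuDF or libcudf implementation; the function name `murmur3_32` is made up for this example, and libcudf's handling of rows and column types may produce different values than hashing raw bytes this way.

```python
MASK32 = 0xFFFFFFFF


def _rotl32(x: int, r: int) -> int:
    """Rotate a 32-bit integer left by r bits."""
    return ((x << r) | (x >> (32 - r))) & MASK32


def murmur3_32(data: bytes, seed: int = 0) -> int:
    """Illustrative MurmurHash3 (x86, 32-bit) over raw bytes."""
    c1, c2 = 0xCC9E2D51, 0x1B873593
    h = seed & MASK32  # the seed is the initial hash state
    n = len(data)
    nblocks = n // 4

    # Body: mix each 4-byte little-endian block into the state.
    for i in range(nblocks):
        k = int.from_bytes(data[4 * i:4 * i + 4], "little")
        k = (k * c1) & MASK32
        k = _rotl32(k, 15)
        k = (k * c2) & MASK32
        h ^= k
        h = _rotl32(h, 13)
        h = (h * 5 + 0xE6546B64) & MASK32

    # Tail: mix the remaining 1 to 3 bytes, if any.
    tail = data[4 * nblocks:]
    if tail:
        k = int.from_bytes(tail, "little")
        k = (k * c1) & MASK32
        k = _rotl32(k, 15)
        k = (k * c2) & MASK32
        h ^= k

    # Finalization: fold in the length and avalanche the bits.
    h ^= n
    h ^= h >> 16
    h = (h * 0x85EBCA6B) & MASK32
    h ^= h >> 13
    h = (h * 0xC2B2AE35) & MASK32
    h ^= h >> 16
    return h


if __name__ == "__main__":
    payload = b"hello world"
    print(hex(murmur3_32(payload, seed=0)))
    print(hex(murmur3_32(payload, seed=0xFFFFFFF)))
```

Because every mixing step is invertible on the 32-bit state, two different seeds always give different hashes for the same input in this sketch. A hash function that takes no initial state has nowhere to use a seed, which is why the docstring restricts the note to murmur3.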
If the seed has no effect, I think maybe we should warn, at least since we have some flexibility here given this has no equivalent pandas API.
Co-authored-by: brandon-b-miller <[email protected]>
# Check single column
out_one = gdf[["a"]].hash_values(method=method)
if warning_expected:
Another alternative is to separate out the test for warning into a separate pytest and not use seed
at all for this one.
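A rough sketch of that split, using only the standard library so it runs without a GPU. `hash_values_stub` and `SEED_AWARE_METHODS` are hypothetical stand-ins invented for this example, not cuDF code, and the CRC32 placeholder is only there so the stub returns something deterministic; the real tests would call `hash_values` on a cuDF object and would likely use `pytest.warns`.

```python
import warnings
import zlib

# Hypothetical: the docstring in this PR lists only murmur3 as seed-aware.
SEED_AWARE_METHODS = {"murmur3"}


def hash_values_stub(values, method="murmur3", seed=None):
    """Hypothetical stand-in for hash_values (not the cuDF implementation)."""
    if seed is not None and method not in SEED_AWARE_METHODS:
        warnings.warn(
            f"seed has no effect for method={method!r}", UserWarning
        )
        seed = None  # the seed is ignored for this method
    start = 0 if seed is None else seed
    # Placeholder hash: CRC32 with the seed as its starting value.
    return [zlib.crc32(repr(v).encode(), start) for v in values]


def test_hash_values_no_seed():
    # Value checks only; no seed is passed, so no warning logic is involved.
    one = hash_values_stub(["a", "b"], method="md5")
    two = hash_values_stub(["a", "b"], method="md5")
    assert one == two


def test_hash_values_seed_warns_when_ignored():
    # Warning check only, kept separate from the value checks above.
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")
        hash_values_stub(["a"], method="md5", seed=10)
    assert any(issubclass(w.category, UserWarning) for w in caught)


def test_hash_values_seed_no_warning_for_murmur3():
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")
        hash_values_stub(["a"], method="murmur3", seed=10)
    assert not caught
```

Keeping the warning assertion in its own test means the value-comparison test never needs a `warning_expected` branch.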
One observation while testing this out:
ser = cudf.Series(["hello world"]).str.character_ngrams(5, False)
ser.hash_values(seed=0)
ser.hash_values(seed=0xFFFFFFF)
Approving. Not sure about the variance question but given that we're just wrapping here I'd say it's a libcudf question.
/merge
Description
This PR exposes the `seed` param to `hash_values` that is already supported by libcudf's `hash` method.
Closes #12775
Checklist