[FEA] Add support for SHA256 and SHA512 to cudf::hash #8641
Comments
Hi @MikeChenfu , thanks for the request. Does the existing |
Hi @beckernick, thanks for the reply. Currently, murmur3 is not a suitable choice for our use case. It would be great if we had multiple choices for the hash function.
Likely will need support for this at the C++ (libcudf) layer. cc: @jrhemstad @harrism
We do have some other hash functions other than murmur3 that aren't exposed to Python: cudf/cpp/include/cudf/types.hpp Lines 333 to 338 in 3ee264c
@shwina @jrhemstad Thanks for the information. Can we have the |
Reference #6020
This PR introduces a public API in cuDF for MD5 hashing, using the parameter `DataFrame.hash_columns(..., method="md5")` or `Series.hash_values(..., method="md5")`. The default hashing method is MurmurHash3 (`method="murmur3"`). I also changed the return value of `Series.hash_values` to be a `Series`, rather than a cupy array. Related to #8641. SHA support will be added in a later PR.

Authors:
- Bradley Dice (https://github.com/bdice)

Approvers:
- Michael Wang (https://github.com/isVoid)
- Ashwin Srinath (https://github.com/shwina)

URL: #9390
This issue has been labeled |
Status update on this issue: I have been working on several refactors, fixes, and performance enhancements for libcudf's hashing functionality. I expect this feature to land in the 22.06 release with PR #9215.
This PR adds support for SHA-1 and SHA-2 (SHA-256, SHA-512, and truncated digests SHA-224, SHA-384) hash functions. Resolves #8641. Replaces #9215.

Authors:
- Bradley Dice (https://github.com/bdice)

Approvers:
- Robert Maynard (https://github.com/robertmaynard)
- Matthew Roeschke (https://github.com/mroeschke)
- David Wendt (https://github.com/davidwendt)
- https://github.com/nvdbaranec

URL: #14391
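For orientation, the digest sizes of the hash functions named in this PR can be checked on the CPU with Python's standard `hashlib`. This is only a reference sketch using the standard library, not the libcudf implementation; the names `ALGORITHMS` and `DIGEST_BYTES` are made up for the illustration.

```python
import hashlib

# Algorithms named in the PR description, by their hashlib names.
ALGORITHMS = ("sha1", "sha224", "sha256", "sha384", "sha512")

# Map each algorithm to the size of its raw digest in bytes.
DIGEST_BYTES = {name: hashlib.new(name).digest_size for name in ALGORITHMS}

for name, size in DIGEST_BYTES.items():
    # A hex digest uses two characters per byte.
    print(f"{name}: {size} bytes, {2 * size} hex characters")

# SHA-224 has a shorter output than SHA-256 but also uses different
# initial values, so it is not just a prefix of the SHA-256 digest.
data = b"abc"
sha224_is_prefix = hashlib.sha256(data).digest()[:28] == hashlib.sha224(data).digest()
```

Any GPU implementation should produce digests of the same lengths, so a table like this is a convenient first sanity check when comparing results against `hashlib`.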
Is your feature request related to a problem? Please describe.
Hello, I am wondering whether cudf could support more hash functions, such as hashlib.sha256 and hashlib.sha512. Thanks for the consideration.
Describe the solution you'd like
df['sha256'] = df['h'].hash_func(method='sha256')
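As a point of comparison, the per-row behavior requested here can be sketched on the CPU with Python's `hashlib`. The helper name `hash_values` and its `method` parameter are hypothetical and chosen only to mirror the request. The sketch also assumes each value is a string hashed as its UTF-8 bytes, which may differ from how a GPU implementation treats other column types.

```python
import hashlib

def hash_values(values, method="sha256"):
    """Return the hex digest of each string in `values`.

    Hypothetical CPU helper for illustration; not a cudf API.
    """
    # hashlib.new looks the algorithm up by name, e.g. "sha256" or "sha512".
    return [hashlib.new(method, v.encode("utf-8")).hexdigest() for v in values]

rows = ["abc", ""]
sha256_digests = hash_values(rows, method="sha256")
sha512_digests = hash_values(rows, method="sha512")
```

Each input row maps to one fixed-length hex string, so the result fits naturally into a new string column, as in the `df['sha256'] = ...` assignment in the request.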
Describe alternatives you've considered
Additional context