Add SHA-1 and SHA-2 hash functions. #14391
Would be nice to have some Java smoke tests for the new hash mappings, see HashTest.java for how the other hash functions are being tested.
@jlowe It looks like these tests might be in … Also note that this PR does not add list hashing support. Is that a requirement from the Spark side?
Oh, sorry, my bad. HashTest.java is in NVIDIA/spark-rapids-jni which we use as a wrapper around cudf, didn't notice that file was in another project in the IDE.
I don't think this is necessary, since IMO the purpose of the Java tests is to make sure the bindings are accurate rather than to test every corner case of the underlying algorithm. Doing the latter adds a lot of redundancy with the C++ tests (unless that extensive testing isn't there or anywhere else). The Java bindings aren't doing anything specific to the type.

Regarding list support, it would be nice if it supported hashing a list of uint8, since that's how we represent Spark's BinaryType. BinaryType is the only type that Spark supports hashing with these algorithms. However, we can probably cast it to a string column as a workaround, so it shouldn't be a blocker.
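As a rough illustration of why the string-cast workaround is plausible, here is a CPU-side sketch using Python's standard `hashlib` (not cudf or the Spark plugin). It assumes the hash is computed over a row's raw bytes, so a binary value and a string holding the same bytes produce the same digest. The helper name `sha256_hex` is hypothetical.

```python
import hashlib


def sha256_hex(payload: bytes) -> str:
    """Return the SHA-256 digest of raw bytes as hex (hypothetical helper)."""
    return hashlib.sha256(payload).hexdigest()


# A "binary" value represented as a sequence of uint8 values.
binary_row = bytes([0x61, 0x62, 0x63])

# The same bytes viewed as a UTF-8 string.
string_row = "abc"

binary_digest = sha256_hex(binary_row)
string_digest = sha256_hex(string_row.encode("utf-8"))

# Both views hash identically because only the underlying bytes matter.
print(binary_digest == string_digest)
```

Whether the cast is safe for arbitrary byte sequences that are not valid UTF-8 is not shown here and would need to be checked against the actual GPU implementation.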
Looks like the raw …
Recommend reverting the enum changes.
… xxhash64 to cudf Python. (#14538)
This PR refactors the Python code for `IndexedFrame.hash_values` to use the newer named C++ functions from `cudf::hashing::*`. I also added bindings for xxhash64 and updated some tests. Needed for #14391.
Authors:
- Bradley Dice (https://github.com/bdice)
Approvers:
- Lawrence Mitchell (https://github.com/wence-)
URL: #14538
python/cython changes look good
…ents about how to reproduce test values, remove unsanitized null tests.
/merge
Description
This PR adds support for SHA-1 and SHA-2 (SHA-256, SHA-512, and truncated digests SHA-224, SHA-384) hash functions. Resolves #8641. Replaces #9215.
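For reference, the five algorithms named above can be compared on the CPU with Python's standard `hashlib`. This is only a sketch of the digest sizes involved, not the PR's GPU implementation. It also shows that SHA-224 is not a plain prefix of SHA-256: per the SHA-2 standard, the truncated variants also use different initial hash values.

```python
import hashlib

# The algorithms listed in the PR description, using hashlib's names.
ALGORITHMS = ["sha1", "sha224", "sha256", "sha384", "sha512"]

message = b"abc"

# Digest size in bytes for each algorithm.
digest_sizes = {
    name: hashlib.new(name, message).digest_size for name in ALGORITHMS
}

for name, size in digest_sizes.items():
    print(f"{name}: {size} bytes ({size * 8} bits)")

# SHA-224 is a truncated variant of SHA-256, but its output is not simply
# the first 224 bits of the SHA-256 digest of the same input.
sha224_digest = hashlib.sha224(message).hexdigest()
sha256_digest = hashlib.sha256(message).hexdigest()
is_prefix = sha256_digest.startswith(sha224_digest)
print("SHA-224 is a prefix of SHA-256:", is_prefix)
```

Reference values computed this way could serve as a cross-check for expected test outputs, assuming the GPU implementation hashes the same raw bytes.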
Checklist