-
Notifications
You must be signed in to change notification settings - Fork 914
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Remove deprecated hash() and spark_murmurhash3_x86_32() #15375
Conversation
@@ -313,8 +313,11 @@ JNIEXPORT jlong JNICALL Java_ai_rapids_cudf_ColumnVector_hash(JNIEnv *env, jobje | |||
try { | |||
auto column_views = | |||
cudf::jni::native_jpointerArray<cudf::column_view>{env, column_handles}.get_dereferenced(); | |||
return release_as_jlong(cudf::hash(cudf::table_view{column_views}, | |||
static_cast<cudf::hash_id>(hash_function_id), seed)); | |||
if (hash_function_id == 2) { // MD5 from HashType.java |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looking for some advice on handling this since the cudf::hash_id
has been repurposed for partitioning-hash only now and so does not include MD5.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we can remove support for MD5 from this method (and from HashType.java). The ColumnVector.md5Hash method can call a new, dedicated private native method for MD5 hashing rather than relying on passing an enum to the private native hash
method.
MURMUR3(1), | ||
HASH_SPARK_MURMUR3(2), | ||
HASH_MD5(3); | ||
MURMUR3(1); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Quick question. I see the MD5 type being removed here but still being referenced elsewhere (ColumnVector.java). Is that intended?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
References to HashType.HASH_MD5
should all be removed from ColumnVector.java
in this PR.
#15375 (comment)
/merge |
Description
Remove deprecated libcudf hash functions. The
cudf::hash()
andcudf::hashing::spark_murmurhash3_x86_32()
were deprecated in previous releases. Thecudf::hash_partition()
function still relies on the enumhash_id
so it has been moved fromhashing.cpp
topartitioning.hpp
.Calls to
cudf::hashing::spark_murmurhash3_x86_32()
were also removed from the JNI code.Checklist