Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Remove deprecated hash() and spark_murmurhash3_x86_32() #15375

Merged
merged 17 commits into from
Apr 4, 2024

Conversation

davidwendt
Copy link
Contributor

@davidwendt davidwendt commented Mar 22, 2024

Description

Remove deprecated libcudf hash functions. The cudf::hash() and cudf::hashing::spark_murmurhash3_x86_32() were deprecated in previous releases. The cudf::hash_partition() function still relies on the enum hash_id so it has been moved from hashing.cpp to partitioning.hpp.
Calls to cudf::hashing::spark_murmurhash3_x86_32() were also removed from the JNI code.

Checklist

  • I am familiar with the Contributing Guidelines.
  • New or existing tests cover these changes.
  • The documentation is up to date with these changes.

@davidwendt davidwendt added 2 - In Progress Currently a work in progress libcudf Affects libcudf (C++/CUDA) code. improvement Improvement / enhancement to an existing function non-breaking Non-breaking change labels Mar 22, 2024
@davidwendt davidwendt self-assigned this Mar 22, 2024
@github-actions github-actions bot added CMake CMake build issue Java Affects Java cuDF API. labels Mar 22, 2024
@davidwendt davidwendt added 3 - Ready for Review Ready for review by team and removed 2 - In Progress Currently a work in progress labels Mar 25, 2024
@davidwendt davidwendt marked this pull request as ready for review March 27, 2024 13:27
@davidwendt davidwendt requested review from a team as code owners March 27, 2024 13:27
@davidwendt davidwendt requested review from vyasr and nvdbaranec March 27, 2024 13:27
@@ -313,8 +313,11 @@ JNIEXPORT jlong JNICALL Java_ai_rapids_cudf_ColumnVector_hash(JNIEnv *env, jobje
try {
auto column_views =
cudf::jni::native_jpointerArray<cudf::column_view>{env, column_handles}.get_dereferenced();
return release_as_jlong(cudf::hash(cudf::table_view{column_views},
static_cast<cudf::hash_id>(hash_function_id), seed));
if (hash_function_id == 2) { // MD5 from HashType.java
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking for some advice on handling this since the cudf::hash_id has been repurposed for partitioning-hash only now and so does not include MD5.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we can remove support for MD5 from this method (and from HashType.java). The ColumnVector.md5Hash method can call a new, dedicated private native method for MD5 hashing rather than relying on passing an enum to the private native hash method.

cpp/include/cudf/hashing.hpp Show resolved Hide resolved
@davidwendt davidwendt added breaking Breaking change and removed non-breaking Non-breaking change labels Mar 27, 2024
MURMUR3(1),
HASH_SPARK_MURMUR3(2),
HASH_MD5(3);
MURMUR3(1);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Quick question. I see the MD5 type being removed here but still being referenced elsewhere (ColumnVector.java). Is that intended?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

References to HashType.HASH_MD5 should all be removed from ColumnVector.java in this PR.
#15375 (comment)

@davidwendt davidwendt requested a review from jlowe April 4, 2024 16:06
@bdice
Copy link
Contributor

bdice commented Apr 4, 2024

/merge

@rapids-bot rapids-bot bot merged commit 8509054 into rapidsai:branch-24.06 Apr 4, 2024
73 checks passed
@davidwendt davidwendt deleted the remove-dep-hash branch April 4, 2024 20:15
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
3 - Ready for Review Ready for review by team breaking Breaking change CMake CMake build issue improvement Improvement / enhancement to an existing function Java Affects Java cuDF API. libcudf Affects libcudf (C++/CUDA) code.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants