[FEA] Hash function refactoring #13706

Open
bdice opened this issue Jul 17, 2023 · 3 comments
Labels
0 - Backlog In queue waiting for assignment feature request New feature or request libcudf Affects libcudf (C++/CUDA) code.

Comments


bdice commented Jul 17, 2023

Following up from #13681 and #13612, there are some tasks I think can be done to clean up hashing code. I am opening this issue to be a tracker for the work we've deferred from other PRs.

@bdice bdice added feature request New feature or request Needs Triage Need team to review and classify labels Jul 17, 2023
@GregoryKimball GregoryKimball added 0 - Backlog In queue waiting for assignment libcudf Affects libcudf (C++/CUDA) code. and removed Needs Triage Need team to review and classify labels Jul 22, 2023

bdice commented Jan 17, 2024


bdice commented Jan 19, 2024

@davidwendt proposed removing unsanitized nulls from hashing tests. I agree with this idea. I refactored MD5's tests in 0188115 and will do the same for SHA in PR #14391, but additional work is needed for other hashing algorithms to remove unsanitized nulls from the tests.

#14391 (comment)
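
For readers unfamiliar with the term, here is a rough sketch of what an "unsanitized null" could look like. It assumes the term means a null row whose underlying data range is still non-empty. `ToyStringsColumn`, `has_unsanitized_nulls`, and `sanitize` are hypothetical names invented for this illustration; they are not libcudf types or APIs, and this is not the code used in the tests.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical, simplified stand-in for a strings column (not a libcudf type).
struct ToyStringsColumn {
  std::string chars;                 // concatenated character data for all rows
  std::vector<std::size_t> offsets;  // row i spans [offsets[i], offsets[i + 1])
  std::vector<bool> valid;           // valid[i] == false means row i is null
};

// Returns true if any null row still owns a non-empty character range.
bool has_unsanitized_nulls(ToyStringsColumn const& col) {
  for (std::size_t i = 0; i < col.valid.size(); ++i) {
    if (!col.valid[i] && col.offsets[i + 1] != col.offsets[i]) { return true; }
  }
  return false;
}

// Rebuilds the column so that every null row has an empty character range.
ToyStringsColumn sanitize(ToyStringsColumn const& col) {
  ToyStringsColumn out;
  out.valid = col.valid;
  out.offsets.push_back(0);
  for (std::size_t i = 0; i < col.valid.size(); ++i) {
    if (col.valid[i]) {
      // Copy only the characters that belong to valid rows.
      out.chars.append(col.chars, col.offsets[i], col.offsets[i + 1] - col.offsets[i]);
    }
    out.offsets.push_back(out.chars.size());
  }
  return out;
}
```

In this toy model, a test input with a null row that still carries characters can make a hash depend on data that is supposed to be ignored, which is one reason to build test columns with sanitized nulls.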


bdice commented Jan 22, 2024

We also need to standardize null behavior for hashing algorithms. See #10451.

rapids-bot bot pushed a commit that referenced this issue Feb 20, 2024
The `cudf::hashing::spark_murmurhash3_x86_32()` function was moved to the Spark plugin since it had common code with the Spark implementation of `xxhash_64` (also implemented in the plugin).
This change deprecates that API and the generic `cudf::hashing::hash()` function; both are to be removed in a follow-on release.

Reference hash cleanup issue: #13706

Authors:
  - David Wendt (https://github.com/davidwendt)

Approvers:
  - Bradley Dice (https://github.com/bdice)
  - Karthikeyan (https://github.com/karthikeyann)

URL: #15074