Additional refactoring of hash functions #10462
Merged
18 commits, all by bdice:
63f187a Refactor float normalization.
84de276 Refactor namespaces.
bda5910 Remove this-> for consistency.
13b831a Unify Spark/non-Spark implementations and separate tail processing in…
6929e63 Move MurmurHash3_32 and default_hash into cudf::detail.
6c293ed Make SparkMurmurHash3_32 inherit from MurmurHash3_32 (tests currently…
7cdec5f Revert "Make SparkMurmurHash3_32 inherit from MurmurHash3_32 (tests c…
cafd0b3 Make default constructor constexpr.
a24f52d Define hash_value_type in cudf namespace.
bd0d981 Merge branch 'branch-22.06' into hashing-refactor-2
8caa85c Merge remote-tracking branch 'upstream/branch-22.06' into hashing-ref…
7927735 Replace rotl32 with rotate_bits_left.
f02fa68 Update bpe_tokenizer.cuh.
b01a59c Merge remote-tracking branch 'upstream/branch-22.06' into hashing-ref…
e491115 Fix subword includes.
1e63821 Revert copyright change.
0a17018 Add [[fallthrough]].
0fcbb23 Make operator() const.
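The commit list mentions separating tail processing, replacing `rotl32` with `rotate_bits_left`, and adding `[[fallthrough]]`. To show where each of those pieces sits, here is a minimal host-side C++17 sketch of the standard MurmurHash3_x86_32 algorithm. It is not cudf's implementation: `rotate_bits_left`, `mix_k1`, `fmix32`, and `murmur3_x86_32` are local names chosen for illustration, and the Spark-compatible variant is not modeled.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string_view>

// Rotate x left by r bits; valid for 0 < r < 32. Named after the
// rotate_bits_left helper mentioned in the commits, but written locally here.
constexpr std::uint32_t rotate_bits_left(std::uint32_t x, std::uint32_t r)
{
  return (x << r) | (x >> (32 - r));
}

// Scramble one 32-bit block before it is mixed into the running hash.
constexpr std::uint32_t mix_k1(std::uint32_t k1)
{
  k1 *= 0xcc9e2d51u;
  k1 = rotate_bits_left(k1, 15);
  return k1 * 0x1b873593u;
}

// Final avalanche ("fmix32") so every input bit affects every output bit.
constexpr std::uint32_t fmix32(std::uint32_t h)
{
  h ^= h >> 16;
  h *= 0x85ebca6bu;
  h ^= h >> 13;
  h *= 0xc2b2ae35u;
  h ^= h >> 16;
  return h;
}

std::uint32_t murmur3_x86_32(std::string_view data, std::uint32_t seed)
{
  auto const* bytes         = reinterpret_cast<unsigned char const*>(data.data());
  std::size_t const len     = data.size();
  std::size_t const nblocks = len / 4;
  std::uint32_t h1          = seed;

  // Body: process full 4-byte blocks, assembled as little-endian words.
  for (std::size_t i = 0; i < nblocks; ++i) {
    unsigned char const* b = bytes + 4 * i;
    std::uint32_t const k1 = static_cast<std::uint32_t>(b[0]) |
                             (static_cast<std::uint32_t>(b[1]) << 8) |
                             (static_cast<std::uint32_t>(b[2]) << 16) |
                             (static_cast<std::uint32_t>(b[3]) << 24);
    h1 ^= mix_k1(k1);
    h1 = rotate_bits_left(h1, 13);
    h1 = h1 * 5u + 0xe6546b64u;
  }

  // Tail: the remaining 0-3 bytes are handled separately from the body.
  // [[fallthrough]] marks each fall-through between cases as intentional.
  unsigned char const* tail = bytes + 4 * nblocks;
  std::uint32_t k1          = 0;
  switch (len & 3u) {
    case 3: k1 ^= static_cast<std::uint32_t>(tail[2]) << 16; [[fallthrough]];
    case 2: k1 ^= static_cast<std::uint32_t>(tail[1]) << 8; [[fallthrough]];
    case 1: k1 ^= static_cast<std::uint32_t>(tail[0]); h1 ^= mix_k1(k1);
  }

  // Finalization: fold in the length, then avalanche.
  h1 ^= static_cast<std::uint32_t>(len);
  return fmix32(h1);
}
```

For example, `murmur3_x86_32("", 1)` yields `0x514E28B7`: with no blocks and no tail, only the finalization step touches the seed. Without `[[fallthrough]]`, compilers that warn on implicit fall-through (for example with `-Wimplicit-fallthrough`) would flag the `case 3` and `case 2` branches.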
Out of scope for this PR (really, out of scope for any PR in this repository), but it seems bad that we're relying on a cuco detail namespace here. @jrhemstad do we need MurmurHash3_32 to be exposed more publicly in cuco? If we expect callers to use it as a provided hash function, then it shouldn't be in detail.
I agree this is awkward / undesirable. This may be resolved or improved by #10401.
Actually, no. The issue is that we need to provide a stream, and that requires re-supplying the other defaulted arguments that come before it: https://github.com/NVIDIA/cuCollections/blob/fb58a38701f1c24ecfe07d8f1f208bbe80930da5/include/cuco/static_map.cuh#L224-L231
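The defaulted-argument problem can be reproduced without CUDA. The sketch below is a simplified, hypothetical model of the pattern in the linked signature, not cuco's API: `toy::toy_map`, `toy::detail::default_hash`, and `insert_on_stream` are invented names, and `stream_t` is an `int` stand-in for `cudaStream_t`. Because the stream parameter comes after the defaulted hash and comparator, a caller that only wants a different stream must spell out both defaults, including the one in `detail`.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>

namespace toy {

using stream_t = int;  // hypothetical stand-in for cudaStream_t

namespace detail {
// Hypothetical default hash that, as in the discussion, lives in a detail namespace.
template <typename Key>
struct default_hash {
  std::size_t operator()(Key const& k) const { return std::hash<Key>{}(k); }
};
}  // namespace detail

template <typename Key>
struct toy_map {
  // Records of the last call, so the behavior can be observed without a GPU.
  std::size_t last_hash = 0;
  stream_t last_stream  = 0;

  // Hash and equality are per-call template parameters with defaults, and the
  // stream comes after them in the parameter list.
  template <typename Hash = detail::default_hash<Key>, typename KeyEqual = std::equal_to<Key>>
  void insert(Key const& key, Hash hash = Hash{}, KeyEqual key_equal = KeyEqual{}, stream_t stream = 0)
  {
    (void)key_equal;  // unused in this sketch
    last_hash   = hash(key);
    last_stream = stream;
  }
};

}  // namespace toy

// A caller that only wants a non-default stream still has to name the default
// hash (from detail::) and the default comparator to reach the stream argument.
inline void insert_on_stream(toy::toy_map<int>& map, int key, toy::stream_t stream)
{
  map.insert(key, toy::detail::default_hash<int>{}, std::equal_to<int>{}, stream);
}
```

Writing `m.insert(42, 7)` instead would deduce `Hash = int` and fail to compile, because an `int` is not callable.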
Sure, but IMO it's a design issue in cuco if supplying a stream while using the default hash requires pulling the default hash out of a detail namespace.
I agree with that. (Sorry, the "Actually, no" was about whether #10401 would improve this situation. It would not.)
Got it. I think in the long run we probably don't want to be able to provide the hash function and equality comparator as template parameters of these methods, but rather as parameters of the constructor. It doesn't really make sense to be able to insert and query with different ones. Unfortunately, we currently abuse this ability in libcudf, so I don't think removing it is feasible in the short term. In the longer term, though, getting rid of it would make it easier to provide streams without this problem: the hash and equality operators would be defined at construction, and the user wouldn't need to provide those template arguments unless they wanted to override the default.
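The constructor-based alternative described in this comment can be sketched as follows. In this hypothetical `toy2::toy_map` (not a cuco type), hash and equality are class template parameters stored at construction, so every insert and query uses the same functors, and a caller can pass a stream without naming any detail type. `stream_t` is an `int` stand-in for `cudaStream_t` so the sketch compiles without CUDA.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>

namespace toy2 {

using stream_t = int;  // hypothetical stand-in for cudaStream_t

template <typename Key,
          typename Hash     = std::hash<Key>,
          typename KeyEqual = std::equal_to<Key>>
class toy_map {
 public:
  // Functors are fixed once, at construction; callers override them here if needed.
  explicit toy_map(Hash hash = Hash{}, KeyEqual key_equal = KeyEqual{})
    : hash_{hash}, key_equal_{key_equal}
  {
  }

  // The stream is the only optional argument, so it can be passed directly.
  void insert(Key const& key, stream_t stream = 0)
  {
    last_hash   = hash_(key);
    last_stream = stream;
  }

  // Queries reuse the same comparator that inserts would use.
  bool keys_equal(Key const& a, Key const& b) const { return key_equal_(a, b); }

  // Records of the last call, so the behavior can be observed without a GPU.
  std::size_t last_hash = 0;
  stream_t last_stream  = 0;

 private:
  Hash hash_;
  KeyEqual key_equal_;
};

}  // namespace toy2
```

With this shape, `m.insert(42, 7)` compiles and uses the default hash. The trade-off is the one the comment raises: callers can no longer insert into and query the same map with different functors.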