Commit
Fix passing seed parameter to MurmurHash3_32 in cudf::hash() function (#12875)

Fixes passing seed parameter to `MurmurHash3_32` in `cudf::hash()` function.

The `MurmurHash3_32` algorithm takes a seed value which helps provide variation in hash results. The seed parameter was not being passed to the algorithm through the `element_hasher` but only used in the `hash_combine` functions. This resulted in only small variations in the hash results. The following example illustrates:

```
{
  cudf::test::strings_column_wrapper const strings_col({"hello world"});
  auto const input1 = cudf::table_view({strings_col});
  for (int i = 0; i < 5; ++i) {
    auto output1 = cudf::hash(input1, cudf::hash_id::HASH_MURMUR3, i);
    std::cout << i << ": ";
    cudf::test::print(output1->view());
  }
}
```

The output for the 5 hashes with associated seed values:

```
seed  hash
0     4241098952
1     4241099017
2     4241099082
3     4241099147
4     4241099213
```

A `MurmurHash3_32` algorithm would produce the following variations if given the seed values:

```
seed  hash
0     1586663183
1     1128525090
2     3382554948
3     1761283998
4     1862001904
```

This PR passes the seed value through to the internal algorithm. Although the results do not exactly match the standard `MurmurHash3_32`, the variations are much improved:

```
seed  hash
0     4241098952
1     3782960922
2     1742023551
3     120752660
4     221470638
```

This variation is important for some machine learning operations.

Some gtests have hardcoded the hash values and were updated to match the new results.

Authors:
- David Wendt (https://github.com/davidwendt)
- Bradley Dice (https://github.com/bdice)

Approvers:
- Bradley Dice (https://github.com/bdice)
- Nghia Truong (https://github.com/ttnghia)
- Yunsong Wang (https://github.com/PointKernel)

URL: #12875
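The difference between the hash tables in the commit message can be reproduced outside of cudf with a small host-side sketch. This is not the libcudf code: `murmur3_32` is a plain implementation of the standard MurmurHash3_x86_32, and `hash_combine`, `hash_before_fix`, `hash_after_fix` and `check_seed_zero` are hypothetical helpers. The Boost-style `hash_combine` formula in particular is an assumption about what the library does internally, chosen because it is consistent with the numbers quoted in the commit message.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>

// Rotate a 32-bit value left by r bits (0 < r < 32).
inline std::uint32_t rotl32(std::uint32_t x, int r) { return (x << r) | (x >> (32 - r)); }

// Key mixing step shared by the block loop and the tail of MurmurHash3_x86_32.
inline std::uint32_t mix_k(std::uint32_t k)
{
  k *= 0xcc9e2d51u;
  k = rotl32(k, 15);
  k *= 0x1b873593u;
  return k;
}

// Standard MurmurHash3_x86_32 over the bytes of `key`, starting from `seed`.
inline std::uint32_t murmur3_32(std::string const& key, std::uint32_t seed)
{
  auto const* data      = reinterpret_cast<unsigned char const*>(key.data());
  std::size_t const len = key.size();
  std::uint32_t h       = seed;

  // Body: consume 4-byte blocks, assembled little-endian byte by byte.
  std::size_t const nblocks = len / 4;
  for (std::size_t i = 0; i < nblocks; ++i) {
    unsigned char const* p = data + 4 * i;
    std::uint32_t k = static_cast<std::uint32_t>(p[0]) |
                      (static_cast<std::uint32_t>(p[1]) << 8) |
                      (static_cast<std::uint32_t>(p[2]) << 16) |
                      (static_cast<std::uint32_t>(p[3]) << 24);
    h ^= mix_k(k);
    h = rotl32(h, 13);
    h = h * 5 + 0xe6546b64u;
  }

  // Tail: fold in the remaining 0 to 3 bytes.
  unsigned char const* tail = data + 4 * nblocks;
  std::size_t const rem     = len & 3;
  std::uint32_t k           = 0;
  if (rem >= 3) { k ^= static_cast<std::uint32_t>(tail[2]) << 16; }
  if (rem >= 2) { k ^= static_cast<std::uint32_t>(tail[1]) << 8; }
  if (rem >= 1) {
    k ^= static_cast<std::uint32_t>(tail[0]);
    h ^= mix_k(k);
  }

  // Finalization: mix in the length, then avalanche.
  h ^= static_cast<std::uint32_t>(len);
  h ^= h >> 16;
  h *= 0x85ebca6bu;
  h ^= h >> 13;
  h *= 0xc2b2ae35u;
  h ^= h >> 16;
  return h;
}

// ASSUMPTION: Boost-style combine, used here as a stand-in for the library's hash_combine.
inline std::uint32_t hash_combine(std::uint32_t lhs, std::uint32_t rhs)
{
  return lhs ^ (rhs + 0x9e3779b9u + (lhs << 6) + (lhs >> 2));
}

// Hypothetical model of the old behavior: the seed reaches only the combine step.
inline std::uint32_t hash_before_fix(std::string const& s, std::uint32_t seed)
{
  return hash_combine(seed, murmur3_32(s, 0));
}

// Hypothetical model of the fixed behavior: the seed also reaches the hash itself.
inline std::uint32_t hash_after_fix(std::string const& s, std::uint32_t seed)
{
  return hash_combine(seed, murmur3_32(s, seed));
}

// Sanity check against the seed-0 rows quoted in the commit message.
inline void check_seed_zero()
{
  assert(murmur3_32("hello world", 0) == 1586663183u);
  assert(hash_after_fix("hello world", 0) == 4241098952u);
}
```

Under this model, calling `hash_before_fix("hello world", seed)` for seeds 0 through 4 changes the result by only about 65 per seed, because the seed contributes just `(seed << 6) + (seed >> 2)` and an XOR of the low bits. `hash_after_fix` scatters the results across the 32-bit range because the seed now perturbs every mixing round. The extra combine step is also why the fixed results still differ from the plain `MurmurHash3_32` values.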
1 parent f216c0b, commit 7bc4a7e
Showing 2 changed files with 46 additions and 43 deletions.