-
Notifications
You must be signed in to change notification settings - Fork 90
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
New hash function alternatives #310
Conversation
@bdice would you be so kind to leave a review if you have bandwidth? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks great!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks great!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Awesome work here @sleeepyjack! I see this PR was already merged but I wanted to look it over just so I am familiar with the implementation. I have a few comments here that you can apply in a follow-up PR if you would like. 👍
@@ -78,3 +78,6 @@ ConfigureBench(DYNAMIC_MAP_BENCH | |||
hash_table/dynamic_map/find_bench.cu | |||
hash_table/dynamic_map/contains_bench.cu | |||
hash_table/dynamic_map/erase_bench.cu) | |||
|
|||
ConfigureBench(HASH_BENCH |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Match existing style so this doesn't look like it belongs under the dynamic map benchmarks:
ConfigureBench(HASH_BENCH | |
################################################################################################### | |
# - hash benchmarks ------------------------------------------------------------------------------- | |
ConfigureBench(HASH_BENCH |
using namespace cuco::benchmark; | ||
using namespace cuco::utility; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd prefer to avoid using
declarations. If you write the full types, that will help readers who are unfamiliar with cuco internals (like me) know what namespaces contain the identifiers being used.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right, we consider benchmarks as user-facing code so we should not make these kinds of shortcuts. Since these namespaces are a bit wordy, how about two-character namespace aliases?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't mind the verbosity at all, honestly. 😄 I'd vote cuco::utility
over ut
or cu
or util
any day.
char const* const bytes = (char const*)&key; ///< per-byte access | ||
uint32_t const* const blocks = (uint32_t const*)&key; ///< 4-byte word access |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Avoid C-style casts.
} | ||
} | ||
|
||
// the following loop is only needed if the size of the key is no multiple of the block size |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
// the following loop is only needed if the size of the key is no multiple of the block size | |
// the following loop is only needed if the size of the key is not a multiple of the block size |
template <typename Key> | ||
struct XXHash_32 { | ||
private: | ||
static constexpr uint32_t prime1 = 0x9E3779B1U; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have a weak preference for lowercase hex literals / literal suffixes like 0x9e3779b1u
because I find them more readable, but that's totally a matter of taste.
char const* const bytes = (char const*)&key; ///< per-byte access | ||
uint32_t const* const blocks4 = (uint32_t const*)&key; ///< 4-byte word access | ||
uint64_t const* const blocks8 = (uint64_t const*)&key; ///< 8-byte word access |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Avoid C-style casts.
char const* const bytes = (char const*)&key; ///< per-byte access | ||
uint32_t const* const blocks = (uint32_t const*)&key; ///< 4-byte word access | ||
|
||
uint32_t offset = 0; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Probably should be a std::size_t
to match nbytes
.
uint32_t const* const blocks4 = (uint32_t const*)&key; ///< 4-byte word access | ||
uint64_t const* const blocks8 = (uint64_t const*)&key; ///< 8-byte word access | ||
|
||
uint64_t offset = 0; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Probably should be a std::size_t
to match nbytes
.
CHECK(h10(k10) == 2031761887105658523ULL); | ||
} | ||
|
||
// TODO SECTION("Check if device-generated hash values match the reference implementation.") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this still a TODO?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ha, good catch! Yes, I'll add them in the next PR.
@bdice thank you so much! Reading through your comments, they are all valid and I will address them in a follow-up PR. |
Also, I didn't leave a comment about this but wanted to share my work: in cudf I chose to use an intrinsic to rotate left: However, it looks like this compiles out to There is no 64-bit rotate left intrinsic (afaik) so I would not recommend a change here (so that 32-bit and 64-bit rotation functions look the same). |
This PR introduces a set of new hash function alternatives, namely
xxhash_32
andxxhash_64
: 32- and 64-bit versions of the famous xxHash hash function.fmix_32
andfmix_64
: 32- and 64-bit versions of the Murmur3 integer/avalanche finalizer.xxhash_32
is now the default hash function for all cuco data structures, due to its better performance in scenarios where the data structure fits into L1$ or L2$. For larger data structures, the performance advantage over Murmurhash3 is negligible.Benchmark scenario:
Run
cuco::static_set::contains()
multiple times in a row to ensure the set is resident in the cache. Target occupancy is set to 50%. We vary the table size to see the effects of different cache levels.Ref Time
shows the runtime formurmurhash3_32
andCmp Time
shows the time forxxhash_32
. All experiments have been conducted on a NVIDIA A100-SXM4-80GB.In addition, here are some isolated benchmarks calling each hash function in a tight loop on the device:
cuco::detail::MurmurHash3_32<int>
cuco::detail::MurmurHash3_32<long>
cuco::detail::MurmurHash3_32<key_128_bytes>
cuco::detail::XXHash_32<int>
cuco::detail::XXHash_32<long>
cuco::detail::XXHash_32<key_128_bytes>
cuco::detail::XXHash_64<int>
cuco::detail::XXHash_64<long>
cuco::detail::XXHash_64<key_128_bytes>
cuco::detail::MurmurHash3_fmix32<int>
cuco::detail::MurmurHash3_fmix64<long>
Closes #290