Improvements in hash_functions.cuh
#10081
Comments
Followup to #9919: kernel merging and code cleanup for the Murmur3 hash. Partial fix for #10081.

Benchmarked the `compute_bytes` kernel with aligned vs. unaligned reads and saw no difference. Looking into it further to confirm that the `uint32_t` construction was doing the same thing implicitly. Because strings are only byte-aligned, hashing strings will require the `getblock32` function regardless. Benchmarks run with 100-, 103-, and 104-byte strings showed negligible performance differences, which indicates that forced misalignment does not negatively impact hash speed.

Authors:
- Ryan Lee (https://github.com/rwlee)
- Bradley Dice (https://github.com/bdice)

Approvers:
- Bradley Dice (https://github.com/bdice)
- Christopher Harris (https://github.com/cwharris)

URL: #10143
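For reference, a byte-wise block read of the kind `getblock32` is described as providing can be sketched in host-side C++ as follows. This is a minimal sketch, not the libcudf source: the name `getblock32_sketch` and its signature are assumptions made for illustration, and the word is assembled in little-endian order.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical helper (not the libcudf implementation): read the 32-bit block
// that starts `offset` bytes into `data`, one byte at a time. It never
// dereferences a uint32_t pointer, so it is safe for any alignment.
inline std::uint32_t getblock32_sketch(std::byte const* data, std::size_t offset)
{
  assert(data != nullptr);
  std::byte const* block = data + offset;
  // Assemble the four bytes in little-endian order.
  return std::to_integer<std::uint32_t>(block[0]) |
         (std::to_integer<std::uint32_t>(block[1]) << 8) |
         (std::to_integer<std::uint32_t>(block[2]) << 16) |
         (std::to_integer<std::uint32_t>(block[3]) << 24);
}
```

A read at an odd offset, such as `getblock32_sketch(buf, 1)`, goes through exactly the same code path as an aligned read, which is consistent with the benchmarks showing no penalty for forced misalignment.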
This PR refactors a few pieces of libcudf's hash functions:
- Define the utility function `hash_combine` only once (with 32/64 bit overloads), rather than several times in the codebase.
- ~Remove class template parameter from `MurmurHash3_32` and related classes. This template parameter was redundant. We already use a template for the argument of the `compute` method, which is called by `operator()`, so I put the template parameter on `operator()` instead of the whole class. I think this removal of the template parameter could be considered API-breaking so I added the `breaking` label.~ I retracted this change after conversation with @jrhemstad. I'll look into a different way to do this soon, using a dispatch-to-invoke approach as in #8217.

This addresses part of issue #10081. I have a few more things I'd like to try, but this felt like a nicely-scoped PR so I stopped here for the moment.

I benchmarked the code before and after making these changes and saw a small but consistent decrease in runtime:
- The benchmarks in `HashBenchmark/{HASH_MURMUR3,HASH_SERIAL_MURMUR3,HASH_SPARK_MURMUR3}_{nulls,no_nulls}/*` all decreased or saw no change in runtime, with a geometric mean of 2.87% less time.
- The benchmarks in `Hashing/hash_partition/*` all decreased or saw no change in runtime, with a geometric mean of 2.37% less time.

For both sets of benchmarks, the largest data sizes saw more significant decreases in runtime, with a best improvement of 7.38% less time in `HashBenchmark/HASH_MURMUR3_nulls/16777216` (similar for other large data sizes) and a best improvement of 10.54% less time in `Hashing/hash_partition/1048576/256/64` (similar for other large data sizes).

Authors:
- Bradley Dice (https://github.com/bdice)

Approvers:
- Vukasin Milovanovic (https://github.com/vuule)
- Conor Hoekstra (https://github.com/codereport)

URL: #10379
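To illustrate what "defined only once, with 32/64 bit overloads" can look like, here is a minimal host-side sketch. The Boost-style mixing formula and the golden-ratio constants are assumptions chosen for illustration and may not match what the PR actually uses; the `sketch` namespace is hypothetical.

```cpp
#include <cstdint>

namespace sketch {

// Assumed Boost-style combine step; the overload is selected by argument width.
constexpr std::uint32_t hash_combine(std::uint32_t lhs, std::uint32_t rhs)
{
  return lhs ^ (rhs + 0x9e3779b9u + (lhs << 6) + (lhs >> 2));
}

constexpr std::uint64_t hash_combine(std::uint64_t lhs, std::uint64_t rhs)
{
  return lhs ^ (rhs + 0x9e3779b97f4a7c15ull + (lhs << 6) + (lhs >> 2));
}

}  // namespace sketch
```

Callers need to pass arguments of a matching fixed-width type (for example `std::uint32_t{1}`), since plain `int` literals would be ambiguous between the two overloads. The combine step is order-sensitive, so `hash_combine(a, b)` and `hash_combine(b, a)` generally differ.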
Additional work related to #10081. This is breaking because it reorganizes several public names/namespaces.

Summary of changes in this PR:
- The `cudf` namespace now wraps the contents of `hash_functions.cuh`, and some public names are now classified as `detail` APIs.
- `SparkMurmurHash3_32` has been updated to align with the design and naming conventions of `MurmurHash3_32`.

Authors:
- Bradley Dice (https://github.com/bdice)

Approvers:
- Jake Hemstad (https://github.com/jrhemstad)
- Vyas Ramasubramani (https://github.com/vyasr)

URL: #10462
This is still relevant. I aim to work on a few more of these ideas after #11296, when I can also move edit: that move is done in #11489. |
We should also add documentation to detail methods in |
This PR moves the `SparkMurmurHash3_32` functor from `hash_functions.cuh` to `spark_murmur_hash.cu`, the only place where it is used.

**This is a pure move**, with one small exception to avoid compiler warnings about unused members of the hash functor template instantiations for nested types. I refactored the class template to disallow nested types for the hash functor and removed those specializations using `CUDF_UNREACHABLE`, rather than allowing type dispatching to create template instantiations that have no defined use. (Nested types are being handled by the custom device row hasher in `spark_murmur_hash.cu`, and require some state information that cannot be easily carried in the functor itself.)

I am planning to do further refactoring later, but wanted to separate this "pure move" as much as possible. Part of #10081.

Authors:
- Bradley Dice (https://github.com/bdice)

Approvers:
- Nghia Truong (https://github.com/ttnghia)
- Ryan Lee (https://github.com/rwlee)

URL: #11489
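One way to keep a type dispatcher from instantiating a hash functor for nested types is sketched below in host-side C++. This is only an illustration of the general idea, not the code in the PR: `list_tag`, `struct_tag`, `is_nested_v`, `hash_functor`, and `dispatch_hash` are all hypothetical names, the mixing step is a placeholder rather than Murmur3, and `assert` plus `std::abort` stand in for `CUDF_UNREACHABLE`.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdlib>
#include <type_traits>

namespace sketch {

// Hypothetical stand-ins for nested element types (lists, structs).
struct list_tag {};
struct struct_tag {};

template <typename T>
inline constexpr bool is_nested_v =
  std::is_same_v<T, list_tag> || std::is_same_v<T, struct_tag>;

// Hash functor that refuses to be instantiated for nested types.
template <typename Key>
struct hash_functor {
  static_assert(!is_nested_v<Key>, "nested types are handled elsewhere");
  std::uint32_t operator()(Key const& key) const
  {
    // Placeholder multiplicative mix, not Murmur3.
    return static_cast<std::uint32_t>(key) * 2654435761u;
  }
};

// Type-dispatched entry point. The nested branch is discarded at compile time
// for flat types, and the flat branch is discarded for nested types, so
// hash_functor<list_tag> is never instantiated.
template <typename T>
std::uint32_t dispatch_hash([[maybe_unused]] T const& value)
{
  if constexpr (is_nested_v<T>) {
    assert(false && "nested types are not supported by this functor");
    std::abort();  // stand-in for CUDF_UNREACHABLE
  } else {
    return hash_functor<T>{}(value);
  }
}

}  // namespace sketch
```

With this shape, no unused functor members are generated for nested types, so there is nothing for the compiler to warn about.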
@bdice We've made some improvements to |
@bdice do you still want to do anything here? |
We haven't adopted |
The file `hash_functions.cuh` has quite a bit of room for cleanup and improvement.
- Use `std::byte` more broadly.
- Use the `compute_bytes` approach instead of re-implementing the hash function in the `string_view` template instantiation. (Make sure [un]aligned reads are handled correctly.)

See PRs for reference: comments in #9919, ongoing SHA work in #9215, refactors in #10379.
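As a rough illustration of the `compute_bytes` idea, the host-side sketch below routes both fixed-width values and strings through a single byte-oriented MurmurHash3 (x86, 32-bit) routine that takes `std::byte const*`. It is a sketch under assumptions, not the libcudf code: the `sketch` namespace, the `hash_value` wrappers, and the signatures are hypothetical, and blocks are read with `std::memcpy` in native byte order, which matches the reference algorithm only on little-endian hosts.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <string_view>
#include <type_traits>

namespace sketch {

constexpr std::uint32_t rotl32(std::uint32_t x, int r) { return (x << r) | (x >> (32 - r)); }

constexpr std::uint32_t fmix32(std::uint32_t h)
{
  h ^= h >> 16; h *= 0x85ebca6bu;
  h ^= h >> 13; h *= 0xc2b2ae35u;
  h ^= h >> 16;
  return h;
}

// The single byte-oriented routine that every typed wrapper calls.
inline std::uint32_t compute_bytes(std::byte const* data, std::size_t len, std::uint32_t seed)
{
  assert(data != nullptr || len == 0);
  constexpr std::uint32_t c1 = 0xcc9e2d51u;
  constexpr std::uint32_t c2 = 0x1b873593u;
  std::uint32_t h = seed;
  std::size_t const nblocks = len / 4;
  for (std::size_t i = 0; i < nblocks; ++i) {
    std::uint32_t k;
    std::memcpy(&k, data + 4 * i, sizeof(k));  // valid for any alignment
    k *= c1; k = rotl32(k, 15); k *= c2;
    h ^= k; h = rotl32(h, 13); h = h * 5 + 0xe6546b64u;
  }
  std::byte const* tail = data + 4 * nblocks;
  std::uint32_t k = 0;
  switch (len % 4) {  // fold in the 1 to 3 trailing bytes
    case 3: k ^= std::to_integer<std::uint32_t>(tail[2]) << 16; [[fallthrough]];
    case 2: k ^= std::to_integer<std::uint32_t>(tail[1]) << 8; [[fallthrough]];
    case 1:
      k ^= std::to_integer<std::uint32_t>(tail[0]);
      k *= c1; k = rotl32(k, 15); k *= c2;
      h ^= k;
      break;
    default: break;
  }
  h ^= static_cast<std::uint32_t>(len);
  return fmix32(h);
}

// Thin typed wrappers: no per-type re-implementation of the hash.
template <typename T>
std::uint32_t hash_value(T const& value, std::uint32_t seed = 0)
{
  static_assert(std::is_trivially_copyable_v<T>, "hashes the object representation");
  return compute_bytes(reinterpret_cast<std::byte const*>(&value), sizeof(T), seed);
}

inline std::uint32_t hash_value(std::string_view s, std::uint32_t seed = 0)
{
  return compute_bytes(reinterpret_cast<std::byte const*>(s.data()), s.size(), seed);
}

}  // namespace sketch
```

Because the string overload only forwards a pointer and a length, the same bytes hash identically whether they start at an aligned or an unaligned address, which is the property the parenthetical about [un]aligned reads asks to preserve.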
Additional context
Originally posted by @harrism in #9919 (comment)