Add XXHash_32 hasher #17533
base: branch-25.02
/ok to test |
/ok to test |
Nice work so far @PointKernel. I was experimenting with the same thing locally and had most of the same changes. 😄 |
Thanks, @bdice. The failing tests are a bit concerning, but I'll investigate them further. |
Co-authored-by: Bradley Dice <[email protected]>
/ok to test |
Perhaps plumb this through as a public function as well here? |
/ok to test |
/ok to test |
@PointKernel Do we want to make a change like this?
diff --git a/cpp/include/cudf/join.hpp b/cpp/include/cudf/join.hpp
index afefd04d4f..a4abd87811 100644
--- a/cpp/include/cudf/join.hpp
+++ b/cpp/include/cudf/join.hpp
@@ -18,6 +18,7 @@
#include <cudf/ast/expressions.hpp>
#include <cudf/hashing.hpp>
+#include <cudf/hashing/detail/default_hash.cuh>
#include <cudf/table/table_view.hpp>
#include <cudf/types.hpp>
#include <cudf/utilities/default_stream.hpp>
@@ -41,15 +42,6 @@ namespace CUDF_EXPORT cudf {
*/
enum class has_nested : bool { YES, NO };
-// forward declaration
-namespace hashing::detail {
-
-/**
- * @brief Forward declaration for our Murmur Hash 3 implementation
- */
-template <typename T>
-class MurmurHash3_x86_32;
-} // namespace hashing::detail
namespace detail {
/**
@@ -307,7 +299,7 @@ enum class nullable_join : bool { YES, NO };
class hash_join {
public:
using impl_type = typename cudf::detail::hash_join<
- cudf::hashing::detail::MurmurHash3_x86_32<cudf::hash_value_type>>; ///< Implementation type
+ cudf::hashing::detail::default_hash<cudf::hash_value_type>>; ///< Implementation type
hash_join() = delete;
~hash_join(); |
The current hash join logic has a hasher-related bug, making performance evaluation by simply switching the hasher unreliable. Once rapidsai/rapids-cmake#723 is merged, I plan to address the issue first and then test the new hasher and hash table together to evaluate how all three impact overall performance. |
The difference between 28 and 29 registers is not really material; the compiler may optimize things differently in a larger kernel. I saw that the mixed join kernels aren’t even always monotonic between assumed complexity and register usage. The hash kernel alone is not big enough to really know, so performance and avoidance of local memory spilling in larger kernels are the key metrics I would use. |
I see what you mean. I'll go ahead and make this PR simply add the xxhash_32 hasher to libcudf without changing the default hash. Now that rapids-cmake #723 has been merged, I'll first fix the hasher bug in the hash join and then assess the impact of switching the default hash. Does that sound good to you? |
@PointKernel Yes! I support that plan. |
We'll need a follow-up PR for the Python side.
edit: we decided to do it here!
Description
Contributes to #17531
This PR introduces the xxhash_32 hasher to libcudf as a preparatory step for evaluating the impact of replacing murmurhash3_x86_32 with xxhash_32 as the default hash.
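For readers unfamiliar with the algorithm, here is a minimal host-side sketch of the 32-bit xxHash function over a raw byte buffer. It is not the libcudf implementation from this PR: the `sketch` namespace and the names `xxhash_32_bytes`, `read32_le`, and `round32` are hypothetical, and the libcudf hasher additionally deals with typed column elements, nulls, and device execution, none of which is modeled here. The prime constants and mixing steps follow the published XXH32 specification.

```cpp
#include <cstddef>
#include <cstdint>

namespace sketch {  // hypothetical namespace, for illustration only

// Prime constants from the XXH32 specification.
constexpr std::uint32_t PRIME1 = 0x9E3779B1u;
constexpr std::uint32_t PRIME2 = 0x85EBCA77u;
constexpr std::uint32_t PRIME3 = 0xC2B2AE3Du;
constexpr std::uint32_t PRIME4 = 0x27D4EB2Fu;
constexpr std::uint32_t PRIME5 = 0x165667B1u;

// Rotate left; r is always in [1, 31] in this file.
inline std::uint32_t rotl(std::uint32_t x, int r) { return (x << r) | (x >> (32 - r)); }

// Read 4 bytes as a little-endian 32-bit word, independent of host endianness.
inline std::uint32_t read32_le(unsigned char const* p)
{
  return static_cast<std::uint32_t>(p[0]) | (static_cast<std::uint32_t>(p[1]) << 8) |
         (static_cast<std::uint32_t>(p[2]) << 16) | (static_cast<std::uint32_t>(p[3]) << 24);
}

// One accumulator round used by the 16-byte stripe loop.
inline std::uint32_t round32(std::uint32_t acc, std::uint32_t input)
{
  acc += input * PRIME2;
  acc = rotl(acc, 13);
  return acc * PRIME1;
}

inline std::uint32_t xxhash_32_bytes(void const* data, std::size_t len, std::uint32_t seed = 0)
{
  auto const* p = static_cast<unsigned char const*>(data);
  std::size_t i = 0;
  std::uint32_t h;

  if (len >= 16) {
    // Four parallel accumulators consume the input in 16-byte stripes.
    std::uint32_t v1 = seed + PRIME1 + PRIME2;
    std::uint32_t v2 = seed + PRIME2;
    std::uint32_t v3 = seed;
    std::uint32_t v4 = seed - PRIME1;
    do {
      v1 = round32(v1, read32_le(p + i));
      v2 = round32(v2, read32_le(p + i + 4));
      v3 = round32(v3, read32_le(p + i + 8));
      v4 = round32(v4, read32_le(p + i + 12));
      i += 16;
    } while (i + 16 <= len);
    h = rotl(v1, 1) + rotl(v2, 7) + rotl(v3, 12) + rotl(v4, 18);
  } else {
    h = seed + PRIME5;
  }

  h += static_cast<std::uint32_t>(len);

  // Remaining 4-byte words.
  for (; i + 4 <= len; i += 4) {
    h += read32_le(p + i) * PRIME3;
    h = rotl(h, 17) * PRIME4;
  }
  // Remaining single bytes.
  for (; i < len; ++i) {
    h += static_cast<std::uint32_t>(p[i]) * PRIME5;
    h = rotl(h, 11) * PRIME1;
  }

  // Final avalanche.
  h ^= h >> 15;
  h *= PRIME2;
  h ^= h >> 13;
  h *= PRIME3;
  h ^= h >> 16;
  return h;
}

}  // namespace sketch
```

With seed 0, hashing an empty buffer should give `0x02CC5D05`, the commonly cited XXH32 reference value, which makes for a quick sanity check of the sketch.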
Checklist