Add XXHash_64 hash function to cudf #13612

Merged 48 commits on Jul 19, 2023
Changes from 46 commits
Commits
05868c9
Add XXHash_64 hash function to cudf
davidwendt Jun 23, 2023
a4dd39b
Merge branch 'branch-23.08' into fea-xxhash64
davidwendt Jun 23, 2023
f39bba0
fix typo in constant
davidwendt Jun 26, 2023
53603fc
Merge branch 'branch-23.08' into fea-xxhash64
davidwendt Jun 26, 2023
f9436d3
fix rotate function
davidwendt Jun 27, 2023
f318e72
Merge branch 'branch-23.08' into fea-xxhash64
davidwendt Jun 27, 2023
de00dc4
add string test; convert primes to hex; fix getblock logic
davidwendt Jun 27, 2023
660357e
fully-qualify calls to detail functions
davidwendt Jun 28, 2023
25202b6
Merge branch 'branch-23.08' into fea-xxhash64
davidwendt Jun 29, 2023
a5a0d4b
fix bug in xxhash finalize step
davidwendt Jun 29, 2023
c9c0624
Merge branch 'branch-23.08' into fea-xxhash64
davidwendt Jun 29, 2023
df5b8db
Merge branch 'branch-23.08' into fea-xxhash64
davidwendt Jun 29, 2023
e4abab3
Merge branch 'branch-23.08' into fea-xxhash64
davidwendt Jun 30, 2023
aaafd8e
add gtests for integer, double, fixed-point
davidwendt Jul 1, 2023
ac16d14
Merge branch 'branch-23.08' into fea-xxhash64
davidwendt Jul 1, 2023
e3b6839
Merge branch 'branch-23.08' into fea-xxhash64
davidwendt Jul 5, 2023
9dc57e4
Merge branch 'branch-23.08' into fea-xxhash64
davidwendt Jul 6, 2023
3a5dd7d
fix merge conflicts
davidwendt Jul 10, 2023
3b4dbb8
rename hash64 to xxhash64
davidwendt Jul 10, 2023
41c64b4
local conflict fix
davidwendt Jul 10, 2023
7d09f76
resolve final conflict
davidwendt Jul 10, 2023
c55175e
fix merge conflicts
davidwendt Jul 11, 2023
1ce895d
fix cmake format style violation
davidwendt Jul 11, 2023
65b6bad
forgot to include the style fix
davidwendt Jul 11, 2023
a541a3e
Merge branch 'branch-23.08' into fea-xxhash64
davidwendt Jul 12, 2023
9e2ab2c
undo unintentional unchanges
davidwendt Jul 12, 2023
6c5176b
Merge branch 'branch-23.08' into fea-xxhash64
davidwendt Jul 12, 2023
cdac656
Merge branch 'branch-23.08' into fea-xxhash64
davidwendt Jul 12, 2023
f4f98e0
Merge branch 'branch-23.08' into fea-xxhash64
davidwendt Jul 13, 2023
fd7bbd4
Merge branch 'branch-23.08' into fea-xxhash64
davidwendt Jul 13, 2023
2e89f98
Merge branch 'branch-23.08' into fea-xxhash64
davidwendt Jul 14, 2023
cd4acbb
fix merge conflicts
davidwendt Jul 16, 2023
5fc46b7
Merge branch 'branch-23.08' into fea-xxhash64
davidwendt Jul 17, 2023
a586b5b
change names of the gtests
davidwendt Jul 17, 2023
2585657
fix hash_functions.cuh references
davidwendt Jul 17, 2023
6ecf2a5
Merge branch 'branch-23.08' into fea-xxhash64
davidwendt Jul 17, 2023
e1e7b8d
rename test source file
davidwendt Jul 17, 2023
9a64914
Merge branch 'branch-23.08' into fea-xxhash64
davidwendt Jul 17, 2023
0624314
Merge branch 'fea-xxhash64' of github.com:davidwendt/cudf into fea-xx…
davidwendt Jul 17, 2023
cdec016
rename xxhash64 to xxhash_64
davidwendt Jul 17, 2023
b8f7b72
Merge branch 'fea-xxhash64' of github.com:davidwendt/cudf into fea-xx…
davidwendt Jul 18, 2023
3586abe
fix merge conflicts
davidwendt Jul 18, 2023
74cda04
fix cmake style violation
davidwendt Jul 18, 2023
e2b9197
Merge branch 'branch-23.08' into fea-xxhash64
davidwendt Jul 18, 2023
d47d624
add some const decls
davidwendt Jul 18, 2023
606b736
fix doxygen wording for the hash APIs
davidwendt Jul 18, 2023
a5e3838
use device-span
davidwendt Jul 19, 2023
f1f39e9
Merge branch 'branch-23.08' into fea-xxhash64
davidwendt Jul 19, 2023
1 change: 1 addition & 0 deletions cpp/CMakeLists.txt
@@ -347,6 +347,7 @@ add_library(
src/hash/murmurhash3_x86_32.cu
src/hash/murmurhash3_x64_128.cu
src/hash/spark_murmurhash3_x86_32.cu
+ src/hash/xxhash_64.cu
src/interop/dlpack.cpp
src/interop/from_arrow.cu
src/interop/to_arrow.cu
27 changes: 23 additions & 4 deletions cpp/include/cudf/hashing.hpp
@@ -70,10 +70,11 @@ std::unique_ptr<column> hash(
rmm::cuda_stream_view stream = cudf::get_default_stream(),
rmm::mr::device_memory_resource* mr = rmm::mr::get_current_device_resource());

+//! Hash APIs
namespace hashing {

/**
- * @brief Computes the MurmurHash3 32-bit of each row in the given table
+ * @brief Computes the MurmurHash3 32-bit hash value of each row in the given table
*
* This function computes the hash of each column using the `seed` for the first column
* and the resulting hash as a seed for the next column and so on.
@@ -93,7 +94,7 @@ std::unique_ptr<column> murmurhash3_x86_32(
rmm::mr::device_memory_resource* mr = rmm::mr::get_current_device_resource());

/**
- * @brief Computes the hash values of each row in the input set of columns
+ * @brief Computes the MurmurHash3 64-bit hash value of each row in the given table
*
* This function takes a 64-bit seed value and returns hash values using the
* MurmurHash3_x64_128 algorithm. The hash produces two uint64 values per row.
@@ -112,7 +113,7 @@ std::unique_ptr<table> murmurhash3_x64_128(
rmm::mr::device_memory_resource* mr = rmm::mr::get_current_device_resource());

/**
- * @brief Computes the MurmurHash3 32-bit of each row in the given table
+ * @brief Computes the MurmurHash3 32-bit hash value of each row in the given table
*
* This function computes the hash similar to MurmurHash3_x86_32 with special processing
* to match Spark's implementation results.
@@ -131,7 +132,7 @@ std::unique_ptr<column> spark_murmurhash3_x86_32(
rmm::mr::device_memory_resource* mr = rmm::mr::get_current_device_resource());

/**
- * @brief Computes the MD5 hash of each row in the given table
+ * @brief Computes the MD5 hash value of each row in the given table
*
* @param input The table of columns to hash
* @param stream CUDA stream used for device memory operations and kernel launches
@@ -144,6 +145,24 @@ std::unique_ptr<column> md5(
rmm::cuda_stream_view stream = cudf::get_default_stream(),
rmm::mr::device_memory_resource* mr = rmm::mr::get_current_device_resource());

+/**
+ * @brief Computes the XXHash_64 hash value of each row in the given table
+ *
+ * This function takes a 64-bit seed value and returns a column of type UINT64.
+ *
+ * @param input The table of columns to hash
+ * @param seed Optional seed value to use for the hash function
+ * @param stream CUDA stream used for device memory operations and kernel launches
+ * @param mr Device memory resource used to allocate the returned column's device memory
+ *
+ * @returns A column where each row is the hash of a row from the input
+ */
+std::unique_ptr<column> xxhash_64(
+  table_view const& input,
+  uint64_t seed = DEFAULT_HASH_SEED,
+  rmm::cuda_stream_view stream = cudf::get_default_stream(),
+  rmm::mr::device_memory_resource* mr = rmm::mr::get_current_device_resource());

} // namespace hashing

/** @} */ // end of group
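Note on the seed chaining described in the `murmurhash3_x86_32` doc comment: the first column is hashed with `seed`, and each resulting hash becomes the seed for the next column. A minimal host-side sketch of that pattern is given here for illustration only. It is not the libcudf kernel: `mix` and `hash_row` are hypothetical names, `mix` is a splitmix64-style stand-in rather than MurmurHash3 or XXHash_64, and a row is modeled as a plain vector of 64-bit values. Whether `xxhash_64` chains columns the same way is not shown in this diff.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical stand-in for a seeded per-element hash (splitmix64-style finalizer).
// It only illustrates the chaining; it is not MurmurHash3 or XXHash_64.
inline std::uint64_t mix(std::uint64_t value, std::uint64_t seed)
{
  std::uint64_t z = value + seed + 0x9E3779B97F4A7C15ULL;
  z = (z ^ (z >> 30)) * 0xBF58476D1CE4E5B9ULL;
  z = (z ^ (z >> 27)) * 0x94D049BB133111EBULL;
  return z ^ (z >> 31);
}

// Hash one row: the first column uses `seed`, and each result
// is passed on as the seed for the next column.
inline std::uint64_t hash_row(std::vector<std::uint64_t> const& row, std::uint64_t seed)
{
  std::uint64_t h = seed;
  for (auto const value : row) {
    h = mix(value, h);
  }
  return h;
}
```

In this sketch, `hash_row({1, 2}, 0)` is by construction `mix(2, mix(1, 0))`, so the result depends on column order as well as on the seed.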
5 changes: 5 additions & 0 deletions cpp/include/cudf/hashing/detail/hashing.hpp
@@ -46,6 +46,11 @@ std::unique_ptr<column> md5(table_view const& input,
rmm::cuda_stream_view stream,
rmm::mr::device_memory_resource* mr);

+std::unique_ptr<column> xxhash_64(table_view const& input,
+                                  uint64_t seed,
+                                  rmm::cuda_stream_view,
+                                  rmm::mr::device_memory_resource* mr);

/* Copyright 2005-2014 Daniel James.
*
* Use, modification and distribution is subject to the Boost Software
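The implementation file `src/hash/xxhash_64.cu` is not shown in this excerpt. For orientation, here is a host-side sketch of the publicly specified XXH64 algorithm for a single byte buffer, covering the pieces the commit messages mention (rotate, primes in hex, finalize step). `xxh64_reference` and the helpers in `xxh_sketch` are hypothetical names, the code assumes a little-endian host, and it says nothing about how the PR handles nulls, nested types, or multiple columns.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

namespace xxh_sketch {  // hypothetical namespace, for illustration only

constexpr std::uint64_t prime1 = 0x9E3779B185EBCA87ULL;
constexpr std::uint64_t prime2 = 0xC2B2AE3D27D4EB4FULL;
constexpr std::uint64_t prime3 = 0x165667B19E3779F9ULL;
constexpr std::uint64_t prime4 = 0x85EBCA77C2B2AE63ULL;
constexpr std::uint64_t prime5 = 0x27D4EB2F165667C5ULL;

// Rotate left; callers only pass r in [1, 63].
inline std::uint64_t rotl(std::uint64_t x, int r) { return (x << r) | (x >> (64 - r)); }

// Unaligned loads via memcpy (assumes a little-endian host).
inline std::uint64_t read64(unsigned char const* p)
{
  std::uint64_t v;
  std::memcpy(&v, p, sizeof(v));
  return v;
}
inline std::uint64_t read32(unsigned char const* p)
{
  std::uint32_t v;
  std::memcpy(&v, p, sizeof(v));
  return v;
}

inline std::uint64_t xxh_round(std::uint64_t acc, std::uint64_t input)
{
  acc += input * prime2;
  return rotl(acc, 31) * prime1;
}

inline std::uint64_t merge_round(std::uint64_t h, std::uint64_t v)
{
  h ^= xxh_round(0, v);
  return h * prime1 + prime4;
}

}  // namespace xxh_sketch

inline std::uint64_t xxh64_reference(void const* data, std::size_t len, std::uint64_t seed)
{
  using namespace xxh_sketch;
  auto const* p = static_cast<unsigned char const*>(data);
  std::size_t i = 0;
  std::uint64_t h;

  if (len >= 32) {
    // Four accumulators consume the input in 32-byte stripes.
    std::uint64_t v1 = seed + prime1 + prime2;
    std::uint64_t v2 = seed + prime2;
    std::uint64_t v3 = seed;
    std::uint64_t v4 = seed - prime1;
    for (; len - i >= 32; i += 32) {
      v1 = xxh_round(v1, read64(p + i));
      v2 = xxh_round(v2, read64(p + i + 8));
      v3 = xxh_round(v3, read64(p + i + 16));
      v4 = xxh_round(v4, read64(p + i + 24));
    }
    h = rotl(v1, 1) + rotl(v2, 7) + rotl(v3, 12) + rotl(v4, 18);
    h = merge_round(h, v1);
    h = merge_round(h, v2);
    h = merge_round(h, v3);
    h = merge_round(h, v4);
  } else {
    h = seed + prime5;
  }
  h += static_cast<std::uint64_t>(len);

  // Remaining bytes: 8 at a time, then at most one 4-byte word, then single bytes.
  for (; len - i >= 8; i += 8) {
    h ^= xxh_round(0, read64(p + i));
    h = rotl(h, 27) * prime1 + prime4;
  }
  if (len - i >= 4) {
    h ^= read32(p + i) * prime1;
    h = rotl(h, 23) * prime2 + prime3;
    i += 4;
  }
  for (; i < len; ++i) {
    h ^= p[i] * prime5;
    h = rotl(h, 11) * prime1;
  }

  // Finalize (avalanche) step.
  h ^= h >> 33;
  h *= prime2;
  h ^= h >> 29;
  h *= prime3;
  h ^= h >> 32;
  return h;
}
```

A quick sanity check for a sketch like this is the empty input with seed 0, which should hash to `0xEF46DB3751D8E999`.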