Add SHA-1 and SHA-2 hash functions. #9215

Closed
wants to merge 119 commits into from
119 commits
436e652
Initial SHA structure.
rwlee Aug 18, 2020
7c1b661
First pass of SHA1 implementation.
rwlee Aug 19, 2020
aac184b
Apply clang-format.
bdice Sep 3, 2021
ade59b7
Update comments and docstrings, enable SHA-1.
bdice Sep 7, 2021
ec4673e
Add test values for SHA-1.
bdice Sep 7, 2021
2596297
Remove unused seed value from MD5Hash.
bdice Sep 7, 2021
f775bbc
Use std::memcpy, add comments.
bdice Sep 7, 2021
955dadf
Use __byte_perm to swap endianness.
bdice Sep 8, 2021
7dd9a9e
Update comments.
bdice Sep 8, 2021
77136bc
Update for changes to cudf::make_strings_column API.
bdice Sep 8, 2021
4f7e0d4
Make rotl32 a CUDA_DEVICE_CALLABLE, use __funnelshift_l intrinsic.
bdice Sep 8, 2021
e42b5a2
Add comments.
bdice Sep 8, 2021
070ee5f
Move MD5 and SHA functions to the only .cu file that uses them.
bdice Sep 8, 2021
252147e
Use named constant for MD5 chunk size.
bdice Sep 9, 2021
d998656
Fix issue by using named temp variables.
bdice Sep 9, 2021
f535d27
Use intrinsic for swapping endianness.
bdice Sep 9, 2021
9d096f3
Fix bug in message length. Must be stored as a big-endian 64-bit value.
bdice Sep 9, 2021
a7a8e6e
Improve comments.
bdice Sep 9, 2021
dfb1c99
Expand anonymous namespace.
bdice Sep 9, 2021
9735b2f
Clean up SHA implementation and expand tests.
bdice Sep 9, 2021
4a7d8d2
Enable all tests.
bdice Sep 9, 2021
585bea4
Revert changes to MD5 (separated into #9212).
bdice Sep 10, 2021
9ce6f17
Move endian swaps into functions.
bdice Sep 10, 2021
516b857
Draft of unified SHA functions - issues runtime error because it uses…
bdice Sep 13, 2021
f4e8c70
Use CRTP design. Also add back missing code to write the output.
bdice Sep 14, 2021
67bf9af
Use generic sha_hash function for SHA-1 (can be reused for SHA-256, S…
bdice Sep 14, 2021
ead336e
Intermediate work on SHA-256.
bdice Sep 15, 2021
5867ebf
Fix bugs in SHA-256 implementation (uninitialized hash constants, inc…
bdice Sep 15, 2021
c94dc8c
Add SHA-256 tests.
bdice Sep 15, 2021
33878d1
Add SHA-512 draft (one failing test).
bdice Sep 15, 2021
7abc5ac
Fix bug in message finalization.
bdice Sep 16, 2021
a89f34c
Draft of SHA-224, SHA-384.
bdice Sep 16, 2021
d9ecdfa
Use explicit SHA word types, add implementations of SHA-224 and SHA-384.
bdice Sep 16, 2021
cc683fc
Enable SHA-224, SHA-384.
bdice Sep 16, 2021
ad65b49
Add tests for SHA-224, SHA-384.
bdice Sep 16, 2021
f338217
Use rmm::cuda_stream_view.
bdice Sep 16, 2021
c46965e
Small refactorings.
bdice Sep 20, 2021
540f983
Use process_key function.
bdice Sep 20, 2021
244f437
Use process function for string data.
bdice Sep 20, 2021
2023ac1
Add comments to process function.
bdice Sep 20, 2021
502e11a
Use constexpr.
bdice Sep 20, 2021
82dd92b
Split sha1_hash_step.
bdice Sep 20, 2021
f05a43a
Move hash step functions above the SHA classes.
bdice Sep 20, 2021
f291c93
Use east const, add comments.
bdice Sep 20, 2021
4b49f2f
Pass hash state by reference.
bdice Sep 20, 2021
0ffa903
Pass hash state by reference in operator and process functions.
bdice Sep 20, 2021
84c034d
Update comment.
bdice Sep 20, 2021
987992e
Store sha_intermediate_data in Hasher instance.
bdice Sep 21, 2021
8e38d61
Drop const.
bdice Sep 21, 2021
56ab37b
Use underlying for hash_step calls.
bdice Sep 21, 2021
b9a37cd
Use memcpy instead of std::memcpy.
bdice Sep 21, 2021
5ae53ca
Update Java enum.
bdice Sep 21, 2021
1591793
Update Cython enum.
bdice Sep 21, 2021
dbd469c
Rename fixed width process function.
bdice Sep 21, 2021
48a08c4
Combine unsupported type dispatches.
bdice Sep 21, 2021
0d280a5
Use const&.
bdice Sep 21, 2021
952251b
Perform memcpys once instead of as a loop.
bdice Sep 21, 2021
2474ec9
Move hash constants into sha_hash.cu.
bdice Sep 21, 2021
d7cfbc5
Fail on invalid hash functions.
bdice Sep 21, 2021
8c6983d
Use uint32_t consistently to avoid mixing signed/unsigned types.
bdice Sep 21, 2021
590f8b3
Simplify word size.
bdice Sep 21, 2021
440583d
Minor improvements to constness.
bdice Sep 21, 2021
d9ae7f1
Add tests for empty tables.
bdice Sep 21, 2021
4e8fdf3
Enable debug mode on sha_hash.cu so that CI can compile.
bdice Sep 22, 2021
c47945d
Rename intermediate data to hash state.
bdice Sep 27, 2021
b37e5f0
Use HashDispatcher for type dispatch.
bdice Sep 27, 2021
607aaad
Cleanup of templates and class/variable names.
bdice Sep 27, 2021
2779a96
Reorganize headers, update error message.
bdice Sep 28, 2021
93b6f54
Generate null mask for all hash values.
bdice Sep 28, 2021
f850e5f
Update tests for new null behavior.
bdice Sep 28, 2021
c5711c8
Merge remote-tracking branch 'upstream/branch-21.12' into sha
bdice Oct 4, 2021
1c6cc2b
Simplify SFINAE.
bdice Oct 4, 2021
2374df4
Add benchmarks for SHA functions.
bdice Oct 8, 2021
c0381b1
Limit range to reduce memory usage for SHA512.
bdice Oct 8, 2021
21b3e26
Merge remote-tracking branch 'upstream/branch-21.12' into sha
bdice Oct 14, 2021
d1936f3
Add SHA methods to Python.
bdice Oct 19, 2021
a373221
Add tests for SHA hashing.
bdice Oct 19, 2021
a7773b8
Move result_location to constructor.
bdice Oct 20, 2021
92466b6
Merge branch 'branch-21.12' into sha
bdice Oct 22, 2021
27ee762
Merge remote-tracking branch 'upstream/branch-21.12' into sha
bdice Oct 25, 2021
927b419
Fix SHA constant qualifiers.
bdice Oct 25, 2021
6decb54
Rename and move leaf type check.
bdice Oct 25, 2021
95b0945
Move shared utility functions to utilities/hash_functions.cuh.
bdice Oct 25, 2021
327effb
Clean up use of utility functions.
bdice Oct 25, 2021
f2f6715
Add includes for column_device_view.
bdice Oct 26, 2021
665a692
Simplify dispatcher construction.
bdice Oct 26, 2021
bce1f65
Combine SFINAE templates into one method with if constexpr.
bdice Oct 26, 2021
aa14261
Use bitmask_and for MD5 null mask.
bdice Oct 26, 2021
adf2a23
Move base sha_hash function into anonymous namespace.
bdice Oct 26, 2021
b8cdbf6
Merge remote-tracking branch 'upstream/branch-22.04' into sha
bdice Mar 16, 2022
4e9ab2c
Merge remote-tracking branch 'upstream/branch-22.04' into sha
bdice Mar 18, 2022
98595e5
Merge remote-tracking branch 'upstream/branch-22.08' into sha
bdice May 23, 2022
ed20e83
Merge remote-tracking branch 'upstream/branch-22.08' into sha
bdice Jun 25, 2022
c6f6a82
Update copyright.
bdice Jun 25, 2022
3a8d4a6
Merge remote-tracking branch 'upstream/branch-22.08' into sha
bdice Jul 18, 2022
0d2e91f
Merge remote-tracking branch 'upstream/branch-22.12' into sha
bdice Nov 15, 2022
ae6c58b
Merge remote-tracking branch 'upstream/branch-23.02' into sha
bdice Nov 15, 2022
42beabd
Merge remote-tracking branch 'upstream/branch-23.02' into sha
bdice Nov 16, 2022
96cd76f
Use __device__ inline instead of CUDA_DEVICE_CALLABLE.
bdice Nov 17, 2022
3b2b618
Use null count from bitmask_and.
bdice Nov 17, 2022
706c702
Drop constexpr because is_fixed_width is no longer constexpr.
bdice Nov 17, 2022
9e44661
Use CUDF_TEST_EXPECT_COLUMNS_EQUAL.
bdice Nov 17, 2022
17591c7
Fix namespaces.
bdice Nov 17, 2022
00b172c
Remove default stream/mr.
bdice Nov 17, 2022
9689d92
Update includes.
bdice Nov 17, 2022
1dba6b3
Try compiling sha_hash.cu in CI without debug flag.
bdice Nov 17, 2022
f4e6aef
Merge branch 'branch-23.02' into sha
bdice Jan 21, 2023
65ee2b9
Merge remote-tracking branch 'upstream/branch-23.04' into sha
bdice Feb 1, 2023
dfa1dbb
Update copyright.
bdice Feb 1, 2023
b70fa8e
Merge remote-tracking branch 'upstream/branch-23.12' into sha
bdice Oct 17, 2023
143ae6a
Remove combined test file.
bdice Oct 17, 2023
68b23e0
Split into multiple TUs.
bdice Oct 17, 2023
9fd9e8d
Update tests.
bdice Oct 18, 2023
84f0527
Merge remote-tracking branch 'upstream/branch-23.12' into sha
bdice Oct 20, 2023
7598247
Merge remote-tracking branch 'upstream/branch-23.12' into sha
bdice Oct 31, 2023
80185b4
Merge remote-tracking branch 'upstream/branch-23.12' into sha
bdice Nov 9, 2023
1442bfd
Add alignment to avoid compiler bug.
bdice Nov 9, 2023
2cf9c1a
Merge branch 'branch-23.12' into sha
bdice Nov 9, 2023
47ccdd2
Remove extra copy of device function.
bdice Nov 9, 2023
5 changes: 5 additions & 0 deletions cpp/CMakeLists.txt
@@ -347,6 +347,11 @@ add_library(
src/hash/md5_hash.cu
src/hash/murmurhash3_x86_32.cu
src/hash/murmurhash3_x64_128.cu
src/hash/sha1_hash.cu
src/hash/sha224_hash.cu
src/hash/sha256_hash.cu
src/hash/sha384_hash.cu
src/hash/sha512_hash.cu
src/hash/spark_murmurhash3_x86_32.cu
src/hash/xxhash_64.cu
src/interop/dlpack.cpp
40 changes: 39 additions & 1 deletion cpp/benchmarks/hashing/hash.cpp
@@ -67,6 +67,36 @@ static void bench_hash(nvbench::state& state)

    state.exec(nvbench::exec_tag::sync,
               [&](nvbench::launch& launch) { auto result = cudf::hashing::md5(data->view()); });
  } else if (hash_name == "sha1") {
    // sha1 creates a 40-byte string
    state.add_global_memory_writes<nvbench::int8_t>(40 * num_rows);

    state.exec(nvbench::exec_tag::sync,
               [&](nvbench::launch& launch) { auto result = cudf::hashing::sha1(data->view()); });
  } else if (hash_name == "sha224") {
    // sha224 creates a 56-byte string
    state.add_global_memory_writes<nvbench::int8_t>(56 * num_rows);

    state.exec(nvbench::exec_tag::sync,
               [&](nvbench::launch& launch) { auto result = cudf::hashing::sha224(data->view()); });
  } else if (hash_name == "sha256") {
    // sha256 creates a 64-byte string
    state.add_global_memory_writes<nvbench::int8_t>(64 * num_rows);

    state.exec(nvbench::exec_tag::sync,
               [&](nvbench::launch& launch) { auto result = cudf::hashing::sha256(data->view()); });
  } else if (hash_name == "sha384") {
    // sha384 creates a 96-byte string
    state.add_global_memory_writes<nvbench::int8_t>(96 * num_rows);

    state.exec(nvbench::exec_tag::sync,
               [&](nvbench::launch& launch) { auto result = cudf::hashing::sha384(data->view()); });
  } else if (hash_name == "sha512") {
    // sha512 creates a 128-byte string
    state.add_global_memory_writes<nvbench::int8_t>(128 * num_rows);

    state.exec(nvbench::exec_tag::sync,
               [&](nvbench::launch& launch) { auto result = cudf::hashing::sha512(data->view()); });
  } else if (hash_name == "spark_murmurhash3_x86_32") {
    state.add_global_memory_writes<nvbench::int32_t>(num_rows);

@@ -82,4 +112,12 @@ NVBENCH_BENCH(bench_hash)
.set_name("hashing")
.add_int64_axis("num_rows", {65536, 16777216})
.add_float64_axis("nulls", {0.0, 0.1})
.add_string_axis("hash_name", {"murmurhash3_x86_32", "md5", "spark_murmurhash3_x86_32"});
.add_string_axis("hash_name",
{"murmurhash3_x86_32",
"md5",
"sha1",
"sha224",
"sha256",
"sha384",
"sha512",
"spark_murmurhash3_x86_32"});
77 changes: 76 additions & 1 deletion cpp/include/cudf/hashing.hpp
@@ -42,7 +42,12 @@ enum class hash_id {
HASH_IDENTITY = 0, ///< Identity hash function that simply returns the key to be hashed
HASH_MURMUR3, ///< Murmur3 hash function
HASH_SPARK_MURMUR3, ///< Spark Murmur3 hash function
HASH_MD5 ///< MD5 hash function
HASH_MD5, ///< MD5 hash function
HASH_SHA1, ///< SHA-1 hash function
HASH_SHA224, ///< SHA-224 hash function
HASH_SHA256, ///< SHA-256 hash function
HASH_SHA384, ///< SHA-384 hash function
HASH_SHA512 ///< SHA-512 hash function
};
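
The five SHA variants in this enum differ in digest width, and the benchmark in cpp/benchmarks/hashing/hash.cpp counts 40, 56, 64, 96 and 128 output bytes per row for them. Those figures are the lengths of the hexadecimal output strings, that is, one character per 4 bits of digest. A minimal sketch of that arithmetic, assuming a helper named `hex_digest_length` that is hypothetical and not part of cuDF:

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper (not part of cuDF): number of characters in the
// lowercase hexadecimal string for a digest that is `digest_bits` wide.
// Each hexadecimal character encodes 4 bits.
constexpr std::size_t hex_digest_length(std::size_t digest_bits) { return digest_bits / 4; }

// The values below match the byte counts used in the benchmark.
static_assert(hex_digest_length(160) == 40, "SHA-1");
static_assert(hex_digest_length(224) == 56, "SHA-224");
static_assert(hex_digest_length(256) == 64, "SHA-256");
static_assert(hex_digest_length(384) == 96, "SHA-384");
static_assert(hex_digest_length(512) == 128, "SHA-512");
```

The raw digests are half as long (20, 28, 32, 48 and 64 bytes), which is worth keeping in mind when reading any size constant in this PR.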

/**
@@ -145,6 +150,76 @@ std::unique_ptr<column> md5(
rmm::cuda_stream_view stream = cudf::get_default_stream(),
rmm::mr::device_memory_resource* mr = rmm::mr::get_current_device_resource());

/**
* @brief Computes the SHA-1 hash value of each row in the given table
*
* @param input The table of columns to hash
* @param stream CUDA stream used for device memory operations and kernel launches
* @param mr Device memory resource used to allocate the returned column's device memory
*
* @returns A column where each row is the hash of a row from the input
*/
std::unique_ptr<column> sha1(
table_view const& input,
rmm::cuda_stream_view stream = cudf::get_default_stream(),
rmm::mr::device_memory_resource* mr = rmm::mr::get_current_device_resource());

/**
* @brief Computes the SHA-224 hash value of each row in the given table
*
* @param input The table of columns to hash
* @param stream CUDA stream used for device memory operations and kernel launches
* @param mr Device memory resource used to allocate the returned column's device memory
*
* @returns A column where each row is the hash of a row from the input
*/
std::unique_ptr<column> sha224(
table_view const& input,
rmm::cuda_stream_view stream = cudf::get_default_stream(),
rmm::mr::device_memory_resource* mr = rmm::mr::get_current_device_resource());

/**
* @brief Computes the SHA-256 hash value of each row in the given table
*
* @param input The table of columns to hash
* @param stream CUDA stream used for device memory operations and kernel launches
* @param mr Device memory resource used to allocate the returned column's device memory
*
* @returns A column where each row is the hash of a row from the input
*/
std::unique_ptr<column> sha256(
table_view const& input,
rmm::cuda_stream_view stream = cudf::get_default_stream(),
rmm::mr::device_memory_resource* mr = rmm::mr::get_current_device_resource());

/**
* @brief Computes the SHA-384 hash value of each row in the given table
*
* @param input The table of columns to hash
* @param stream CUDA stream used for device memory operations and kernel launches
* @param mr Device memory resource used to allocate the returned column's device memory
*
* @returns A column where each row is the hash of a row from the input
*/
std::unique_ptr<column> sha384(
table_view const& input,
rmm::cuda_stream_view stream = cudf::get_default_stream(),
rmm::mr::device_memory_resource* mr = rmm::mr::get_current_device_resource());

/**
* @brief Computes the SHA-512 hash value of each row in the given table
*
* @param input The table of columns to hash
* @param stream CUDA stream used for device memory operations and kernel launches
* @param mr Device memory resource used to allocate the returned column's device memory
*
* @returns A column where each row is the hash of a row from the input
*/
std::unique_ptr<column> sha512(
table_view const& input,
rmm::cuda_stream_view stream = cudf::get_default_stream(),
rmm::mr::device_memory_resource* mr = rmm::mr::get_current_device_resource());

/**
* @brief Computes the XXHash_64 hash value of each row in the given table
*
41 changes: 41 additions & 0 deletions cpp/include/cudf/hashing/detail/hash_functions.cuh
@@ -68,4 +68,45 @@ __device__ inline uint64_t rotate_bits_right(uint64_t x, uint32_t r)
  return (x >> r) | (x << (64 - r));
}

// Swap the endianness of a 32 bit value
__device__ inline uint32_t swap_endian(uint32_t x)
{
  // The selector 0x0123 reverses the byte order
  return __byte_perm(x, 0, 0x0123);
}

// Swap the endianness of a 64 bit value
// There is no CUDA intrinsic for permuting bytes in 64 bit integers
__device__ inline uint64_t swap_endian(uint64_t x)
{
  // Reverse the endianness of each 32 bit section
  uint32_t low_bits = swap_endian(static_cast<uint32_t>(x));
  uint32_t high_bits = swap_endian(static_cast<uint32_t>(x >> 32));
  // Reassemble a 64 bit result, swapping the low bits and high bits
  return (static_cast<uint64_t>(low_bits) << 32) | (static_cast<uint64_t>(high_bits));
}

/**
 * Modified GPU implementation of
 * https://johnnylee-sde.github.io/Fast-unsigned-integer-to-hex-string/
 * Copyright (c) 2015 Barry Clark
 * Licensed under the MIT license.
 * See file LICENSE for detail or copy at https://opensource.org/licenses/MIT
 */
void __device__ inline uint32ToLowercaseHexString(uint32_t num, char* destination)
{
  // Transform 0xABCD'1234 => 0x0000'ABCD'0000'1234 => 0x0B0A'0D0C'0201'0403
  uint64_t x = num;
  x = ((x & 0xFFFF'0000u) << 16) | ((x & 0xFFFF));
  x = ((x & 0x000F'0000'000Fu) << 8) | ((x & 0x00F0'0000'00F0u) >> 4) |
      ((x & 0x0F00'0000'0F00u) << 16) | ((x & 0xF000'0000'F000) << 4);

  // Calculate a mask of ascii value offsets for bytes that contain alphabetical hex digits
  uint64_t offsets = (((x + 0x0606'0606'0606'0606) >> 4) & 0x0101'0101'0101'0101) * 0x27;

  x |= 0x3030'3030'3030'3030;
  x += offsets;
  std::memcpy(destination, reinterpret_cast<uint8_t*>(&x), 8);
}

} // namespace cudf::hashing::detail
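
The endian swap and the hex conversion in this header interact: the hex routine emits characters in the in-memory byte order of its argument, so on a little-endian machine a word must be byte-swapped first to print in the conventional most-significant-digit-first order. The following is a host-side sketch for experimenting with that behavior, not the device code from the diff. The names `swap_endian32`, `swap_endian64` and `to_lowercase_hex` are illustrative, the 32-bit swap uses plain shifts in place of `__byte_perm`, and a little-endian host is assumed.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <string>

// Host-side stand-in for the 32-bit device swap (which uses __byte_perm).
inline std::uint32_t swap_endian32(std::uint32_t x)
{
  return (x >> 24) | ((x >> 8) & 0x0000'FF00u) | ((x << 8) & 0x00FF'0000u) | (x << 24);
}

// 64-bit swap built from two 32-bit swaps: each half is reversed,
// then the halves trade places.
inline std::uint64_t swap_endian64(std::uint64_t x)
{
  std::uint32_t const low_bits  = swap_endian32(static_cast<std::uint32_t>(x));
  std::uint32_t const high_bits = swap_endian32(static_cast<std::uint32_t>(x >> 32));
  return (static_cast<std::uint64_t>(low_bits) << 32) | static_cast<std::uint64_t>(high_bits);
}

// Same bit manipulation as uint32ToLowercaseHexString, returning a std::string.
// Assumption: little-endian host, so the 8 characters follow the in-memory
// byte order of `num`.
inline std::string to_lowercase_hex(std::uint32_t num)
{
  // Spread the 8 nibbles of `num` into the 8 bytes of `x`.
  std::uint64_t x = num;
  x = ((x & 0xFFFF'0000u) << 16) | (x & 0xFFFFu);
  x = ((x & 0x000F'0000'000Fu) << 8) | ((x & 0x00F0'0000'00F0u) >> 4) |
      ((x & 0x0F00'0000'0F00u) << 16) | ((x & 0xF000'0000'F000u) << 4);

  // Bytes holding 10..15 need an extra 0x27 to land on 'a'..'f'.
  std::uint64_t const offsets =
    (((x + 0x0606'0606'0606'0606u) >> 4) & 0x0101'0101'0101'0101u) * 0x27;

  x |= 0x3030'3030'3030'3030u;  // add ASCII '0' to every byte
  x += offsets;

  char out[8];
  std::memcpy(out, &x, sizeof(out));
  return std::string(out, sizeof(out));
}
```

On a little-endian host, `to_lowercase_hex(0xABCD1234u)` yields `"3412cdab"`, while `to_lowercase_hex(swap_endian32(0xABCD1234u))` yields `"abcd1234"`.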
20 changes: 20 additions & 0 deletions cpp/include/cudf/hashing/detail/hashing.hpp
@@ -46,6 +46,26 @@ std::unique_ptr<column> md5(table_view const& input,
rmm::cuda_stream_view stream,
rmm::mr::device_memory_resource* mr);

std::unique_ptr<column> sha1(table_view const& input,
rmm::cuda_stream_view stream,
rmm::mr::device_memory_resource* mr);

std::unique_ptr<column> sha224(table_view const& input,
rmm::cuda_stream_view stream,
rmm::mr::device_memory_resource* mr);

std::unique_ptr<column> sha256(table_view const& input,
rmm::cuda_stream_view stream,
rmm::mr::device_memory_resource* mr);

std::unique_ptr<column> sha384(table_view const& input,
rmm::cuda_stream_view stream,
rmm::mr::device_memory_resource* mr);

std::unique_ptr<column> sha512(table_view const& input,
rmm::cuda_stream_view stream,
rmm::mr::device_memory_resource* mr);

std::unique_ptr<column> xxhash_64(table_view const& input,
uint64_t seed,
rmm::cuda_stream_view,
5 changes: 5 additions & 0 deletions cpp/src/hash/hashing.cu
@@ -33,6 +33,11 @@ std::unique_ptr<column> hash(table_view const& input,
    case (hash_id::HASH_MURMUR3): return murmurhash3_x86_32(input, seed, stream, mr);
    case (hash_id::HASH_SPARK_MURMUR3): return spark_murmurhash3_x86_32(input, seed, stream, mr);
    case (hash_id::HASH_MD5): return md5(input, stream, mr);
    case (hash_id::HASH_SHA1): return sha1(input, stream, mr);
    case (hash_id::HASH_SHA224): return sha224(input, stream, mr);
    case (hash_id::HASH_SHA256): return sha256(input, stream, mr);
    case (hash_id::HASH_SHA384): return sha384(input, stream, mr);
    case (hash_id::HASH_SHA512): return sha512(input, stream, mr);
    default: CUDF_FAIL("Unsupported hash function.");
  }
}
23 changes: 0 additions & 23 deletions cpp/src/hash/md5_hash.cu
@@ -108,29 +108,6 @@ auto __device__ inline get_element_pointer_and_size(string_view const& element)
return thrust::make_pair(reinterpret_cast<uint8_t const*>(element.data()), element.size_bytes());
}

/**
* Modified GPU implementation of
* https://johnnylee-sde.github.io/Fast-unsigned-integer-to-hex-string/
* Copyright (c) 2015 Barry Clark
* Licensed under the MIT license.
* See file LICENSE for detail or copy at https://opensource.org/licenses/MIT
*/
void __device__ inline uint32ToLowercaseHexString(uint32_t num, char* destination)
{
// Transform 0xABCD'1234 => 0x0000'ABCD'0000'1234 => 0x0B0A'0D0C'0201'0403
uint64_t x = num;
x = ((x & 0xFFFF'0000u) << 16) | ((x & 0xFFFF));
x = ((x & 0x000F'0000'000Fu) << 8) | ((x & 0x00F0'0000'00F0u) >> 4) |
((x & 0x0F00'0000'0F00u) << 16) | ((x & 0xF000'0000'F000) << 4);

// Calculate a mask of ascii value offsets for bytes that contain alphabetical hex digits
uint64_t offsets = (((x + 0x0606'0606'0606'0606) >> 4) & 0x0101'0101'0101'0101) * 0x27;

x |= 0x3030'3030'3030'3030;
x += offsets;
std::memcpy(destination, reinterpret_cast<uint8_t*>(&x), 8);
}

// The MD5 algorithm and its hash/shift constants are officially specified in
// RFC 1321. For convenience, these values can also be found on Wikipedia:
// https://en.wikipedia.org/wiki/MD5
100 changes: 100 additions & 0 deletions cpp/src/hash/sha1_hash.cu
@@ -0,0 +1,100 @@
/*
* Copyright (c) 2022-2023, NVIDIA CORPORATION.
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/

#include "sha_hash.cuh"

#include <cudf/column/column_device_view.cuh>
#include <cudf/column/column_factories.hpp>
#include <cudf/detail/iterator.cuh>
#include <cudf/detail/null_mask.hpp>
#include <cudf/detail/nvtx/ranges.hpp>
#include <cudf/hashing/detail/hash_functions.cuh>
#include <cudf/hashing/detail/hashing.hpp>
#include <cudf/scalar/scalar.hpp>
#include <cudf/strings/detail/strings_children.cuh>
#include <cudf/strings/string_view.hpp>
#include <cudf/table/table_device_view.cuh>
#include <cudf/utilities/traits.hpp>

#include <rmm/cuda_stream_view.hpp>
#include <rmm/exec_policy.hpp>

#include <thrust/execution_policy.h>
#include <thrust/fill.h>
#include <thrust/for_each.h>
#include <thrust/iterator/constant_iterator.h>
#include <thrust/iterator/counting_iterator.h>

#include <algorithm>
#include <limits>
#include <memory>
#include <type_traits>
#include <utility>

namespace cudf {
namespace hashing {
namespace detail {

namespace {

struct sha1_hash_state {
  uint64_t message_length = 0;
  uint32_t buffer_length = 0;
  uint32_t hash_value[5] = {0x67452301, 0xefcdab89, 0x98badcfe, 0x10325476, 0xc3d2e1f0};
  uint8_t buffer[64];
};

struct SHA1Hash : HashBase<SHA1Hash> {
  __device__ inline SHA1Hash(char* result_location) : HashBase<SHA1Hash>(result_location) {}

  // Intermediate data type storing the hash state
  using hash_state = sha1_hash_state;
  // The word type used by this hash function
  using sha_word_type = uint32_t;
  // Number of bytes processed in each hash step
  static constexpr uint32_t message_chunk_size = 64;
  // Size in bytes of the hexadecimal digest string (the raw SHA-1 digest is 20 bytes)
  static constexpr uint32_t digest_size = 40;
  // Number of bytes used for the message length
  static constexpr uint32_t message_length_size = 8;

  void __device__ inline hash_step(hash_state& state) { sha1_hash_step(state); }

  hash_state state;
};

} // namespace
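
`SHA1Hash` derives from `HashBase<SHA1Hash>`, the curiously recurring template pattern (CRTP) mentioned in the commit list. `HashBase` itself lives in sha_hash.cuh, which is not shown on this page, so the sketch below only illustrates the general mechanism with hypothetical names (`HashBaseSketch`, `CountingHash`, `process_chunk`). It should not be read as the PR's actual base class.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical base class illustrating CRTP; not the HashBase from sha_hash.cuh.
template <typename Hasher>
struct HashBaseSketch {
  // Static dispatch: the base calls into the derived class without virtual
  // functions, which suits device code where virtual calls are costly.
  void process_chunk() { static_cast<Hasher&>(*this).hash_step(); }
};

// Hypothetical derived "hasher" that only counts how many bytes it was asked
// to process, standing in for a real compression function.
struct CountingHash : HashBaseSketch<CountingHash> {
  static constexpr std::uint32_t message_chunk_size = 64;
  std::uint32_t bytes_processed = 0;

  void hash_step() { bytes_processed += message_chunk_size; }
};
```

With this shape, shared logic such as buffering and finalization can sit in the base once, while each algorithm supplies only its constants and its `hash_step`.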

std::unique_ptr<column> sha1(table_view const& input,
                             rmm::cuda_stream_view stream,
                             rmm::mr::device_memory_resource* mr)
{
  string_scalar const empty_result("da39a3ee5e6b4b0d3255bfef95601890afd80709");
  return sha_hash<SHA1Hash>(input, empty_result, stream, mr);
}

} // namespace detail

std::unique_ptr<column> sha1(table_view const& input,
                             rmm::cuda_stream_view stream,
                             rmm::mr::device_memory_resource* mr)
{
  CUDF_FUNC_RANGE();
  return detail::sha1(input, stream, mr);
}

} // namespace hashing
} // namespace cudf
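
For checking values such as the `empty_result` constant above, a CPU reference can be handy. The following is a host-side sketch of SHA-1 written from the published algorithm, not the GPU implementation in this PR. The function names `rotl32`, `sha1_block` and `sha1_hex` are illustrative. It uses the same initial hash values as `sha1_hash_state`, 64-byte chunks, and an 8-byte big-endian message length, matching the constants in `SHA1Hash`.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdio>
#include <string>
#include <vector>

// Rotate a 32-bit word left by r bits (0 < r < 32).
inline std::uint32_t rotl32(std::uint32_t x, unsigned r) { return (x << r) | (x >> (32 - r)); }

// Process one 64-byte block, updating the five-word state h.
inline void sha1_block(std::uint32_t h[5], std::uint8_t const* block)
{
  std::uint32_t w[80];
  for (int i = 0; i < 16; ++i) {
    // Message words are read as big-endian.
    w[i] = (std::uint32_t{block[4 * i]} << 24) | (std::uint32_t{block[4 * i + 1]} << 16) |
           (std::uint32_t{block[4 * i + 2]} << 8) | std::uint32_t{block[4 * i + 3]};
  }
  for (int i = 16; i < 80; ++i) {
    w[i] = rotl32(w[i - 3] ^ w[i - 8] ^ w[i - 14] ^ w[i - 16], 1);
  }

  std::uint32_t a = h[0], b = h[1], c = h[2], d = h[3], e = h[4];
  for (int i = 0; i < 80; ++i) {
    std::uint32_t f = 0;
    std::uint32_t k = 0;
    if (i < 20) {
      f = (b & c) | (~b & d);
      k = 0x5a827999u;
    } else if (i < 40) {
      f = b ^ c ^ d;
      k = 0x6ed9eba1u;
    } else if (i < 60) {
      f = (b & c) | (b & d) | (c & d);
      k = 0x8f1bbcdcu;
    } else {
      f = b ^ c ^ d;
      k = 0xca62c1d6u;
    }
    std::uint32_t const temp = rotl32(a, 5) + f + e + k + w[i];
    e = d;
    d = c;
    c = rotl32(b, 30);
    b = a;
    a = temp;
  }
  h[0] += a;
  h[1] += b;
  h[2] += c;
  h[3] += d;
  h[4] += e;
}

// Return the SHA-1 digest of `message` as a 40-character lowercase hex string.
inline std::string sha1_hex(std::string const& message)
{
  std::uint32_t h[5] = {0x67452301u, 0xefcdab89u, 0x98badcfeu, 0x10325476u, 0xc3d2e1f0u};

  // Padding: a 0x80 byte, zeros up to 56 mod 64, then the message length in
  // bits stored as a big-endian 64-bit value.
  std::vector<std::uint8_t> data(message.begin(), message.end());
  std::uint64_t const bit_length = static_cast<std::uint64_t>(message.size()) * 8;
  data.push_back(0x80);
  while (data.size() % 64 != 56) { data.push_back(0x00); }
  for (int shift = 56; shift >= 0; shift -= 8) {
    data.push_back(static_cast<std::uint8_t>(bit_length >> shift));
  }

  for (std::size_t offset = 0; offset < data.size(); offset += 64) {
    sha1_block(h, data.data() + offset);
  }

  char hex[41];
  std::snprintf(hex, sizeof(hex), "%08x%08x%08x%08x%08x", h[0], h[1], h[2], h[3], h[4]);
  return std::string(hex);
}
```

Calling `sha1_hex("")` reproduces the `empty_result` string used in `detail::sha1`, and `sha1_hex("abc")` gives the standard test vector `a9993e364706816aba3e25717850c26c9cd0d89d`.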