Add new nvtext minhash_permuted API #16756

Merged
merged 101 commits into from
Nov 12, 2024
7f38f21
Improve minhash performance by using more working memory
davidwendt Sep 5, 2024
a653542
Merge branch 'branch-24.10' into perf-minhash-highmem
davidwendt Sep 5, 2024
99f151e
Merge branch 'branch-24.10' into perf-minhash-highmem
davidwendt Sep 9, 2024
76c0367
fix merge conflict
davidwendt Sep 17, 2024
f5e24ac
Merge branch 'branch-24.10' into perf-minhash-highmem
davidwendt Sep 18, 2024
f81b109
Merge branch 'branch-24.10' into perf-minhash-highmem
davidwendt Sep 18, 2024
9700272
Merge branch 'branch-24.10' into perf-minhash-highmem
davidwendt Sep 19, 2024
f35c16d
change to block per string
davidwendt Sep 19, 2024
6dc19ef
Merge branch 'branch-24.10' into perf-minhash-highmem
davidwendt Sep 19, 2024
01500dd
fix sync call
davidwendt Sep 20, 2024
fcac398
Merge branch 'branch-24.10' into perf-minhash-highmem
davidwendt Sep 20, 2024
e700df2
Merge branch 'branch-24.12' into perf-minhash-highmem
davidwendt Sep 23, 2024
d1c0b85
Merge branch 'branch-24.12' into perf-minhash-highmem
davidwendt Sep 24, 2024
1fef924
Merge branch 'branch-24.12' into perf-minhash-highmem
davidwendt Sep 26, 2024
d611acb
Merge branch 'branch-24.12' into perf-minhash-highmem
davidwendt Sep 26, 2024
2fe7153
Merge branch 'branch-24.12' into perf-minhash-highmem
davidwendt Sep 27, 2024
84f248e
Merge branch 'branch-24.12' into perf-minhash-highmem
davidwendt Sep 27, 2024
c362916
Merge branch 'branch-24.12' into perf-minhash-highmem
davidwendt Sep 30, 2024
117467e
Merge branch 'branch-24.12' into perf-minhash-highmem
davidwendt Sep 30, 2024
6e1bfff
Merge branch 'branch-24.12' into perf-minhash-highmem
davidwendt Oct 1, 2024
70948b9
fix benchmark ranges
davidwendt Oct 2, 2024
f329f84
Merge branch 'branch-24.12' into perf-minhash-highmem
davidwendt Oct 2, 2024
fe565f1
Merge branch 'perf-minhash-highmem' of github.com:davidwendt/cudf int…
davidwendt Oct 2, 2024
81a16be
Merge branch 'branch-24.12' into perf-minhash-highmem
davidwendt Oct 2, 2024
ef3b228
minor fixes
davidwendt Oct 2, 2024
1753a40
Merge branch 'branch-24.12' into perf-minhash-highmem
davidwendt Oct 2, 2024
f4181f7
Merge branch 'branch-24.12' into perf-minhash-highmem
davidwendt Oct 2, 2024
b023aa5
Merge branch 'branch-24.12' into perf-minhash-highmem
davidwendt Oct 3, 2024
03d570d
minor cleanups
davidwendt Oct 3, 2024
3f5f5b5
Merge branch 'branch-24.12' into perf-minhash-highmem
davidwendt Oct 3, 2024
07ddef7
Merge branch 'perf-minhash-highmem' of github.com:davidwendt/cudf int…
davidwendt Oct 3, 2024
c4ff137
Merge branch 'branch-24.12' into perf-minhash-highmem
davidwendt Oct 3, 2024
121419c
Merge branch 'branch-24.12' into perf-minhash-highmem
davidwendt Oct 4, 2024
b1363ee
Merge branch 'branch-24.12' into perf-minhash-highmem
davidwendt Oct 7, 2024
f747163
match benchmark to curator parameters
davidwendt Oct 10, 2024
72acce3
Merge branch 'branch-24.12' into perf-minhash-highmem
davidwendt Oct 10, 2024
aa6f3e0
add minhash_permuted API
davidwendt Oct 10, 2024
23a87ed
Merge branch 'branch-24.12' into perf-minhash-highmem
davidwendt Oct 10, 2024
660641e
Merge branch 'perf-minhash-highmem' of github.com:davidwendt/cudf int…
davidwendt Oct 11, 2024
186477e
revert benchmark API call
davidwendt Oct 11, 2024
24c8073
Merge branch 'perf-minhash-highmem' of github.com:davidwendt/cudf int…
davidwendt Oct 11, 2024
38be18b
fix merge conflict
davidwendt Oct 11, 2024
b49950d
Merge branch 'branch-24.12' into perf-minhash-highmem
davidwendt Oct 14, 2024
a7583d0
experimental single-hash permutation
davidwendt Oct 14, 2024
ff18693
Merge branch 'branch-24.12' into perf-minhash-highmem
davidwendt Oct 14, 2024
00e2bee
Merge branch 'branch-24.12' into perf-minhash-highmem
davidwendt Oct 14, 2024
08e6400
Merge branch 'branch-24.12' into perf-minhash-highmem
davidwendt Oct 15, 2024
42e7d7b
enable seed-hash temporary memory
davidwendt Oct 15, 2024
55245ca
Merge branch 'branch-24.12' into perf-minhash-highmem
davidwendt Oct 15, 2024
8d202dd
Merge branch 'branch-24.12' into perf-minhash-highmem
davidwendt Oct 16, 2024
f79631a
dynamic shared memory to static
davidwendt Oct 16, 2024
8df5acf
Merge branch 'branch-24.12' into perf-minhash-highmem
davidwendt Oct 16, 2024
afe3ade
Merge branch 'branch-24.12' into perf-minhash-highmem
davidwendt Oct 16, 2024
a0816b9
Merge branch 'branch-24.12' into perf-minhash-highmem
davidwendt Oct 17, 2024
fda43cc
cleanup variable names, doxygen
davidwendt Oct 17, 2024
97395c8
Merge branch 'branch-24.12' into perf-minhash-highmem
davidwendt Oct 17, 2024
ce90455
Merge branch 'branch-24.12' into perf-minhash-highmem
davidwendt Oct 18, 2024
0f83584
Merge branch 'branch-24.12' into perf-minhash-highmem
davidwendt Oct 18, 2024
d1e3154
Merge branch 'branch-24.12' into perf-minhash-highmem
davidwendt Oct 21, 2024
1f4d441
support for super-wide strings
davidwendt Oct 22, 2024
9a583dc
Merge branch 'branch-24.12' into perf-minhash-highmem
davidwendt Oct 22, 2024
bf6413f
Merge branch 'perf-minhash-highmem' of github.com:davidwendt/cudf int…
davidwendt Oct 23, 2024
d83e9db
Merge branch 'branch-24.12' into perf-minhash-highmem
davidwendt Oct 23, 2024
58f9206
fix threshold-index init logic
davidwendt Oct 23, 2024
d2abafd
Merge branch 'branch-24.12' into perf-minhash-highmem
davidwendt Oct 23, 2024
fef6e0e
use cudf::detail::device_scalar
davidwendt Oct 23, 2024
bc33896
Merge branch 'branch-24.12' into perf-minhash-highmem
davidwendt Oct 23, 2024
e1067b3
fix benchmarks; add gtest
davidwendt Oct 24, 2024
e5744a5
Merge branch 'branch-24.12' into perf-minhash-highmem
davidwendt Oct 24, 2024
767e163
Merge branch 'branch-24.12' into perf-minhash-highmem
davidwendt Oct 25, 2024
41427fd
Merge branch 'branch-24.12' into perf-minhash-highmem
davidwendt Oct 25, 2024
081546f
reinstate original non-permuted code
davidwendt Oct 25, 2024
c4886a4
Merge branch 'branch-24.12' into perf-minhash-highmem
davidwendt Oct 28, 2024
ef7bb46
add gtest for parameter chunking
davidwendt Oct 28, 2024
2a9928a
move pytests to use permuted api
davidwendt Oct 28, 2024
a17f336
Merge branch 'branch-24.12' into perf-minhash-highmem
davidwendt Oct 28, 2024
a4bce0f
Merge branch 'branch-24.12' into perf-minhash-highmem
davidwendt Oct 28, 2024
9548eb6
Merge branch 'branch-24.12' into perf-minhash-highmem
davidwendt Oct 28, 2024
afb173b
Merge branch 'branch-24.12' into perf-minhash-highmem
davidwendt Oct 29, 2024
a39b738
add docstring for new APIs
davidwendt Oct 29, 2024
dfcd3e6
Merge branch 'perf-minhash-highmem' of github.com:davidwendt/cudf int…
davidwendt Oct 29, 2024
d836f34
Merge branch 'branch-24.12' into perf-minhash-highmem
davidwendt Oct 29, 2024
43541e4
Merge branch 'branch-24.12' into perf-minhash-highmem
davidwendt Oct 31, 2024
b66599c
Merge branch 'branch-24.12' into perf-minhash-highmem
davidwendt Oct 31, 2024
ad4c031
Merge branch 'branch-24.12' into perf-minhash-highmem
davidwendt Oct 31, 2024
ad411a8
Merge branch 'branch-24.12' into perf-minhash-highmem
davidwendt Oct 31, 2024
47cf9e4
Merge branch 'branch-24.12' into perf-minhash-highmem
davidwendt Nov 5, 2024
c5122b3
Merge branch 'perf-minhash-highmem' of github.com:davidwendt/cudf int…
davidwendt Nov 5, 2024
7ade810
Merge branch 'branch-24.12' into perf-minhash-highmem
davidwendt Nov 5, 2024
186befd
Merge branch 'branch-24.12' into perf-minhash-highmem
davidwendt Nov 6, 2024
1989c62
change test_minhash to test_minhash_permuted
davidwendt Nov 6, 2024
7fea20f
Merge branch 'branch-24.12' into perf-minhash-highmem
davidwendt Nov 6, 2024
2804015
add deprecation warnings
davidwendt Nov 6, 2024
99557c7
Merge branch 'perf-minhash-highmem' of github.com:davidwendt/cudf int…
davidwendt Nov 7, 2024
494237d
Merge branch 'branch-24.12' into perf-minhash-highmem
davidwendt Nov 7, 2024
7446d53
Merge branch 'branch-24.12' into perf-minhash-highmem
davidwendt Nov 7, 2024
1ed78a4
change DeprecationWarning to FutureWarning
davidwendt Nov 7, 2024
513218a
Merge branch 'branch-24.12' into perf-minhash-highmem
davidwendt Nov 7, 2024
4e3e25d
Merge branch 'branch-24.12' into perf-minhash-highmem
davidwendt Nov 8, 2024
056eb79
fix shared-memory variable type
davidwendt Nov 8, 2024
8c4e6dc
Merge branch 'branch-24.12' into perf-minhash-highmem
davidwendt Nov 8, 2024
fix merge conflict
davidwendt committed Sep 17, 2024
commit 76c03676f1c52b445c8b47ffda905753f73fdbdc
5 changes: 3 additions & 2 deletions cpp/src/text/minhash.cu
@@ -159,13 +159,14 @@ std::unique_ptr<cudf::column> minhash_fn(cudf::strings_column_view const& input,
                                           mr);
   auto d_hashes = hashes->mutable_view().data<hash_value_type>();
 
-  constexpr int block_size = 256;
+  constexpr cudf::thread_index_type block_size = 256;
 
   auto const wm_size = cudf::util::round_up_safe(
     seeds.size() * cudf::detail::warp_size * input.size(), static_cast<std::size_t>(block_size));
   auto working_memory = rmm::device_uvector<hash_value_type>(wm_size, stream);
 
-  cudf::detail::grid_1d grid{input.size() * cudf::detail::warp_size, block_size};
+  cudf::detail::grid_1d grid{
+    static_cast<cudf::thread_index_type>(input.size()) * cudf::detail::warp_size, block_size};
   minhash_kernel<HashFunction><<<grid.num_blocks, grid.num_threads_per_block, 0, stream.value()>>>(
     *d_strings, seeds, width, working_memory.data(), d_hashes);
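
The cast in this change appears intended to have the thread count computed in a 64-bit type, so that `input.size() * warp_size` cannot exceed the 32-bit range for large columns. Below is a minimal standalone sketch of that idea, not the cudf implementation. The aliases `size_type` and `thread_index_type`, the `warp_size` constant, and both helper functions are hypothetical stand-ins, and the sketch assumes a 32-bit signed row count, a 64-bit signed thread index, and a warp size of 32.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// Hypothetical stand-ins for the cudf names in the diff (assumed widths).
using size_type         = std::int32_t;  // assumed: row counts are 32-bit signed
using thread_index_type = std::int64_t;  // assumed: thread indices are 64-bit signed
constexpr thread_index_type warp_size = 32;

// One warp per row: widen the row count BEFORE multiplying so the
// product is evaluated in 64 bits rather than 32.
inline thread_index_type total_threads(size_type num_rows)
{
  assert(num_rows >= 0);
  return static_cast<thread_index_type>(num_rows) * warp_size;
}

// True when the thread count would not fit in a 32-bit signed integer,
// i.e. when a purely 32-bit multiplication would have overflowed.
inline bool exceeds_int32(size_type num_rows)
{
  return total_threads(num_rows) > std::numeric_limits<std::int32_t>::max();
}
```

Under these assumptions, any column with 67,108,864 rows or more (2^31 / 32) needs more threads than a 32-bit signed integer can represent, which is the case the widened multiplication handles.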
