Add set retrieve #442

PointKernel · 2024-02-25T00:11:31Z

This PR adds host-bulk set retrieve APIs. For now, they use device find APIs to get matches since the benefit of creating a dedicated device retrieve is unclear.

It also adds a placeholder for an overload of retrieve that takes custom key_equal and hasher.

sleeepyjack · 2024-03-12T23:17:41Z

include/cuco/detail/static_set/kernels.cuh

+
+  auto constexpr flushing_tile_size = cuco::detail::warp_size() / window_size;
+  // random choice to tune
+  auto constexpr flushing_buffer_size = 2 * flushing_tile_size;


I'm curious. Why did you choose that particular size?

No particular reason. I tested with 1, 2, 3, and 4, and there was no big difference between those options.

wence- · 2024-03-13T10:57:03Z

include/cuco/detail/static_set/kernels.cuh

+      auto const found = ref.find(tile, *(first + idx));
+#if defined(CUCO_HAS_CG_INVOKE_ONE)
+      if (found != ref.end()) {
+        cg::invoke_one(tile, [&]() {


question: invoke_one is logically collective over the group defined by tile, and the hardware could select any thread in [0, tile.num_threads()) to execute the functor. However, it seems to me that not all threads in tile necessarily reach this line, because, to my understanding, both found and active_flag can diverge. Is this a problem?

Thanks for explaining your concern. The tile-based ref.find(tile, ...) guarantees that all threads of the same tile see the same found. active_flag can differ between tiles but not between threads of the same tile.
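To illustrate why a tile-cooperative find yields a uniform result, here is a minimal host-side C++ simulation. It is a sketch only, not cuco's implementation: `tile_find`, `is_uniform`, `kTileSize`, `kEmpty`, and `kNotFound` are hypothetical names, keys are assumed non-negative, and the "vote" and "broadcast" loops stand in for tile-level collectives such as a ballot followed by a shuffle on the device.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical model of one tile: kTileSize lanes that probe in lockstep.
constexpr std::size_t kTileSize = 4;
constexpr int kEmpty            = -1;  // empty-slot sentinel (assumption)
constexpr std::size_t kNotFound = static_cast<std::size_t>(-1);

// Simulated tile-wide find with linear probing over `slots`.
// Returns the result held by each lane after the tile-wide exchange.
std::array<std::size_t, kTileSize> tile_find(std::vector<int> const& slots, int key)
{
  std::array<std::size_t, kTileSize> result;
  result.fill(kNotFound);
  std::size_t const n     = slots.size();
  std::size_t const start = static_cast<std::size_t>(key) % n;

  for (std::size_t probed = 0; probed < n; probed += kTileSize) {
    // Step 1: each lane inspects its own slot, so per-lane state diverges here.
    std::array<bool, kTileSize> match{};
    std::array<bool, kTileSize> empty{};
    for (std::size_t lane = 0; lane < kTileSize; ++lane) {
      int const slot = slots[(start + probed + lane) % n];
      match[lane]    = (slot == key);
      empty[lane]    = (slot == kEmpty);
    }

    // Step 2: tile-wide vote, after which every lane knows every lane's state.
    std::size_t src_lane = kTileSize;  // kTileSize means "no lane matched"
    bool any_empty       = false;
    for (std::size_t lane = 0; lane < kTileSize; ++lane) {
      if (match[lane] && src_lane == kTileSize) { src_lane = lane; }
      any_empty = any_empty || empty[lane];
    }

    // Step 3: broadcast the winning lane's slot index to all lanes.
    if (src_lane != kTileSize) {
      result.fill((start + probed + src_lane) % n);
      return result;
    }
    if (any_empty) { return result; }  // key is absent; all lanes agree
  }
  return result;
}

// True if every lane holds the same result.
bool is_uniform(std::array<std::size_t, kTileSize> const& r)
{
  for (auto v : r) {
    if (v != r[0]) { return false; }
  }
  return true;
}
```

Because the decision to return is taken only after the tile-wide vote, every lane leaves `tile_find` with the same value, so a subsequent branch on `found != ref.end()` cannot split the tile.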

wence- · 2024-03-13T11:04:45Z

include/cuco/detail/static_set/kernels.cuh

+  __shared__ Size offset;
+
+#if defined(CUCO_HAS_CG_INVOKE_ONE)
+  cooperative_groups::invoke_one(
+    block, [&]() { offset = counter->fetch_add(buffer_size, cuda::std::memory_order_relaxed); });
+#else
+  if (i == 0) { offset = counter->fetch_add(buffer_size, cuda::std::memory_order_relaxed); }
+#endif
+  block.sync();


question: In the CG invoke_one case, is this better written without the explicit __shared__ offset as:

#if defined(CUCO_HAS_CG_INVOKE_ONE)
  Size offset = cg::invoke_one_broadcast(
    block, [&] { return counter->fetch_add(buffer_size, cuda::std::memory_order_relaxed); });
#else
  __shared__ Size offset;
  if (i == 0) { offset = counter->fetch_add(buffer_size, cuda::std::memory_order_relaxed); }
  block.sync();
#endif

?

I see your point. cg::invoke_one_broadcast only works for tiles, not for thread blocks, so it doesn't work in this particular case. However, your suggestion is valid for numerous other cases in cuco, and I will make a PR to update them all. 👍 Love it.
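The reservation pattern discussed in this thread can be sketched on the host in plain C++. This is an illustration under stated assumptions, not the kernel code: `reserve_offsets` is a hypothetical helper, each loop iteration stands in for the single elected thread of one block, and `std::atomic` replaces the device-side counter. On the device the blocks run concurrently, and the rest of the block reads the offset only after `block.sync()`.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <vector>

// Each simulated block reserves `buffer_size` contiguous output slots by
// bumping a shared counter once. The value returned by fetch_add is the
// start of that block's private output range.
std::vector<std::size_t> reserve_offsets(std::size_t num_blocks,
                                         std::size_t buffer_size,
                                         std::atomic<std::size_t>& counter)
{
  std::vector<std::size_t> offsets(num_blocks);
  for (std::size_t block = 0; block < num_blocks; ++block) {
    // Only one thread per block performs the atomic; relaxed ordering is
    // enough because only the counter value itself is being exchanged.
    offsets[block] = counter.fetch_add(buffer_size, std::memory_order_relaxed);
  }
  return offsets;
}
```

Whatever order the reservations happen in, the atomic increment hands out ranges that are disjoint and `buffer_size` wide, and the final counter value equals the total number of reserved slots.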

sleeepyjack · 2024-03-18T23:46:50Z

include/cuco/static_set.cuh

+   *
+   * @note Behavior is undefined if the size of the output range exceeds
+   * `std::distance(output_begin, output_end)`.
+   * @note Behavior is undefined if the given key has multiple matches in the set.


Is it undefined or do we return the first matching occurrence of the key?

It's the first element for scalar probing but undefined behavior for CG-based algorithms, so "undefined behavior" is accurate.

sleeepyjack · 2024-03-18T23:48:33Z

include/cuco/detail/static_set/static_set.inl

+  ProbeHash const& probe_hash,
+  cuda_stream_ref stream) const
+{
+  CUCO_FAIL("Unsupported code path: retrieve_async with custom hash/equal");


We should add a note about this in the inline docs

sleeepyjack

Awesome work! Thanks!

PointKernel added 5 commits February 16, 2024 12:07

Add set retrieve APIs

5e76303

Add retrieve overload that takes probe equal and hasher

14330f1

Add cuco retrieve operator

8994fad

Minor fix

e9413f0

Add set retrieve kernel

88b474d

PointKernel added the type: feature request, helps: rapids, and topic: static_set labels Feb 25, 2024

PointKernel added 4 commits February 24, 2024 16:16

Minor cleanups

cf4f975

Add scalar retrieve kernel and tests

ba9e787

Forgot to git add new file -_-||

fee4c40

Update copyright year

e5aa543

wence- reviewed Feb 28, 2024

GregoryKimball mentioned this pull request Feb 29, 2024

[FEA] Add distinct-key joins to libcudf rapidsai/cudf#14948

Closed

6 tasks

PointKernel added the In Progress label Mar 2, 2024

PointKernel added 5 commits March 8, 2024 10:10

Add CG one broadcast config macro

6244695

Add group retrieve kernel + tests

7d95b71

Merge remote-tracking branch 'upstream/dev' into add-set-retrieve

1b136ba

Minor updates

47526fe

Revert retrieve operator

4d3c3ad

PointKernel marked this pull request as ready for review March 12, 2024 18:44

PointKernel requested a review from sleeepyjack as a code owner March 12, 2024 18:44

PointKernel added the Needs Review label and removed the In Progress label Mar 12, 2024

sleeepyjack reviewed Mar 12, 2024

wence- reviewed Mar 13, 2024

PointKernel added 4 commits March 14, 2024 14:34

Replace std::pair with cuda::std::pair

7178ebb

Minor cleanups

9855f2a

Avoid explicit type spelling

5f909fb

Use placement new

0b398b4

PointKernel added 2 commits March 14, 2024 15:22

Address review comments

5c273f7

Minor cleanups

f84eae5

PointKernel requested a review from sleeepyjack March 14, 2024 23:01

sleeepyjack reviewed Mar 18, 2024

PointKernel added 2 commits March 18, 2024 18:52

Merge remote-tracking branch 'upstream/dev' into add-set-retrieve

076b069

Update docs

1cb1b58

sleeepyjack approved these changes Mar 19, 2024

PointKernel merged commit dd51a21 into NVIDIA:dev Mar 19, 2024
15 checks passed

PointKernel deleted the add-set-retrieve branch March 19, 2024 16:57
