Fix memcheck error found in STRINGS_TEST #13578

davidwendt · 2023-06-14T19:04:36Z

Description

Fixes a memcheck error found in STRINGS_TEST where atomicOr was used on a bool value in device memory (an element of the results column). The workaround uses cub::WarpReduce to compute each string's result in the warp-per-string kernel instead.
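For readers unfamiliar with the warp-per-string layout, the following host-side C++ simulation sketches the idea under stated assumptions. The 32 "lanes" of one warp scan a single string in a lane-strided loop, each lane keeps a private found flag, and a max-reduction over the lanes (the role `cub::WarpReduce<bool>::Reduce(found, cub::Max())` plays in the kernel) produces the string's result. The names `kWarpSize` and `warp_contains` are hypothetical, and this is not the libcudf kernel itself.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cstddef>
#include <string_view>

// Assumption: stand-in for cudf::detail::warp_size.
constexpr std::size_t kWarpSize = 32;

// Host-side simulation of one warp searching one string for `target`.
bool warp_contains(std::string_view str, std::string_view target)
{
  if (target.size() > str.size()) { return false; }
  auto const last_start = str.size() - target.size();  // last valid start position

  std::array<bool, kWarpSize> lane_found{};  // one private flag per lane, all false
  for (std::size_t lane = 0; lane < kWarpSize; ++lane) {
    // Lane-strided scan: lane k checks start positions k, k + 32, k + 64, ...
    for (std::size_t i = lane; i <= last_start; i += kWarpSize) {
      if (str.compare(i, target.size(), target) == 0) { lane_found[lane] = true; }
    }
  }
  // Max-reduction over the lanes: true if any lane found a match.
  // In the kernel, this is the step cub::WarpReduce performs.
  return *std::max_element(lane_found.begin(), lane_found.end());
}
```

Because each lane writes only its own flag and the reduction yields one value per string, the result can be stored with an ordinary write; cub documents the reduced value as valid only in lane 0, so that lane would perform the store.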

Reference #13574

Checklist

I am familiar with the Contributing Guidelines.
New or existing tests cover these changes.
The documentation is up to date with these changes.

cpp/src/strings/search/find.cu

bdice · 2023-06-14T19:08:20Z

Does this change have performance implications?

davidwendt · 2023-06-14T19:14:57Z

Does this change have performance implications?

Nothing noticeable. I just thought the atomicOr code looked cleaner with the thrust::for_each taking care of all the kernel launch details. Maybe @PointKernel could recommend the cuda::atomic_ref equivalent? I had this working before he mentioned replacing atomicOr.

ttnghia · 2023-06-14T19:44:28Z

cpp/src/strings/search/find.cu

-                       contains_warp_fn{*d_strings, d_target, results_view.data<bool>()});
+    auto const d_strings     = column_device_view::create(input.parent(), stream);
+    constexpr int block_size = 256;
+    cudf::detail::grid_1d grid{input.size() * cudf::detail::warp_size, block_size};


What if input.size() * cudf::detail::warp_size overflows? grid_1d only has int members.

Oh, I just found similar code in other files (attributes.cu and find.cu), so this may be a new potential issue.

davidwendt · 2023-06-14

You missed this line of code, perhaps?
https://github.com/rapidsai/cudf/pull/13578/files#diff-048f86c21559b14f64f86aaeaa57776d366c3a4948a5aba7c0ab1a3801be87bcR292

if (idx >= (d_strings.size() * cudf::detail::warp_size)) { return; }

That did not format too well. It is currently line 292.
This line is in attributes.cu as well.

ttnghia · 2023-06-14

Wait, that line is inside the kernel, while this line runs before the kernel launch. If the multiplication overflows here, we may still launch the kernel with some (large?) input. We should avoid launching the kernel in that case instead.

davidwendt · 2023-06-15

An overflow cannot occur here in practice, since this code path is only used for long strings, whose average length is always much greater than 32 bytes. This means the number of rows * 32 will never overflow under these conditions.
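The arithmetic behind this argument can be checked with a few constants. The sketch below assumes a warp size of 32 and that a strings column's character data is addressed with 32-bit offsets (libcudf's `size_type` is `int32_t`); `kMaxRowsNoOverflow`, `kMinCharsAtOverflow`, and `grid_thread_count_fits` are hypothetical names. Rows * 32 first exceeds `INT32_MAX` at 67,108,864 rows, and a column that long whose strings average at least 32 bytes would need at least 2^31 bytes of characters, past the largest value a signed 32-bit offset can hold.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// Assumption: mirrors cudf::detail::warp_size.
constexpr std::int64_t kWarpSize = 32;
constexpr std::int64_t kInt32Max = std::numeric_limits<std::int32_t>::max();

// Largest row count for which rows * 32 still fits in a 32-bit int.
constexpr std::int64_t kMaxRowsNoOverflow = kInt32Max / kWarpSize;  // 67,108,863

// Fewest character bytes a column could have once rows * 32 overflows,
// if its strings average at least 32 bytes each.
constexpr std::int64_t kMinCharsAtOverflow = (kMaxRowsNoOverflow + 1) * kWarpSize;  // 2^31

// Hypothetical pre-launch check: multiply in 64 bits so the check itself
// cannot overflow, then compare against the int32 range.
constexpr bool grid_thread_count_fits(std::int32_t num_rows)
{
  return static_cast<std::int64_t>(num_rows) * kWarpSize <= kInt32Max;
}

static_assert(kMinCharsAtOverflow > kInt32Max,
              "such a column would exceed what 32-bit offsets can address");
```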

ttnghia · 2023-06-14T20:25:30Z

cpp/src/strings/search/find.cu

-      if (d_target.compare(d_str.data() + i, d_target.size_bytes()) == 0) { found = true; }
-    }
-    if (found) { atomicOr(d_results + str_idx, true); }
+  if (idx >= (d_strings.size() * cudf::detail::warp_size)) { return; }


Again, I suspect this multiplication can overflow, since all the operands are of type int. Maybe we should cast to int64_t?

Suggested change

if (idx >= (d_strings.size() * cudf::detail::warp_size)) { return; }

if (static_cast<int64_t>(idx) >= static_cast<...>(d_strings.size()) * static_cast<...>(cudf::detail::warp_size)) { return; }
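A self-contained version of the widened guard, assuming `int64_t` is the intended type behind the elided `static_cast<...>` (as the comment above proposes). The helper name `thread_in_range` and its free-standing form are hypothetical; in the kernel the operands would be `idx`, `d_strings.size()`, and `cudf::detail::warp_size`.

```cpp
#include <cassert>
#include <cstdint>

// Assumption: mirrors cudf::detail::warp_size.
constexpr std::int32_t kWarpSize = 32;

// Kernel-style range guard with both factors widened to 64 bits, so
// num_rows * kWarpSize is computed without 32-bit overflow.
constexpr bool thread_in_range(std::int64_t idx, std::int32_t num_rows)
{
  return idx < static_cast<std::int64_t>(num_rows) * static_cast<std::int64_t>(kWarpSize);
}
```

With 32-bit arithmetic, 67,108,865 * 32 would overflow; widened, the comparison stays exact.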

davidwendt · 2023-06-20

An overflow cannot occur here in practice, since this code path is only used for long strings, whose average length is always much greater than 32 bytes. This means (number of rows * 32) will never overflow under these conditions.

PointKernel · 2023-06-14T20:26:54Z

I was inclined to use atomic_ref for this issue, but I now realize the warp-reduce workaround is the better solution: atomic_ref supports only 4-byte and 8-byte types, since only 32-bit and 64-bit atomic CAS are supported at the hardware level (plus 128-bit on Hopper). Technically, we should never do atomic operations on bools.
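If an atomic flag were still preferred over the warp reduction, one way to follow this advice is to keep the flag in a 4-byte integer rather than a `bool`. The host-side C++ sketch below uses `std::atomic<std::int32_t>::fetch_or` as an analogy; on the device the corresponding operations would be CUDA's `atomicOr(int*, int)` or `cuda::atomic_ref<int>::fetch_or`. The helper `mark_found` is hypothetical, and a real kernel would still need to convert the integer flags into the boolean output column afterwards.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Hypothetical helper: records a match in a 32-bit flag word instead of a bool.
// A 4-byte integer maps onto a natively supported atomic width.
void mark_found(std::atomic<std::int32_t>& flag, bool found)
{
  // Relaxed ordering suffices here: the flag is only read after all
  // writers have finished, so only its final value matters.
  if (found) { flag.fetch_or(1, std::memory_order_relaxed); }
}
```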

bdice · 2023-06-21T20:05:32Z

cpp/src/strings/search/find.cu

  }
-};
+  auto const result = warp_reduce(temp_storage).Reduce(found, cub::Max());


Wondering if we could get an early-exit benefit by checking the warp-reduced result before reading the full string. (Discussed offline with @davidwendt.) I don't have a good expectation for the synchronization cost of a single warp sync. It'll probably be slower, but I'd like to learn by how much.
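One way to prototype this idea is to have the warp vote after each round of 32 candidate positions and stop as soon as any lane has found a match. The host-side C++ simulation below counts those rounds; on the device, the per-round vote would be something like `__any_sync(0xffffffff, found)` rather than the single `cub::WarpReduce` at the end. This is a hypothetical variant (`warp_contains_early_exit` is a made-up name) with no measured performance; it only shows the control flow, including the requirement that the loop bound be identical across lanes so every lane reaches each vote.

```cpp
#include <cassert>
#include <cstddef>
#include <string_view>

// Assumption: mirrors cudf::detail::warp_size.
constexpr std::size_t kWarpSize = 32;

// Host-side simulation of a warp that votes after every round of 32 start
// positions. `rounds` receives the number of rounds executed before exiting.
bool warp_contains_early_exit(std::string_view str, std::string_view target, std::size_t& rounds)
{
  rounds = 0;
  if (target.size() > str.size()) { return false; }
  auto const num_starts = str.size() - target.size() + 1;  // candidate start positions

  // The loop bound depends only on the string, not on the lane, so every lane
  // runs the same number of rounds and can take part in each warp vote.
  for (std::size_t base = 0; base < num_starts; base += kWarpSize) {
    ++rounds;
    bool any_found = false;  // result of this round's vote (__any_sync on a GPU)
    for (std::size_t lane = 0; lane < kWarpSize; ++lane) {
      auto const pos = base + lane;  // lane k checks position base + k
      bool const found = pos < num_starts && str.compare(pos, target.size(), target) == 0;
      any_found = any_found || found;
    }
    if (any_found) { return true; }  // the whole warp leaves the loop together
  }
  return false;
}
```

When a match is absent or near the end of the string, every round still runs, so whether the extra vote per round pays off is exactly what would need to be measured.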

davidwendt · 2023-06-22T12:26:42Z

/merge

Contributes to #13575
Depends on #13574, #13578

This PR cleans up custom atomic implementations in libcudf by using `cuda::atomic_ref` when possible. It removes atomic bitwise operations like `and`, `or` and `xor` since libcu++ already provides proper replacements.

Authors:
- Yunsong Wang (https://github.com/PointKernel)

Approvers:
- Bradley Dice (https://github.com/bdice)
- David Wendt (https://github.com/davidwendt)

URL: #13583

Fix memcheck error found in STRINGS_TEST

f310872

davidwendt added the bug, 3 - Ready for Review, libcudf, strings, and non-breaking labels Jun 14, 2023

davidwendt self-assigned this Jun 14, 2023

davidwendt requested a review from a team as a code owner June 14, 2023 19:04

davidwendt requested review from ttnghia and divyegala June 14, 2023 19:04

bdice reviewed Jun 14, 2023


cpp/src/strings/search/find.cu (outdated, resolved)

remove commented out code

03808d0

ttnghia reviewed Jun 14, 2023


PointKernel mentioned this pull request Jun 14, 2023

Clean up cudf device atomic with cuda::atomic_ref #13583 (merged)

Merge branch 'branch-23.08' into bug-strings-find-memcheck

8c6a5fe

davidwendt requested a review from bdice June 20, 2023 20:11

davidwendt changed the title ~~Fix memcheck error found in STRINGS_TEST~~ Fix memcheck error found in STRINGS_TEST Jun 21, 2023

bdice approved these changes Jun 21, 2023


Merge branch 'branch-23.08' into bug-strings-find-memcheck

5afc51b

ttnghia approved these changes Jun 21, 2023


rapids-bot bot merged commit 7cbef2a into rapidsai:branch-23.08 Jun 22, 2023

davidwendt deleted the bug-strings-find-memcheck branch June 22, 2023 12:26
