Fix memcheck error found in STRINGS_TEST #13578
Does this change have performance implications?
Nothing noticeable. I just thought the
contains_warp_fn{*d_strings, d_target, results_view.data<bool>()});
auto const d_strings = column_device_view::create(input.parent(), stream);
constexpr int block_size = 256;
cudf::detail::grid_1d grid{input.size() * cudf::detail::warp_size, block_size};
What if `input.size() * cudf::detail::warp_size` overflows? `grid_1d` only has `int` members.
Oh, I just found similar code in other files (`attributes.cu` and `find.cu`). So this may be a new potential issue.
You missed this line of code perhaps?
https://github.com/rapidsai/cudf/pull/13578/files#diff-048f86c21559b14f64f86aaeaa57776d366c3a4948a5aba7c0ab1a3801be87bcR292
`if (idx >= (d_strings.size() * cudf::detail::warp_size)) { return; }`
That did not format too well. It is line 292 currently. This line is in `attributes.cu` as well.
Wait, that line is inside the kernel, while this line runs before the kernel launch. If the multiplication overflows here, we may still launch a kernel with some (large?) input. We should avoid launching the kernel from here instead.
An overflow cannot technically occur here, since this code path is only taken for long strings, whose average length is always much greater than 32 bytes. This means the number of rows * 32 will never overflow under these conditions.
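For a code path that lacks that guarantee, the host-side guard suggested earlier in this thread could look roughly like the sketch below. It is plain C++ rather than libcudf code: `checked_thread_count` is a hypothetical helper, and `warp_size` is assumed to equal 32 to mirror `cudf::detail::warp_size`.

```cpp
#include <cstdint>
#include <limits>
#include <optional>

// Assumption: 32 threads per warp, mirroring cudf::detail::warp_size.
constexpr std::int64_t warp_size = 32;

// Hypothetical helper (not part of libcudf): returns the thread count for a
// warp-per-row launch, or std::nullopt if it does not fit in an int, the
// member type the review cites for grid_1d.
std::optional<int> checked_thread_count(std::int32_t num_rows)
{
  // Widen before multiplying so the product itself cannot overflow.
  std::int64_t const total = static_cast<std::int64_t>(num_rows) * warp_size;
  if (total > std::numeric_limits<int>::max()) { return std::nullopt; }
  return static_cast<int>(total);
}
```

A caller would skip or split the launch when the helper returns `std::nullopt` instead of passing a wrapped value to the grid configuration.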
if (d_target.compare(d_str.data() + i, d_target.size_bytes()) == 0) { found = true; }
}
if (found) { atomicOr(d_results + str_idx, true); }
if (idx >= (d_strings.size() * cudf::detail::warp_size)) { return; }
Again, I suspect that this multiplication can overflow, as all the operands are of type `int`. So maybe we should cast to `int64_t`?
if (idx >= (d_strings.size() * cudf::detail::warp_size)) { return; }
if (static_cast<int64_t>(idx) >= static_cast<...>(d_strings.size()) * static_cast<...>(cudf::detail::warp_size)) { return; }
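The point of the suggestion is that the widening has to happen before the multiplication, not after it. A minimal host-side sketch of the same comparison follows; `is_out_of_range` is a hypothetical stand-in for the kernel's bounds check, and `warp_size` is assumed to be 32.

```cpp
#include <cstdint>

// Assumption: 32 threads per warp, mirroring cudf::detail::warp_size.
constexpr int warp_size = 32;

// Hypothetical stand-in for the kernel's bounds check: true when the thread
// index lies past the last (row, lane) pair.
bool is_out_of_range(std::int64_t idx, std::int32_t num_rows)
{
  // Cast one operand before multiplying. Casting the int product afterwards
  // would only widen a value that has already overflowed.
  auto const total_threads = static_cast<std::int64_t>(num_rows) * warp_size;
  return idx >= total_threads;
}
```

With 100,000,000 rows the product is 3,200,000,000, which exceeds the `int` range but is compared correctly here because both sides are 64-bit.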
An overflow cannot technically occur here, since this code path is only taken for long strings, whose average length is always much greater than 32 bytes. This means the (number of rows * 32) will never overflow under these conditions.
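The reasoning can be written as a short bound. It rests on an assumption not stated in this thread: that the total character bytes of a strings column are limited to what a signed 32-bit `size_type` can index. Writing $N > 0$ for the number of rows and $\bar{\ell}$ for the average string length in bytes:

$$
N \cdot \bar{\ell} \le 2^{31} - 1 \quad\text{and}\quad \bar{\ell} > 32 \;\Longrightarrow\; 32\,N < N \cdot \bar{\ell} \le 2^{31} - 1
$$

Under that assumption, `d_strings.size() * cudf::detail::warp_size` stays within the `int` range on this code path.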
I was inclined to use
}
};
auto const result = warp_reduce(temp_storage).Reduce(found, cub::Max());
Wondering if we could get an early-exit benefit by checking the warp-reduced result before reading the full string. (Discussed offline with @davidwendt.) I don't have a good expectation for the synchronization cost of a single warp sync. It'll probably be slower, but I'd like to learn by how much.
/merge
Contributes to #13575
Depends on #13574, #13578
This PR cleans up custom atomic implementations in libcudf by using `cuda::atomic_ref` when possible. It removes atomic bitwise operations like `and`, `or` and `xor` since libcudacxx already provides proper replacements.
Authors:
- Yunsong Wang (https://github.com/PointKernel)
Approvers:
- Bradley Dice (https://github.com/bdice)
- David Wendt (https://github.com/davidwendt)
URL: #13583
Description
Fixes a memcheck error found in `STRINGS_TEST` where an `atomicOr` was used on a boolean device scalar. The workaround uses a `cub::WarpReduce` to compute the result in the warp-per-string kernel.
Reference #13574
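To make the warp-per-string idea concrete, here is a CPU-only sketch in plain C++ that imitates the pattern: each of 32 "lanes" checks a strided subset of starting offsets and keeps a private flag, and the flags are then combined with a max reduction in place of the per-thread `atomicOr`. The function name `contains_warp_style` is hypothetical, the loops run sequentially, and nothing here is the kernel from this PR.

```cpp
#include <algorithm>
#include <array>
#include <cstddef>
#include <string_view>

// Assumption: 32 lanes per warp, mirroring cudf::detail::warp_size.
constexpr std::size_t warp_size = 32;

// CPU stand-in for one warp processing one string: lane `lane` tests the
// starting offsets lane, lane + 32, lane + 64, ... and records a private flag.
bool contains_warp_style(std::string_view str, std::string_view target)
{
  std::array<bool, warp_size> found{};  // one flag per lane, all false
  for (std::size_t lane = 0; lane < warp_size; ++lane) {
    for (std::size_t i = lane; i + target.size() <= str.size(); i += warp_size) {
      if (str.compare(i, target.size(), target) == 0) { found[lane] = true; }
    }
  }
  // Stand-in for warp_reduce(temp_storage).Reduce(found, cub::Max()):
  // the maximum over the lane flags is true if any lane found a match.
  return *std::max_element(found.begin(), found.end());
}
```

Because every lane writes only its own flag and the result is produced by a single reduction, no lane performs an atomic update on the shared boolean output, which is the access the memcheck error pointed at.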
Checklist