-
Notifications
You must be signed in to change notification settings - Fork 915
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix target counting in strings char-parallel replace #16017
Fix target counting in strings char-parallel replace #16017
Conversation
No change in performance according to the benchmark
|
what's the issue with count_if? |
The |
cpp/src/strings/replace/replace.cu
Outdated
auto const total = block_reduce(temp_storage).Reduce(count, cub::Sum()); | ||
|
||
if ((lane_idx == 0) && (total > 0)) { | ||
cuda::atomic_ref<int64_t, cuda::thread_scope_block> ref{*d_output}; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The use of thread_scope_block
here is surprising to me. Aren't threads from different blocks updating the same output here? I'd expect that thread_scope_device
is required.
I'm probably missing something.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sounds right to me. Fixed in 530de96
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM!
/merge |
Description
Replace
thrust::count_if
call across int64 characters to use a custom kernel instead.Checklist