Fix regex out-of-bounds write in strided rows logic (#11797)
Fixes an out-of-bounds write that occurs when a large number of strings requires a strided loop to stay within an internal memory maximum. For row counts that do not require a strided loop, the row index never exceeds the size of the column, which prevents any out-of-bounds access. For large row counts, the CUDA thread index may be larger than the minimal count used for building the working-memory buffer. Because the kernel is launched in whole blocks of a specific block size, extra threads past the end of the minimal count are needed to fill out the last block. These threads never contribute to the overall result, but they still attempt to access past the end of the working memory. Writing to this memory may corrupt memory used by another kernel launched in parallel from another CPU thread. This change adds logic to prevent the extra threads from doing any work.
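To make the "extra threads" in the description concrete, here is a minimal host-side sketch of the arithmetic. It assumes the usual round-up-to-whole-blocks launch calculation; the actual launch configuration is not part of this diff, and the helper names `launched_threads` and `extra_threads` are hypothetical.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper (not from cudf): number of threads actually launched
// when `min_threads` is rounded up to whole blocks of `block_size` threads.
inline std::size_t launched_threads(std::size_t min_threads, std::size_t block_size)
{
  assert(block_size > 0);  // a block must contain at least one thread
  return ((min_threads + block_size - 1) / block_size) * block_size;
}

// Hypothetical helper: threads in the last block whose index is
// >= min_threads, i.e. threads with no slot in a working-memory buffer
// that was sized for exactly `min_threads` threads.
inline std::size_t extra_threads(std::size_t min_threads, std::size_t block_size)
{
  return launched_threads(min_threads, block_size) - min_threads;
}
```

Under these assumptions, a minimal count of 1000 with a block size of 256 launches 1024 threads, so 24 threads have an index past the end of the working memory. When the minimal count is already a multiple of the block size, there are no extra threads.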

Fixes #11768

Authors:
  - David Wendt (https://github.com/davidwendt)

Approvers:
  - MithunR (https://github.com/mythrocks)
  - Nghia Truong (https://github.com/ttnghia)
  - Mike Wilson (https://github.com/hyperbolic2346)

URL: #11797
davidwendt authored Sep 28, 2022
1 parent 5a416a0 commit da04725
Showing 1 changed file with 8 additions and 4 deletions.
12 changes: 8 additions & 4 deletions cpp/src/strings/regex/utilities.cuh
@@ -44,8 +44,10 @@ __global__ void for_each_kernel(ForEachFunction fn, reprog_device const d_prog,

   auto const thread_idx = threadIdx.x + blockIdx.x * blockDim.x;
   auto const stride = s_prog.thread_count();
-  for (auto idx = thread_idx; idx < size; idx += stride) {
-    fn(idx, s_prog, thread_idx);
+  if (thread_idx < stride) {
+    for (auto idx = thread_idx; idx < size; idx += stride) {
+      fn(idx, s_prog, thread_idx);
+    }
   }
 }

@@ -79,8 +81,10 @@ __global__ void transform_kernel(TransformFunction fn,

   auto const thread_idx = threadIdx.x + blockIdx.x * blockDim.x;
   auto const stride = s_prog.thread_count();
-  for (auto idx = thread_idx; idx < size; idx += stride) {
-    d_output[idx] = fn(idx, s_prog, thread_idx);
+  if (thread_idx < stride) {
+    for (auto idx = thread_idx; idx < size; idx += stride) {
+      d_output[idx] = fn(idx, s_prog, thread_idx);
+    }
   }
 }
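The effect of the `if (thread_idx < stride)` guard in both kernels can be checked with a small host-side model. This is a sketch only: it replaces the CUDA launch with a plain loop over thread indices, and `SimResult`, `simulate`, and the `launched` parameter are hypothetical names introduced for illustration. It assumes the working memory has exactly `stride` per-thread slots, as the commit description implies.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical result type for the host-side model.
struct SimResult {
  std::vector<int> row_visits;    // how many times each row was processed
  std::size_t bad_slot_accesses;  // calls made with thread_idx >= stride
};

// Models `launched` threads running the strided loop over `size` rows,
// where working memory has one slot per thread index in [0, stride).
// With `guarded == true`, threads at or past `stride` do no work,
// mirroring the `if (thread_idx < stride)` check in the diff.
SimResult simulate(std::size_t size, std::size_t stride, std::size_t launched, bool guarded)
{
  assert(stride > 0);  // a zero stride would never advance the loop
  SimResult result{std::vector<int>(size, 0), 0};
  for (std::size_t thread_idx = 0; thread_idx < launched; ++thread_idx) {
    if (guarded && thread_idx >= stride) { continue; }
    for (std::size_t idx = thread_idx; idx < size; idx += stride) {
      ++result.row_visits[idx];
      // In the real kernel, this is where working memory would be
      // indexed by thread_idx; count the accesses that fall outside it.
      if (thread_idx >= stride) { ++result.bad_slot_accesses; }
    }
  }
  return result;
}
```

In this model, `simulate(5000, 1000, 1024, false)` reports accesses outside the working memory, while `simulate(5000, 1000, 1024, true)` reports none and still visits every row exactly once. When `size` does not exceed `stride`, the extra threads already fail the `idx < size` test, which matches the description's point that non-strided cases were never affected.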
