Use thread_index_type to avoid index overflow in grid-stride loops #13895
Question for reviewers: I don't think we have tests exercising an input that could cause overflow. I hesitated to add those tests since they require a relatively large amount of memory and may take a long time to execute. What do you think?
cpp/src/bitmask/null_mask.cu
Outdated
thread_index_type const tid = threadIdx.x + blockIdx.x * blockDim.x;
thread_index_type const stride = blockDim.x * gridDim.x;
thread_index_type thread_word_index = tid + first_word_index;
Probably we also need to upgrade the first_ and last_ indices above.
We can rely on implicit conversions for < or <= since the operands are always converted to the larger data type for comparison. first_word_index and last_word_index are explicitly used as size_type later in the loop, so changing them to thread_index_type does not seem worth the effort.
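The implicit-conversion argument above can be checked on the host. This is a minimal sketch, assuming that `size_type` is a 32-bit signed integer and `thread_index_type` a 64-bit signed integer; the aliases below are local stand-ins, and `in_range` is a hypothetical helper, not code from the PR.

```cpp
#include <cassert>
#include <cstdint>
#include <type_traits>

// Assumption: local stand-ins for the library's aliases (32-bit and 64-bit signed).
using size_type         = std::int32_t;
using thread_index_type = std::int64_t;

// The usual arithmetic conversions pick the wider type for mixed operands,
// so a 64-bit index compared with a 32-bit bound is compared in 64 bits.
static_assert(std::is_same<decltype(thread_index_type{} + size_type{}), thread_index_type>::value,
              "mixed operands are converted to the 64-bit type");

// Hypothetical helper: the loop condition with a 64-bit index and a 32-bit bound.
inline bool in_range(thread_index_type idx, size_type last)
{
  assert(last >= 0);
  return idx <= last;  // `last` is widened to 64 bits before the comparison
}
```

With these definitions, `in_range(std::int64_t{1} << 32, INT32_MAX)` is false: the index is not truncated to 32 bits before the comparison, which is why the bound itself can stay a `size_type`.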
I think I understand, but I'd better confirm:
- first_word_index and last_word_index are int32_t.
- tid is uint32_t. So thread_word_index is eventually uint32_t.
- while (thread_word_index <= last_word_index) compares the two as uint32_t.
- The place where thread_word_index might overflow is at line 275 (thread_word_index += stride). But both thread_word_index and stride are uint32_t already.
Yeah, it looks like this should work.
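To make the overflow risk in `thread_word_index += stride` concrete, here is a host-side sketch that replays one thread's grid-stride loop with different index types. `count_iterations` is a hypothetical helper written for illustration, and the start, stride, and bound values in the note below are made up to trigger the wraparound; none of this comes from the PR itself.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: counts the iterations one thread performs in a
// grid-stride loop `for (idx = start; idx <= last; idx += stride)`.
// `cap` bounds the loop so a wrapped index cannot spin forever.
template <typename Index>
int count_iterations(Index start, Index stride, std::int64_t last, int cap)
{
  assert(stride > 0);
  int iterations = 0;
  Index idx      = start;
  while (static_cast<std::int64_t>(idx) <= last && iterations < cap) {
    ++iterations;
    idx += stride;  // wraps modulo 2^32 when Index is a 32-bit unsigned type
  }
  return iterations;
}
```

Calling `count_iterations<std::uint32_t>(1500000000u, 3000000000u, 2000000000, 10)` returns 2, because the index wraps back into the valid range after the first step, while `count_iterations<std::int64_t>(1500000000, 3000000000, 2000000000, 10)` returns 1. A 64-bit index keeps growing past the bound instead of wrapping, so the loop ends after the single valid iteration.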
The for-loop comparison happens against shorter-width type
It should be possible if the input size is smaller than
Benchmarks could be a good place for very large sizes (not for testing, but to catch illegal accesses). But only unit tests are run through memcheck regularly, so this won't be caught regularly unless it's added to the unit tests. Besides, memcheck will be very slow to run on very large inputs. Almost none of our algorithms are tested for INT32 max sizes, so it may be okay not to add unit tests for this.
I wrote a test as below and realized bitmask is a special case: the loop end condition is no smaller than the number of bitmask words which is

TEST_F(SetBitmaskTest, index_overflow)
{
  // Use a bit range close to the size_type limit.
  auto const begin = 0;
  auto const end   = INT_MAX - 10;
  auto const valid = true;
  auto const size  = end - begin;
  thrust::host_vector<bool> expected(size, valid);
  rmm::device_buffer mask = create_null_mask(size, cudf::mask_state::UNINITIALIZED);
  auto bitmask            = static_cast<cudf::bitmask_type*>(mask.data());
  cudf::set_null_mask(bitmask, begin, end, valid);
  auto stream = cudf::get_default_stream();
  // Expand every bit into a bool so that all of them can be checked.
  rmm::device_uvector<bool> output(size, stream);
  auto counting_iter = thrust::counting_iterator<cudf::size_type>{0};
  thrust::transform(rmm::exec_policy(stream),
                    counting_iter,
                    counting_iter + size,
                    output.begin(),
                    valid_bit_functor{bitmask});
  auto const result = thrust::all_of(
    rmm::exec_policy(stream), output.begin(), output.end(), thrust::identity<bool>{});
  EXPECT_EQ(result, valid);
}

Not sure if it's still relevant, but to answer your question about runtime, the test took about 700 ms to run.

(base) yunsongw@yunsongw-dt:~/dev/rapids/cudf/cpp/build/release/gtests$ ./BITMASK_TEST --gtest_filter=SetBitmaskTest.index_overflow
Note: Google Test filter = SetBitmaskTest.index_overflow
[==========] Running 1 test from 1 test suite.
[----------] Global test environment set-up.
[----------] 1 test from SetBitmaskTest
[ RUN ] SetBitmaskTest.index_overflow
[ OK ] SetBitmaskTest.index_overflow (708 ms)
[----------] 1 test from SetBitmaskTest (708 ms total)
[----------] Global test environment tear-down
[==========] 1 test from 1 test suite ran. (708 ms total)
[ PASSED ] 1 test.
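As a rough illustration of why the bitmask kernel is a special case, the sketch below computes how many mask words the test above touches. It assumes each bitmask word holds 32 bits; `bits_per_word` and `words_for_bits` are hypothetical names introduced here, not the library's API.

```cpp
#include <cassert>
#include <climits>
#include <cstdint>

// Assumption: one bitmask word covers 32 rows.
constexpr std::int64_t bits_per_word = 32;

// Hypothetical helper: number of words needed to hold `bits` bits (rounded up).
inline std::int64_t words_for_bits(std::int64_t bits)
{
  assert(bits >= 0);
  return (bits + bits_per_word - 1) / bits_per_word;
}
```

For the test's size of `INT_MAX - 10` bits, `words_for_bits` gives 67108864 words (2^26), so a loop that strides over words rather than bits stays far below the 32-bit limit even for the largest column.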
LGTM!
/merge
Description
This PR checks all related files under the src/hash, src/bitmask and src/transform folders and fixes potential index overflow issues by using thread_index_type.
Checklist