Use `thread_index_type` to avoid index overflow in grid-stride loops #13895

PointKernel · 2023-08-16T23:24:22Z

Description

This PR checks all related files under src/hash, src/bitmask and src/transform folders and fixes potential index overflow issues by using thread_index_type.

Checklist

I am familiar with the Contributing Guidelines.
New or existing tests cover these changes.
The documentation is up to date with these changes.

PointKernel · 2023-08-16T23:29:21Z

Question for reviewers: I don't think we have tests exercising an input that could cause overflow. I hesitated to add those tests since it requires relatively large memory and may take a long time to execute. What do you think?

ttnghia · 2023-08-17T18:01:18Z

cpp/src/bitmask/null_mask.cu

+  thread_index_type const tid         = threadIdx.x + blockIdx.x * blockDim.x;
+  thread_index_type const stride      = blockDim.x * gridDim.x;
+  thread_index_type thread_word_index = tid + first_word_index;


Probably we also need to upgrade first_ and last_ indices above.

We can rely on implicit conversions for < or <=, since both operands are always converted to the larger data type for the comparison. first_word_index and last_word_index are explicitly used as size_type later in the loop, so changing them to thread_index_type doesn't seem worth the effort.

I think I understand, but I'd better confirm:

first_word_index and last_word_index are int32_t.

tid is uint32_t. So thread_word_index is eventually uint32_t.

while (thread_word_index <= last_word_index) compares the two as uint32_t.

The place where thread_word_index might overflow is at line 275 (thread_word_index += stride). But both thread_word_index and stride are uint32_t already.

Yeah, it looks like this should work.
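The conversion rule this thread relies on can be checked with a small host-side sketch. It assumes size_type is a 32-bit signed integer and thread_index_type is a 64-bit signed integer; the aliases and the in_range helper below are hypothetical stand-ins for illustration, not the libcudf definitions.

```cpp
#include <cstdint>
#include <type_traits>

// Hypothetical stand-ins for the libcudf aliases (assumed widths).
using size_type         = std::int32_t;
using thread_index_type = std::int64_t;

// Mixing the two types in an arithmetic or relational expression converts
// the 32-bit operand to 64 bits first, so the bound is never truncated.
static_assert(
  std::is_same<std::common_type<thread_index_type, size_type>::type, thread_index_type>::value,
  "the comparison is carried out in the wider type");

// Loop condition of the form `thread_word_index <= last_word_index`,
// with a 64-bit index and a 32-bit bound.
inline bool in_range(thread_index_type index, size_type last) { return index <= last; }
```

With a 64-bit index, a value one past INT_MAX compares as out of range instead of wrapping to a negative number, which is why the bounds can stay size_type.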

karthikeyann · 2023-08-17T18:36:07Z

> Question for reviewers: I don't think we have tests exercising an input that could cause overflow. I hesitated to add those tests since it requires relatively large memory and may take a long time to execute. What do you think?

The for-loop comparison happens against shorter-width type size_type. So, is overflow possible to test?

PointKernel · 2023-08-17T18:57:29Z

> The for-loop comparison happens against shorter-width type size_type. So, is overflow possible to test?

It should be possible. If the input size is smaller than INT_MAX but larger than INT_MAX - stride, the "last" iteration with an int32_t index overflows when doing tid += stride and produces a negative index. That index is still smaller than the end condition, so the loop runs one more iteration and performs an illegal memory access.
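The scenario above can be reproduced on the host with plain integer arithmetic. This is only a sketch: the stride of 1024 and the starting index are made-up values, and the 32-bit step emulates two's-complement wraparound through unsigned arithmetic, because signed overflow is undefined behavior in C++ (the final conversion back to int32_t wraps on mainstream compilers and is guaranteed to since C++20).

```cpp
#include <cstdint>

// One `index += stride` step with a 32-bit signed index, wrapping the way
// the hardware would. Done in unsigned arithmetic to avoid undefined behavior.
inline std::int32_t next_index_32(std::int32_t index, std::int32_t stride)
{
  return static_cast<std::int32_t>(static_cast<std::uint32_t>(index) +
                                   static_cast<std::uint32_t>(stride));
}

// The same step with a 64-bit index: no wraparound for these magnitudes.
inline std::int64_t next_index_64(std::int64_t index, std::int64_t stride)
{
  return index + stride;
}

// Loop condition of a grid-stride loop over `size` elements.
inline bool keeps_looping(std::int64_t index, std::int32_t size) { return index < size; }
```

For example, with size = INT_MAX - 10 and a thread whose current index is INT_MAX - 500, the 32-bit step lands on a negative index that still satisfies index < size, while the 64-bit step lands past the end and the loop exits.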

karthikeyann · 2023-08-18T02:06:34Z

Benchmarks could be a good place for very large sizes (not for testing, but to catch illegal accesses). But only unit tests are run through memcheck regularly, so an overflow won't be caught routinely unless a unit test covers it. Besides, memcheck will be very slow on very large inputs.

Almost all of our algorithms aren't tested for INT32 max sizes. So, it may be okay to not add unit tests for this.
@PointKernel Were you able to verify with a unit test case locally? (How much time does it take to run?)

PointKernel · 2023-08-18T17:10:40Z

> @PointKernel Were you able to verify with a unit test case locally? (How much time does it take to run?)

I wrote the test below and realized that bitmask is a special case: the loop end condition is no larger than the number of bitmask words, which is total_bits / num_bits_per_word. Since total_bits is a size_type, the thread index would almost never overflow (unless we changed the "word" type to something only 1 bit wide).

TEST_F(SetBitmaskTest, index_overflow)
{
  auto const begin = 0;
  auto const end   = INT_MAX - 10;
  auto const valid = true;
  auto const size  = end - begin;

  thrust::host_vector<bool> expected(size, valid);
  rmm::device_buffer mask = create_null_mask(size, cudf::mask_state::UNINITIALIZED);

  auto bitmask = static_cast<cudf::bitmask_type*>(mask.data());
  cudf::set_null_mask(bitmask, begin, end, valid);

  auto stream = cudf::get_default_stream();

  rmm::device_uvector<bool> output(size, stream);
  auto counting_iter = thrust::counting_iterator<cudf::size_type>{0};
  thrust::transform(rmm::exec_policy(stream),
                    counting_iter,
                    counting_iter + size,
                    output.begin(),
                    valid_bit_functor{bitmask});

  auto const result = thrust::all_of(
    rmm::exec_policy(stream), output.begin(), output.end(), thrust::identity<bool>{});

  EXPECT_EQ(result, valid);
}

Not sure if it's still relevant, but to answer your question about runtime: the test took about 700 ms to run.

(base) yunsongw@yunsongw-dt:~/dev/rapids/cudf/cpp/build/release/gtests$ ./BITMASK_TEST --gtest_filter=SetBitmaskTest.index_overflow
Note: Google Test filter = SetBitmaskTest.index_overflow
[==========] Running 1 test from 1 test suite.
[----------] Global test environment set-up.
[----------] 1 test from SetBitmaskTest
[ RUN      ] SetBitmaskTest.index_overflow
[       OK ] SetBitmaskTest.index_overflow (708 ms)
[----------] 1 test from SetBitmaskTest (708 ms total)

[----------] Global test environment tear-down
[==========] 1 test from 1 test suite ran. (708 ms total)
[  PASSED  ] 1 test.
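The reason the bitmask kernel is a special case can be checked with a little arithmetic. The sketch below assumes a 32-bit bitmask word and a 32-bit signed size_type; the aliases and helper names are hypothetical, and the word count rounds up, which is slightly more conservative than the plain division mentioned above.

```cpp
#include <cstdint>
#include <limits>

// Hypothetical stand-ins (assumed widths, not the libcudf headers).
using size_type    = std::int32_t;
using bitmask_type = std::uint32_t;

constexpr std::int64_t bits_per_word = sizeof(bitmask_type) * 8;  // 32 bits

// Number of bitmask words needed to hold `bits` bits, rounded up.
constexpr std::int64_t num_words(size_type bits)
{
  return (bits + bits_per_word - 1) / bits_per_word;
}

// Largest stride that can be added to the last valid word index without
// exceeding the range of a 32-bit signed index.
constexpr std::int64_t max_safe_stride(size_type bits)
{
  return std::numeric_limits<std::int32_t>::max() - (num_words(bits) - 1);
}
```

Even for the largest possible bit count, the word index tops out near 67 million, leaving roughly 2 billion of headroom before a 32-bit index would wrap. A grid would need a stride larger than that to trigger the overflow, which is why the test above passes either way.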

mythrocks

LGTM!

PointKernel · 2023-08-24T22:28:06Z

/merge

PointKernel added 2 commits August 16, 2023 16:18

Use thread_index_type to avoid index overflow in grid-stride loops

dcca6d1

Fix a typo

2f0e42f

PointKernel added the bug, libcudf, and non-breaking labels Aug 16, 2023

PointKernel self-assigned this Aug 16, 2023

PointKernel requested a review from a team as a code owner August 16, 2023 23:24

PointKernel requested review from mythrocks and karthikeyann August 16, 2023 23:24

PointKernel added the 3 - Ready for Review label Aug 16, 2023

ttnghia reviewed Aug 17, 2023

Minor cleanups

0f7ae5f

ttnghia approved these changes Aug 18, 2023

GregoryKimball mentioned this pull request Aug 21, 2023

Prevent grid stride loop overflow in libcudf kernels #10368

Open

mythrocks approved these changes Aug 22, 2023

PointKernel added 3 commits August 23, 2023 18:17

Minor cleanups: use global_thread_id

7367b85

Merge remote-tracking branch 'upstream/branch-23.10' into fix-index-overflow

e6be0ed

Minor cleanups: use global_thread_id

dc9d5e5

ttnghia approved these changes Aug 24, 2023

karthikeyann changed the title ~~Use thread_index_type to avoid index overflow in grid-stride loops~~ Use thread_index_type to avoid index overflow in grid-stride loops Aug 24, 2023

karthikeyann approved these changes Aug 24, 2023

PointKernel added 2 commits August 24, 2023 09:20

Merge branch 'branch-23.10' into fix-index-overflow

11043bb

Revert the use of global_thread_id in JIT kernel to avoid build issue

aa9a855

PointKernel added 2 commits August 24, 2023 13:37

Merge remote-tracking branch 'upstream/branch-23.10' into fix-index-overflow

16339de

Merge branch 'fix-index-overflow' of github.com:PointKernel/cudf into fix-index-overflow

999bbb3

rapids-bot bot merged commit ff99f98 into rapidsai:branch-23.10 Aug 24, 2023

PointKernel deleted the fix-index-overflow branch May 23, 2024 22:47
