Avoid overflow in fused_concatenate_kernel output_index (#10344)
Fixes #10333.

The repro case in the issue produced an illegal memory access: the `output_index` of the strided loop in `fused_concatenate_kernel` can overflow when the number of output rows is large.

For example, given 5 tables of exactly 250 million rows each, we expect a result with 1,250,000,000 rows.

The kernel is launched with 4,882,813 blocks (1,250,000,000 rows / 256 threads per block, rounded up), giving a grid stride of 1,250,000,128 (256 * 4,882,813). Once `output_index` reaches 897,483,520, adding the stride exceeds `INT_MAX` (897,483,520 + 1,250,000,128 = 2,147,483,648), so the 32-bit `size_type` index overflows on the first strided iteration.

The change below prevents the overflow by making `output_index` an `int64_t`, and adds a test showing that we can now concatenate up to `size_type::max` rows.
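For illustration, here is a minimal standalone sketch of the grid-stride pattern involved, with the loop index widened to `int64_t` as in the fix. This is not the actual cudf kernel; `copy_kernel` and the host launch below are invented for this example.

```cuda
#include <cstdint>
#include <cstdio>
#include <cuda_runtime.h>

// Sketch of a grid-stride copy loop. With a 32-bit loop index, `idx += stride`
// wraps past INT_MAX once the output row count approaches 2^31; widening the
// index to int64_t keeps the addition in range.
__global__ void copy_kernel(int8_t const* in, int8_t* out, int64_t num_rows)
{
  // Before the fix the index in fused_concatenate_kernel was a 32-bit size_type.
  int64_t idx    = threadIdx.x + static_cast<int64_t>(blockIdx.x) * blockDim.x;
  int64_t stride = static_cast<int64_t>(blockDim.x) * gridDim.x;  // 256 * 4,882,813 in the repro
  while (idx < num_rows) {
    out[idx] = in[idx];
    idx += stride;  // with a 32-bit idx, this addition overflows near INT_MAX
  }
}

int main()
{
  // Tiny host-side launch just to show the kernel compiles and runs; the real
  // repro needs roughly 1.25 billion rows.
  constexpr int64_t n = 1 << 20;
  int8_t *d_in, *d_out;
  cudaMalloc(&d_in, n);
  cudaMalloc(&d_out, n);
  int threads = 256;
  int blocks  = static_cast<int>((n + threads - 1) / threads);
  copy_kernel<<<blocks, threads>>>(d_in, d_out, n);
  cudaDeviceSynchronize();
  printf("status: %s\n", cudaGetErrorString(cudaGetLastError()));
  cudaFree(d_in);
  cudaFree(d_out);
  return 0;
}
```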

Authors:
  - Alessandro Bellina (https://github.com/abellina)

Approvers:
  - Nghia Truong (https://github.com/ttnghia)
  - Jake Hemstad (https://github.com/jrhemstad)
  - MithunR (https://github.com/mythrocks)

URL: #10344
abellina authored Feb 28, 2022
1 parent 619b2c7 commit 64ee514
Showing 2 changed files with 20 additions and 4 deletions.
cpp/src/copying/concatenate.cu: 6 changes (3 additions & 3 deletions)
@@ -1,5 +1,5 @@
/*
- * Copyright (c) 2020-2021, NVIDIA CORPORATION.
+ * Copyright (c) 2020-2022, NVIDIA CORPORATION.
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
@@ -166,7 +166,7 @@ __global__ void fused_concatenate_kernel(column_device_view const* input_views,
auto const output_size = output_view.size();
auto* output_data = output_view.data<T>();

-  size_type output_index = threadIdx.x + blockIdx.x * blockDim.x;
+  int64_t output_index = threadIdx.x + blockIdx.x * blockDim.x;
size_type warp_valid_count = 0;

unsigned active_mask;
@@ -222,7 +222,7 @@ std::unique_ptr<column> fused_concatenate(host_span<column_view const> views,
auto const& d_offsets = std::get<2>(device_views);
auto const output_size = std::get<3>(device_views);

-  CUDF_EXPECTS(output_size < static_cast<std::size_t>(std::numeric_limits<size_type>::max()),
+  CUDF_EXPECTS(output_size <= static_cast<std::size_t>(std::numeric_limits<size_type>::max()),
"Total number of concatenated rows exceeds size_type range");

// Allocate output
cpp/tests/copying/concatenate_tests.cu: 18 changes (17 additions & 1 deletion)
@@ -1,5 +1,5 @@
/*
- * Copyright (c) 2020-2021, NVIDIA CORPORATION.
+ * Copyright (c) 2020-2022, NVIDIA CORPORATION.
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
@@ -340,6 +340,22 @@ struct OverflowTest : public cudf::test::BaseFixture {
TEST_F(OverflowTest, OverflowTest)
{
using namespace cudf;
+  // should concatenate up to size_type::max rows.
+  {
+    // 5 x size + size_last adds to size_type::max
+    constexpr auto size = static_cast<size_type>(static_cast<uint32_t>(250) * 1024 * 1024);
+    constexpr auto size_last = static_cast<size_type>(836763647);
+
+    auto many_chars = cudf::make_fixed_width_column(data_type{type_id::INT8}, size);
+    auto many_chars_last = cudf::make_fixed_width_column(data_type{type_id::INT8}, size_last);
+
+    table_view tbl({*many_chars});
+    table_view tbl_last({*many_chars_last});
+    std::vector<cudf::table_view> table_views_to_concat({tbl, tbl, tbl, tbl, tbl, tbl_last});
+    std::unique_ptr<cudf::table> concatenated_tables = cudf::concatenate(table_views_to_concat);
+    EXPECT_NO_THROW(rmm::cuda_stream_default.synchronize());
+    ASSERT_EQ(concatenated_tables->num_rows(), std::numeric_limits<size_type>::max());
+  }

// primitive column
{
