Add overflow check when converting large strings to lists columns (#15887)

Fixes a couple of places where strings columns are converted to lists columns as binary data, with the chars represented as INT8.
Since lists columns only support `size_type` offsets, this change throws a `std::overflow_error` if the size of the chars data reaches or exceeds the maximum `size_type` value.
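A minimal sketch of the guard described above, written as standalone C++ instead of with cudf's `CUDF_EXPECTS` macro. It assumes `size_type` is a 32-bit signed integer (as in cudf), and the helper name `expect_fits_size_type` is hypothetical. The limit is cast to the caller's integer type, which is what both call sites in this commit do (`std::size_t` in `column_buffer.cpp`, `int64_t` in `byte_cast.cu`).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <limits>
#include <stdexcept>
#include <type_traits>

// Assumption: cudf's size_type is a 32-bit signed integer.
using size_type = std::int32_t;

// Hypothetical helper (not part of cudf): throws if `num_bytes` is too large to
// serve as the final offset of a lists column with size_type offsets.
// The limit is cast to the caller's integer type, as both call sites in this
// commit do (std::size_t in column_buffer.cpp, int64_t in byte_cast.cu).
template <typename T>
void expect_fits_size_type(T num_bytes)
{
  static_assert(std::is_integral_v<T> && sizeof(T) >= sizeof(size_type),
                "byte counts must be integers at least as wide as size_type");
  if (!(num_bytes < static_cast<T>(std::numeric_limits<size_type>::max()))) {
    throw std::overflow_error(
      "Cannot convert strings column to lists column due to size_type limit");
  }
}
```

Like the conditions in the diff, the comparison is strict, so a byte count exactly equal to the `size_type` maximum is also rejected.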

Authors:
  - David Wendt (https://github.com/davidwendt)

Approvers:
  - Mike Wilson (https://github.com/hyperbolic2346)
  - MithunR (https://github.com/mythrocks)

URL: #15887
davidwendt authored Jun 4, 2024
1 parent eb46016 commit fc31aa3
Showing 2 changed files with 12 additions and 3 deletions.
cpp/src/io/utilities/column_buffer.cpp (4 additions, 0 deletions)
@@ -191,6 +191,10 @@ std::unique_ptr<column> make_column(column_buffer_base<string_policy>& buffer,
 auto data = col_content.data.release();
 auto char_size = data->size();
 
+CUDF_EXPECTS(char_size < static_cast<std::size_t>(std::numeric_limits<size_type>::max()),
+             "Cannot convert strings column to lists column due to size_type limit",
+             std::overflow_error);
+
 auto uint8_col = std::make_unique<column>(
   data_type{type_id::UINT8}, char_size, std::move(*data), rmm::device_buffer{}, 0);
 
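In the `column_buffer.cpp` check, `char_size` comes from `rmm::device_buffer::size()` and is a `std::size_t`, so the `size_type` limit is widened to `std::size_t` before comparing. The sketch below contrasts that with narrowing the byte count instead; both function names are hypothetical and `size_type` is assumed to be a 32-bit signed integer.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <limits>

using size_type = std::int32_t;  // assumption: matches cudf's 32-bit size_type

// Pattern used in the diff: widen the limit to the unsigned byte-count type,
// so no value of char_size is truncated before the comparison.
bool fits_widened(std::size_t char_size)
{
  return char_size < static_cast<std::size_t>(std::numeric_limits<size_type>::max());
}

// Hypothetical anti-pattern for contrast: narrowing the byte count first.
// The conversion wraps modulo 2^32 (guaranteed since C++20, and the behavior of
// mainstream compilers before that), so a large size can look small or negative.
bool fits_narrowed(std::size_t char_size)
{
  return static_cast<size_type>(char_size) < std::numeric_limits<size_type>::max();
}
```

For a 3 GiB buffer (`std::size_t{3} << 30`), `fits_widened` returns `false`, while `fits_narrowed` returns `true` because the narrowed value wraps to -1073741824.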
cpp/src/reshape/byte_cast.cu (8 additions, 3 deletions)
@@ -135,9 +135,14 @@ struct byte_list_conversion_fn<T, std::enable_if_t<std::is_same_v<T, cudf::strin
   input.size(), output_type, stream, mr);
 }
 
-auto col_content = std::make_unique<column>(input, stream, mr)->release();
-auto const num_chars = col_content.data->size();
-auto uint8_col = std::make_unique<column>(
+auto const num_chars = strings_column_view(input).chars_size(stream);
+CUDF_EXPECTS(num_chars < static_cast<int64_t>(std::numeric_limits<size_type>::max()),
+             "Cannot convert strings column to lists column due to size_type limit",
+             std::overflow_error);
+
+auto col_content = std::make_unique<column>(input, stream, mr)->release();
+
+auto uint8_col = std::make_unique<column>(
   output_type, num_chars, std::move(*(col_content.data)), rmm::device_buffer{}, 0);
 
 auto result = make_lists_column(
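The `byte_cast.cu` hunk also reorders the work: the old code deep-copied the input column and only then read the chars size, while the new code reads `strings_column_view(input).chars_size(stream)` and runs the check before `std::make_unique<column>(input, stream, mr)` makes the copy. The host-side sketch below models that ordering with hypothetical types (`strings_view`, `byte_lists`) standing in for cudf columns; the 64-bit input offsets are an assumption standing in for a large strings column. The size comes from the offsets alone, so an oversized input is rejected before any chars are read or copied.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>
#include <stdexcept>
#include <vector>

// Assumption: mirrors cudf's 32-bit size_type; all names here are hypothetical.
using size_type = std::int32_t;

// Stand-in for a strings column: 64-bit offsets plus a pointer to the chars.
struct strings_view {
  std::vector<std::int64_t> offsets;  // precondition: non-empty and non-decreasing
  const char* chars;                  // only read after the size check passes
  std::int64_t chars_size() const { return offsets.back() - offsets.front(); }
};

// Stand-in for the resulting lists column: size_type offsets plus a UINT8 child.
struct byte_lists {
  std::vector<size_type> offsets;
  std::vector<std::uint8_t> bytes;
};

byte_lists strings_to_byte_lists(strings_view const& input)
{
  // 1. Check the total size using offset metadata only, before copying anything.
  auto const num_chars = input.chars_size();
  if (num_chars >= static_cast<std::int64_t>(std::numeric_limits<size_type>::max())) {
    throw std::overflow_error(
      "Cannot convert strings column to lists column due to size_type limit");
  }
  // 2. Every rebased offset is <= num_chars, so narrowing to size_type is safe.
  byte_lists out;
  out.offsets.reserve(input.offsets.size());
  for (auto off : input.offsets) {
    out.offsets.push_back(static_cast<size_type>(off - input.offsets.front()));
  }
  // 3. Only now copy the chars into the UINT8 child.
  auto const* first = reinterpret_cast<const std::uint8_t*>(input.chars + input.offsets.front());
  out.bytes.assign(first, first + num_chars);
  return out;
}
```

With offsets `{0, 3, 5}` over `"abcde"`, the result has offsets `{0, 3, 5}` and five bytes; with offsets `{0, 3'000'000'000}` the call throws before the null `chars` pointer is ever read.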
