Fix initialization error in to_arrow for empty string views (#16033)
When converting an empty string view to arrow, we don't bother with copies from device, but rather create the arrow arrays directly. The offset buffer is therefore a singleton int32 array with zero in it.

Previously, the initialization of this array was incorrect: mutable_data() returns a uint8_t pointer, so assigning to element 0 wrote only the first byte and could leave the remaining 24 of the 32 bits uninitialized.

Fix this by using memset instead to zero out the full buffer.
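
The difference between the two initializations can be reproduced outside of cudf and Arrow. The following is a minimal sketch, not the library code: a std::vector<std::uint8_t> stands in for the Arrow buffer, the helper names (make_dirty_buffer, read_offset, offset_after_byte_store, offset_after_memset) are hypothetical, and the 0xAB fill is an assumption used to imitate uninitialized memory deterministically.

```cpp
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical stand-in for a 4-byte Arrow offset buffer exposed as uint8_t*.
// Every byte starts as 0xAB to imitate uninitialized memory deterministically.
inline std::vector<std::uint8_t> make_dirty_buffer() {
  return std::vector<std::uint8_t>(sizeof(std::int32_t), 0xAB);
}

// Read the four bytes back as the single int32 offset.
// memcpy is used so the read does not depend on pointer aliasing rules.
inline std::int32_t read_offset(const std::vector<std::uint8_t>& buf) {
  std::int32_t value;
  std::memcpy(&value, buf.data(), sizeof(value));
  return value;
}

// Old pattern: indexing a uint8_t* stores exactly one byte,
// so three of the four bytes keep their previous contents.
inline std::int32_t offset_after_byte_store() {
  auto buf = make_dirty_buffer();
  buf.data()[0] = 0;
  return read_offset(buf);
}

// Fixed pattern: memset zeroes all sizeof(int32_t) bytes.
inline std::int32_t offset_after_memset() {
  auto buf = make_dirty_buffer();
  std::memset(buf.data(), 0, sizeof(std::int32_t));
  return read_offset(buf);
}
```

Under these assumptions, offset_after_byte_store() yields a nonzero offset on any byte order, because three 0xAB bytes survive the store, while offset_after_memset() yields 0.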

Authors:
  - Lawrence Mitchell (https://github.com/wence-)

Approvers:
  - David Wendt (https://github.com/davidwendt)
  - Bradley Dice (https://github.com/bdice)

URL: #16033
wence- authored Jun 14, 2024
1 parent 374ee13 commit 2297f9a
Showing 1 changed file with 3 additions and 3 deletions.
6 changes: 3 additions & 3 deletions cpp/src/interop/to_arrow.cu
@@ -292,9 +292,9 @@ std::shared_ptr<arrow::Array> dispatch_to_arrow::operator()<cudf::string_view>(
   auto child_arrays = fetch_child_array(input_view, {{}, {}}, ar_mr, stream);
   if (child_arrays.empty()) {
     // Empty string will have only one value in offset of 4 bytes
-    auto tmp_offset_buffer = allocate_arrow_buffer(4, ar_mr);
-    auto tmp_data_buffer = allocate_arrow_buffer(0, ar_mr);
-    tmp_offset_buffer->mutable_data()[0] = 0;
+    auto tmp_offset_buffer = allocate_arrow_buffer(sizeof(int32_t), ar_mr);
+    auto tmp_data_buffer = allocate_arrow_buffer(0, ar_mr);
+    memset(tmp_offset_buffer->mutable_data(), 0, sizeof(int32_t));
 
     return std::make_shared<arrow::StringArray>(
       0, std::move(tmp_offset_buffer), std::move(tmp_data_buffer));