Check num_children() == 0 in Column.from_column_view (#17193)
This fixes a bug where `Column.from_column_view` does not check that a string column has an offsets child column before accessing it. For a string `column_view` with no children, this causes a segmentation fault, for example when a `column_view` obtained from `Column.view()` is passed back to `Column.from_column_view(...)`.

The issue can be reproduced with:
```python
import cudf
from cudf.core.column.column import as_column
df = cudf.DataFrame({'a': cudf.Series([[]], dtype=cudf.core.dtypes.ListDtype('string'))})
s = df['a']
col = as_column(s)
# back_and_forth is the temporary debugging helper defined below
col2 = cudf._lib.column.Column.back_and_forth(col)
print(col)
print(col2)
```
where `back_and_forth` is defined as:
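```cython
    # Temporary debugging helper inside `cdef class Column`; not part of the API.
    @staticmethod
    def back_and_forth(Column input_column):
        # Round-trip: take a view of the column, then rebuild a Column from it.
        cdef column_view input_column_view = input_column.view()
        return Column.from_column_view(input_column_view, input_column)
```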
I don't have the expertise to write the appropriate tests for this without introducing the `back_and_forth` function as an API, which seems undesirable.

Authors:
  - Christopher Harris (https://github.com/cwharris)

Approvers:
  - Bradley Dice (https://github.com/bdice)
  - Vyas Ramasubramani (https://github.com/vyasr)

URL: #17193
cwharris authored Oct 29, 2024
1 parent 1ad9fc1 commit bf5b778
Showing 1 changed file with 10 additions and 7 deletions.
17 changes: 10 additions & 7 deletions python/cudf/cudf/_lib/column.pyx
```diff
@@ -688,15 +688,18 @@ cdef class Column:
         # special case for string column
         is_string_column = (cv.type().id() == libcudf_types.type_id.STRING)
         if is_string_column:
-            # get the size from offset child column (device to host copy)
-            offsets_column_index = 0
-            offset_child_column = cv.child(offsets_column_index)
-            if offset_child_column.size() == 0:
+            if cv.num_children() == 0:
                 base_nbytes = 0
             else:
-                chars_size = get_element(
-                    offset_child_column, offset_child_column.size()-1).value
-                base_nbytes = chars_size
+                # get the size from offset child column (device to host copy)
+                offsets_column_index = 0
+                offset_child_column = cv.child(offsets_column_index)
+                if offset_child_column.size() == 0:
+                    base_nbytes = 0
+                else:
+                    chars_size = get_element(
+                        offset_child_column, offset_child_column.size()-1).value
+                    base_nbytes = chars_size
 
         if data_ptr:
             if data_owner is None:
```
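The effect of the guard can be illustrated with a small pure-Python model. This is a sketch, not cudf code: `FakeColumnView`, `base_nbytes_before`, and `base_nbytes_after` are hypothetical names, and a missing child is modeled by raising `IndexError` rather than by the invalid memory access that crashed the Cython code. As in the diff, the byte size of a strings column is read from the last entry of its offsets child.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class FakeColumnView:
    """Hypothetical stand-in for a strings `column_view`.

    `offsets` models the offsets child column. `None` models a view
    with no children at all, i.e. num_children() == 0.
    """

    offsets: Optional[List[int]] = None

    def num_children(self) -> int:
        return 0 if self.offsets is None else 1

    def child(self, index: int) -> List[int]:
        # The real code crashed when reading a child that did not exist;
        # this model raises IndexError instead so the failure is visible.
        if index >= self.num_children():
            raise IndexError(f"column_view has no child {index}")
        return self.offsets


def base_nbytes_before(cv: FakeColumnView) -> int:
    """Mirrors the old logic: reads child 0 unconditionally."""
    offsets = cv.child(0)
    # The last offset is the total number of character bytes.
    return 0 if len(offsets) == 0 else offsets[-1]


def base_nbytes_after(cv: FakeColumnView) -> int:
    """Mirrors the fixed logic: checks num_children() before child(0)."""
    if cv.num_children() == 0:
        return 0
    offsets = cv.child(0)
    return 0 if len(offsets) == 0 else offsets[-1]


# Strings ["ab", "", "cde"] have offsets [0, 2, 2, 5]: 5 bytes of chars.
print(base_nbytes_after(FakeColumnView([0, 2, 2, 5])))  # 5
print(base_nbytes_after(FakeColumnView()))  # 0 (no children)
```

Checking `num_children()` first means the offsets child is only read when it exists; an existing but empty offsets child still yields 0, exactly as before the fix.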
