-
Notifications
You must be signed in to change notification settings - Fork 915
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[BUG] make_lists_column does not enforce arrow-compatible invariants on offsets column #16164
Comments
I believe we do option 1 for strings columns so I think we should do the same for lists columns cudf/cpp/src/interop/to_arrow.cu Lines 293 to 301 in a4be7bd
Accessing child data in an empty column is considered UB in libcudf. I should add this to the developer guide. |
How would you represent a nested empty list column, such that the correct datatype can be determined, if accessing child data is UB? |
That is a good point. I was hoping for a global rule. The empty child column is necessary to access the list column's type. cudf/cpp/src/lists/lists_column_factories.cu Lines 86 to 94 in 64325a1
So another thing worth documenting in the dev guide. Are you ok with option 1 as you described? I feel that would be more consistent with how offsets are handled for empty strings columns. |
FWIW, I really wish the nested type implementation in libcudf stored the structure in the
Yes, I think so. I guess it means one always has to remember to special-case empty list columns in libcudf algorithms, but I suppose that is done at the moment anyway to avoid kernel launches for empty data. |
The offset column of a nested empty list column may be empty as discussed in #16164. `ListColumn.memory_usage` assumed that this column was non-empty Unblocks rapidsai/cuspatial#1400 Authors: - Matthew Roeschke (https://github.com/mroeschke) Approvers: - GALI PREM SAGAR (https://github.com/galipremsagar) - Lawrence Mitchell (https://github.com/wence-) URL: #16193
Describe the bug
The arrow spec says, of offsets:
(emphasis mine).
However,
make_lists_column
enforces:cudf/cpp/src/lists/lists_column_factories.cu
Lines 127 to 129 in a4be7bd
That is, it allows the offsets buffer to have length zero iff the number of rows in the list column is zero.
make_empty_lists_column
uses this, laxer, definition and will produce a length-zero list column with an offset array of length zero.Consequently, a library which checks the invariants on list columns when ingesting an empty list column that is exported by libcudf will observe that they are not satisfied. Libraries that do not check these invariants may read arbitrary memory and/or deduce incorrect sizes if they ever access the offset array.
Steps/Code to reproduce bug
Expected behavior
Empty list columns should have a layout that is compatible with arrow.
I suppose there are two ways to fix this:
to_arrow
special case construction of empty list columns (we already do this for empty strings).They're probably not mutually exclusive.
The text was updated successfully, but these errors were encountered: