Fix the issue of empty lists having empty offsets #10935
Codecov Report
@@ Coverage Diff @@
## branch-22.06 #10935 +/- ##
================================================
+ Coverage 86.30% 86.32% +0.02%
================================================
Files 144 144
Lines 22665 22668 +3
================================================
+ Hits 19560 19569 +9
+ Misses 3105 3099 -6
rerun tests
looks good!
rerun tests
Please add a description to this PR since it will appear in the changelog.
LGTM after adding PR description.
std::unique_ptr<column> empty_list_offset_col;
if (has_empty_list_offsets) {
  empty_list_offset_col = make_fixed_width_column(data_type(type_id::INT32), 1);
  cudaMemsetAsync(empty_list_offset_col->mutable_view().head(), 0, sizeof(size_type), stream);
I wonder if this could be done with a simple `column_view` wrapper around a pointer to `__constant__ int32_t empty_list_offset_data = 0;`
This is probably not worth doing for a single value just to save the Memset call.
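For illustration only, here is a host-side C++ sketch of the idea in this comment: wrap a single shared zero in a non-owning view instead of allocating a one-element column and zero-filling it. The names `offsets_view`, `empty_list_offsets`, and `first_offset` are hypothetical and are not part of libcudf. In real device code the zero would live in `__constant__` memory, and the host would first need its device address (for example via `cudaGetSymbolAddress`) before wrapping it.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical non-owning view; a stand-in for a column_view over INT32 data.
struct offsets_view {
  std::int32_t const* data;
  std::size_t size;
};

// One shared zero; host-side analogue of the reviewer's __constant__ value.
static const std::int32_t empty_list_offset_data = 0;

// Wraps the shared zero rather than allocating and memset-ing a new buffer.
inline offsets_view empty_list_offsets() { return {&empty_list_offset_data, 1}; }

// Reads the first offset; a valid offsets view always has at least one entry.
inline std::int32_t first_offset(offsets_view v)
{
  assert(v.size >= 1);
  return v.data[0];
}
```

Every call to `empty_list_offsets()` returns a view over the same storage, so nothing is allocated or initialized per call. As the comment notes, the saving over a single memset is small.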
Added special handling for list columns in which empty lists have empty offsets. In general, the offsets column of a list column has size equal to the list column size + 1. For empty list columns, however, the offsets column is allowed to be empty instead of having size 1. This caused issues when processing lists in the Parquet writer.
This PR uses the `list_column_device_view` wrapper to handle this case wherever possible. Where `list_column_device_view` could not be used, it replaces the empty list column with a temporary list column whose offsets column has size 1.

Fixes #10536
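The offsets convention described above can be sketched in plain C++. This is a simplified host-side model, not the parquet writer code: `list_offsets`, `normalized_offsets`, and `list_length` are hypothetical names, and a `std::vector` stands in for the device offsets column. The sketch restores the `size + 1` invariant by substituting a single zero offset when an empty list column arrives with empty offsets.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

using size_type = std::int32_t;

// Hypothetical stand-in for a list column: only the offsets are modeled.
struct list_offsets {
  size_type num_rows;              // number of lists in the column
  std::vector<size_type> offsets;  // normally num_rows + 1 entries
};

// Returns offsets that always satisfy size == num_rows + 1.
// An empty column with empty offsets is replaced by the single offset {0}.
inline std::vector<size_type> normalized_offsets(list_offsets const& col)
{
  if (col.num_rows == 0 && col.offsets.empty()) { return {0}; }
  assert(col.offsets.size() == static_cast<std::size_t>(col.num_rows) + 1);
  return col.offsets;
}

// Length of list i; valid once the offsets are normalized.
inline size_type list_length(std::vector<size_type> const& offsets, size_type i)
{
  return offsets[i + 1] - offsets[i];
}
```

With normalized offsets, downstream code can read `offsets[0]` and compute list lengths without a separate branch for the empty-offsets case.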