Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix arrays_zip to not rely on broken segmented gather #7484

Merged
merged 3 commits into from
Jan 10, 2023

Conversation

revans2
Copy link
Collaborator

@revans2 revans2 commented Jan 9, 2023

This fixes #7469

Sorry that the patch is so large. I had to fundamentally change how some parts of this work and then did some refactoring to clean it up.

Before the code relied on segmented gather returning a child value for every gather entry even if the input column was null. then they would hack a null back on the output, possibly creating an invalid output where a row was marked as NULL but had a non-null offset.

This changed it so that we clean up the inputs so that the validity is the same everywhere. This should cause the children output from the segmented gather to all have the exact same offsets, so we can pull them apart properly. This is inefficient, because we have to make a copy of all the input if any of them have a NULL. But I didn't want to risk passing an invalid column to segmented gather and having it produce an incorrect result.

I plan on filing another issue for us to stop using mergeAndSetValidity on any nested type. This includes strings.

@revans2 revans2 added the bug Something isn't working label Jan 9, 2023
@revans2 revans2 self-assigned this Jan 9, 2023
@revans2
Copy link
Collaborator Author

revans2 commented Jan 9, 2023

build

@revans2
Copy link
Collaborator Author

revans2 commented Jan 9, 2023

I filed #7485 as follow on work for this.

@ttnghia
Copy link
Collaborator

ttnghia commented Jan 10, 2023

I started to read the code in this PR and wonder if this is a bug or something we need to change in segmented_gather instead.
Will investigate more tomorrow.

@revans2
Copy link
Collaborator Author

revans2 commented Jan 10, 2023

@ttnghia you have it backwards. The old version would not return a zero length segment for NULL. Your patch to make_list rapidsai/cudf#12370 fixed it and now it does the right thing and produce a zero length segment. I don't know if it is guaranteed to always do that or if rapidsai/cudf#12370 is fixing it up afterwards. Either way this patch is dealing with that new reality.

@revans2
Copy link
Collaborator Author

revans2 commented Jan 10, 2023

build

@revans2 revans2 merged commit e94b0c0 into NVIDIA:branch-23.02 Jan 10, 2023
@revans2 revans2 deleted the arrays_zip_fix branch January 10, 2023 20:30
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[BUG] test_arrays_zip failures on nightly
3 participants