Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Explicitly compute null count in concatenate APIs #13104

Merged
merged 6 commits into from
Apr 13, 2023

Conversation

vyasr
Copy link
Contributor

@vyasr vyasr commented Apr 10, 2023

Description

The total number of nulls in the output can be computed by summing the nulls in the input columns.

Contributes to #11968

Checklist

  • I am familiar with the Contributing Guidelines.
  • New or existing tests cover these changes.
  • The documentation is up to date with these changes.

@vyasr vyasr added 3 - Ready for Review Ready for review by team libcudf Affects libcudf (C++/CUDA) code. improvement Improvement / enhancement to an existing function non-breaking Non-breaking change labels Apr 10, 2023
@vyasr vyasr added this to the Enable streams milestone Apr 10, 2023
@vyasr vyasr self-assigned this Apr 10, 2023
@vyasr vyasr requested a review from a team as a code owner April 10, 2023 16:19
@vyasr vyasr requested review from harrism and ttnghia April 10, 2023 16:19
@vyasr
Copy link
Contributor Author

vyasr commented Apr 10, 2023

@vuule locally this change passes all tests for concatenation itself but fails some tests in the Parquet reader. Would you be able to help me look into those? I am unsure if the reader is catching some edge cases that the test is missing, or if the Parquet code has some null sanitization bug that is causing the failures. I mention sanitization because the failure specifically occurs for list and struct types.

vyasr and others added 2 commits April 10, 2023 09:44
Co-authored-by: David Wendt <[email protected]>
Co-authored-by: Nghia Truong <[email protected]>
@vuule
Copy link
Contributor

vuule commented Apr 10, 2023

@vuule locally this change passes all tests for concatenation itself but fails some tests in the Parquet reader. Would you be able to help me look into those? I am unsure if the reader is catching some edge cases that the test is missing, or if the Parquet code has some null sanitization bug that is causing the failures. I mention sanitization because the failure specifically occurs for list and struct types.

@nvdbaranec

@vuule
Copy link
Contributor

vuule commented Apr 11, 2023

Tests fail because chunked Parquet reader returns columns with incorrect null count, and the new concatenate relies on this info.
Opened #13111 with a potential fix.

@vyasr
Copy link
Contributor Author

vyasr commented Apr 11, 2023

Depends on #13111

@vyasr vyasr requested a review from davidwendt April 13, 2023 19:08
@vyasr vyasr added the 5 - Ready to Merge Testing and reviews complete, ready to merge label Apr 13, 2023
@vyasr
Copy link
Contributor Author

vyasr commented Apr 13, 2023

/merge

@rapids-bot rapids-bot bot merged commit 4f0c46e into rapidsai:branch-23.06 Apr 13, 2023
@vyasr vyasr deleted the feat/concat_null_count branch April 13, 2023 21:16
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
5 - Ready to Merge Testing and reviews complete, ready to merge improvement Improvement / enhancement to an existing function libcudf Affects libcudf (C++/CUDA) code. non-breaking Non-breaking change
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants