Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix struct flattening to add a validity column only when the input column has null element #8374

Merged
merged 4 commits into from
May 27, 2021

Conversation

ttnghia
Copy link
Contributor

@ttnghia ttnghia commented May 26, 2021

Currently, struct flattening adds a validity column when the input column has a null mask (by calling to nullable()). In the situation when comparing two structs columns having no null but one column has a null mask, flattening them will result in two tables with different numbers of columns.

This PR fix that problem by using has_nulls() instead of nullable(). As a result, the validity column will be added to the flattening result only when the input structs column has null.

Note that when comparing two structs columns in which one column has null while the other doesn't, we must check for (nested) null existence and pass in column_nullability::FORCE for flattening both columns. This makes sure the flattening results are tables having the same number of columns.

Closes #8187.

@ttnghia ttnghia added bug Something isn't working 3 - Ready for Review Ready for review by team libcudf Affects libcudf (C++/CUDA) code. ! - Hotfix Hotfix is a bug that affects the majority of users for which there is no reasonable workaround Spark Functionality that helps Spark RAPIDS non-breaking Non-breaking change labels May 26, 2021
@ttnghia ttnghia self-assigned this May 26, 2021
@ttnghia ttnghia requested a review from a team as a code owner May 26, 2021 20:10
@ttnghia ttnghia requested review from trxcllnt and davidwendt May 26, 2021 20:10
@abellina
Copy link
Contributor

Locally this fixes the issue reported by @gerashegalov

Copy link
Contributor

@mythrocks mythrocks left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

@harrism harrism added 5 - Ready to Merge Testing and reviews complete, ready to merge and removed 3 - Ready for Review Ready for review by team labels May 26, 2021
@harrism
Copy link
Member

harrism commented May 26, 2021

@gpucibot merge

@gerashegalov
Copy link
Contributor

Can you add a test case (ok if in another PR) so we don't defer to spark-rapids to catch these issues

@ttnghia
Copy link
Contributor Author

ttnghia commented May 26, 2021

Can you add a test case (ok if in another PR) so we don't defer to spark-rapids to catch these issues

Sure. But since we are about to do code freeze, can that test be targeted to branch 21.08?

@gerashegalov
Copy link
Contributor

Sure. But since we are about to do code freeze, can that test be targeted to branch 21.08?

I think it's reasonable since @abellina confirmed the fix with a Spark test run

@codecov
Copy link

codecov bot commented May 27, 2021

Codecov Report

❗ No coverage uploaded for pull request base (branch-21.06@e97fc1c). Click here to learn what that means.
The diff coverage is n/a.

❗ Current head 447b8fe differs from pull request most recent head 224eb13. Consider uploading reports for the commit 224eb13 to get more accurate results
Impacted file tree graph

@@               Coverage Diff               @@
##             branch-21.06    #8374   +/-   ##
===============================================
  Coverage                ?   82.86%           
===============================================
  Files                   ?      106           
  Lines                   ?    17874           
  Branches                ?        0           
===============================================
  Hits                    ?    14811           
  Misses                  ?     3063           
  Partials                ?        0           

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update e97fc1c...224eb13. Read the comment docs.

@rapids-bot rapids-bot bot merged commit 4d1a62e into rapidsai:branch-21.06 May 27, 2021
@ttnghia ttnghia deleted the fix_struct_flattening branch May 27, 2021 13:32
rapids-bot bot pushed a commit that referenced this pull request Jun 3, 2021
This PR adds a simple test case for struct binary search. In this test, we do binary search for 2 structs columns in which one column has a bit mask (but no null element) while the other column does not have any bit mask.

Reference: #8374 | #8187

Authors:
  - Nghia Truong (https://github.com/ttnghia)

Approvers:
  - Conor Hoekstra (https://github.com/codereport)

URL: #8396
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
5 - Ready to Merge Testing and reviews complete, ready to merge ! - Hotfix Hotfix is a bug that affects the majority of users for which there is no reasonable workaround bug Something isn't working libcudf Affects libcudf (C++/CUDA) code. non-breaking Non-breaking change Spark Functionality that helps Spark RAPIDS
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[BUG] mismatched columns on struct lower bound
7 participants