Skip to content

Commit

Permalink
Fix struct flattening to add a validity column only when the input co…
Browse files Browse the repository at this point in the history
…lumn has null element (#8374)

Currently, struct flattening adds a validity column when the input column has a null mask (by calling to `nullable()`). In the situation when comparing two structs columns having no null but one column has a null mask, flattening them will result in two tables with different numbers of columns.

This PR fix that problem by using `has_nulls()` instead of `nullable()`. As a result, the validity column will be added to the flattening result only when the input structs column has null.

Note that when comparing two structs columns in which one column has null while the other doesn't, we must check for (nested) null existence and pass in `column_nullability::FORCE` for flattening both columns. This makes sure the flattening results are tables having the same number of columns.

Closes #8187.

Authors:
  - Nghia Truong (https://github.com/ttnghia)

Approvers:
  - David Wendt (https://github.com/davidwendt)
  - MithunR (https://github.com/mythrocks)

URL: #8374
  • Loading branch information
ttnghia authored May 27, 2021
1 parent 50c0335 commit 4d1a62e
Showing 1 changed file with 15 additions and 4 deletions.
19 changes: 15 additions & 4 deletions cpp/src/structs/utilities.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -93,11 +93,22 @@ struct flattened_table {
order col_order,
null_order col_null_order)
{
if (nullability == column_nullability::FORCE || col.nullable()) {
// nullable columns could be required for comparisions such as joins
// Even if it is not required to extract the bitmask to a separate column,
// we should always do that if the structs column has any null element.
//
// In addition, we should check for null by calling to `has_nulls()`, not `nullable()`.
// This is because when comparing structs columns, if one column has bitmask while the other
// does not (and both columns do not have any null element) then flattening them using
// `nullable()` will result in tables with different number of columns.
//
// Notice that, for comparing structs columns when one column has null while the other
// doesn't, `nullability` must be passed in with value `column_nullability::FORCE` to make
// sure the flattening results are tables having the same number of columns.

if (nullability == column_nullability::FORCE || col.has_nulls()) {
validity_as_column.push_back(cudf::is_valid(col));
if (col.nullable()) {
// copy bitmask only works if the column is nullable
if (col.has_nulls()) {
// copy bitmask is needed only if the column has null
validity_as_column.back()->set_null_mask(copy_bitmask(col));
}
flat_columns.push_back(validity_as_column.back()->view());
Expand Down

0 comments on commit 4d1a62e

Please sign in to comment.