[BUG] mismatched columns on struct lower bound #8187
Comments
Hi there! Please verify that my fix (#8188) actually fixes this. |
…t and values structs columns during flattening (#8188)

By default, `structs::detail::flatten_nested_columns` only generates the validity column if the input structs column has a null_mask. When comparing structs from different columns, that validity column should be generated for both sides (both structs columns) at the same time. This fixes #8187.

Authors:
- Nghia Truong (https://github.com/ttnghia)

Approvers:
- Jason Lowe (https://github.com/jlowe)
- Mike Wilson (https://github.com/hyperbolic2346)

URL: #8188
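As a rough illustration of the idea in this commit message, the sketch below decides once, for both inputs, whether a validity column is emitted. It is a toy model, not libcudf code: `ToyStructColumn`, `FlattenedCounts`, `flattened_column_count`, and `flatten_both` are hypothetical names invented here, and the "flattening" only counts columns.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Toy stand-in for a single-child structs column (hypothetical, not a libcudf type).
struct ToyStructColumn {
  std::vector<long> child;      // values of the only child column
  bool has_null_mask;           // true if a null mask is allocated
  std::vector<bool> null_mask;  // per-row validity; meaningful only with a mask
};

// Column counts of the two flattened tables.
struct FlattenedCounts {
  std::size_t t;
  std::size_t values;
};

// Toy flattening: the caller decides whether a validity column is emitted.
std::size_t flattened_column_count(ToyStructColumn const& col, bool emit_validity)
{
  assert(!col.has_null_mask || col.null_mask.size() == col.child.size());
  return (emit_validity ? 1 : 0) + 1;  // optional validity column + the one child
}

// Make the validity decision once, so both sides always get the same layout.
FlattenedCounts flatten_both(ToyStructColumn const& t, ToyStructColumn const& values)
{
  bool const emit_validity = t.has_null_mask || values.has_null_mask;
  return FlattenedCounts{flattened_column_count(t, emit_validity),
                         flattened_column_count(values, emit_validity)};
}
```

With a masked `t` and an unmasked `values`, both counts come out as 2 in this model, so a row-wise comparison sees tables of the same shape.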
We had some tests that were failing on this exception that are now fixed with @ttnghia's fix. Closing this and thanks! |
Heads up, we are seeing this again, now with inputs that contain no nulls at all. I am trying to write the test in C++, but wanted to reopen this. |
I got a repro case for the new failure.

diff --git a/cpp/tests/search/search_struct_test.cpp b/cpp/tests/search/search_struct_test.cpp
index 1c2e9b02f0..482853ae21 100644
--- a/cpp/tests/search/search_struct_test.cpp
+++ b/cpp/tests/search/search_struct_test.cpp
@@ -29,7 +29,7 @@ using int32s_col = cudf::test::fixed_width_column_wrapper<int32_t>;
using structs_col = cudf::test::structs_column_wrapper;
using strings_col = cudf::test::strings_column_wrapper;
-constexpr bool print_all{false}; // For debugging only
+constexpr bool print_all{true}; // For debugging only
constexpr int32_t null{0}; // Mark for null child elements
constexpr int32_t XXX{0}; // Mark for null struct elements
@@ -144,6 +144,31 @@ TYPED_TEST(TypedStructSearchTest, SlicedColumnInputTests)
CUDF_TEST_EXPECT_COLUMNS_EQUAL(expected_upper_bound, results.second->view(), print_all);
}
+TYPED_TEST(TypedStructSearchTest, UpperBoundColMismatchTest)
+{
+ using col_wrapper = cudf::test::fixed_width_column_wrapper<int64_t>;
+ auto too_big = cudf::test::iterator_with_null_at(2048);
+
+ auto child_col_t = col_wrapper{{-7858725485978029677L}, too_big};
+ auto const structs_t = structs_col{{child_col_t}, too_big}.release();
+
+ auto child_col_values = col_wrapper{{-8953497368527767583L}};
+ auto const structs_values = structs_col{{child_col_values}}.release();
+
+ cudf::test::print(*structs_t);
+ cudf::test::print(*structs_values);
+
+ std::cerr << "A" << std::endl;
+ auto results = search_bounds(structs_t, structs_values);
+ auto expected_lower_bound = int32s_col{1};
+ auto expected_upper_bound = int32s_col{1};
+ std::cerr << "B" << std::endl;
+ CUDF_TEST_EXPECT_COLUMNS_EQUAL(expected_lower_bound, results.first->view(), print_all);
+ std::cerr << "C" << std::endl;
+ CUDF_TEST_EXPECT_COLUMNS_EQUAL(expected_upper_bound, results.second->view(), print_all);
+ std::cerr << "D" << std::endl;
+}
+
TYPED_TEST(TypedStructSearchTest, SimpleInputWithNullsTests)
{
using col_wrapper = cudf::test::fixed_width_column_wrapper<TypeParam, int32_t>;

The test is not great, but it does reproduce the error. It looks to be related to a validity mask being present on only one of the two columns. |
This is a better test, so at least it covers the types.

diff --git a/cpp/tests/search/search_struct_test.cpp b/cpp/tests/search/search_struct_test.cpp
index 1c2e9b02f0..809d055beb 100644
--- a/cpp/tests/search/search_struct_test.cpp
+++ b/cpp/tests/search/search_struct_test.cpp
@@ -29,7 +29,7 @@ using int32s_col = cudf::test::fixed_width_column_wrapper<int32_t>;
using structs_col = cudf::test::structs_column_wrapper;
using strings_col = cudf::test::strings_column_wrapper;
-constexpr bool print_all{false}; // For debugging only
+constexpr bool print_all{true}; // For debugging only
constexpr int32_t null{0}; // Mark for null child elements
constexpr int32_t XXX{0}; // Mark for null struct elements
@@ -144,6 +144,31 @@ TYPED_TEST(TypedStructSearchTest, SlicedColumnInputTests)
CUDF_TEST_EXPECT_COLUMNS_EQUAL(expected_upper_bound, results.second->view(), print_all);
}
+TYPED_TEST(TypedStructSearchTest, UpperBoundColMismatchTest)
+{
+ using col_wrapper = cudf::test::fixed_width_column_wrapper<TypeParam, int32_t>;
+ auto too_big = cudf::test::iterator_with_null_at(2048);
+
+ auto child_col_t = col_wrapper{{100}, too_big};
+ auto const structs_t = structs_col{{child_col_t}, too_big}.release();
+
+ auto child_col_values = col_wrapper{{0}};
+ auto const structs_values = structs_col{{child_col_values}}.release();
+
+ cudf::test::print(*structs_t);
+ cudf::test::print(*structs_values);
+
+ std::cerr << "A" << std::endl;
+ auto results = search_bounds(structs_t, structs_values);
+ auto expected_lower_bound = int32s_col{1};
+ auto expected_upper_bound = int32s_col{1};
+ std::cerr << "B" << std::endl;
+ CUDF_TEST_EXPECT_COLUMNS_EQUAL(expected_lower_bound, results.first->view(), print_all);
+ std::cerr << "C" << std::endl;
+ CUDF_TEST_EXPECT_COLUMNS_EQUAL(expected_upper_bound, results.second->view(), print_all);
+ std::cerr << "D" << std::endl;
+}
+
TYPED_TEST(TypedStructSearchTest, SimpleInputWithNullsTests)
{
using col_wrapper = cudf::test::fixed_width_column_wrapper<TypeParam, int32_t>;
|
…lumn has null element (#8374)

Currently, struct flattening adds a validity column when the input column has a null mask (by calling `nullable()`). When comparing two structs columns that have no nulls but where one column has a null mask, flattening them results in two tables with different numbers of columns. This PR fixes that problem by using `has_nulls()` instead of `nullable()`. As a result, the validity column is added to the flattening result only when the input structs column has nulls.

Note that when comparing two structs columns in which one column has nulls while the other doesn't, we must check for (nested) null existence and pass in `column_nullability::FORCE` for flattening both columns. This makes sure the flattening results are tables having the same number of columns. Closes #8187.

Authors:
- Nghia Truong (https://github.com/ttnghia)

Approvers:
- David Wendt (https://github.com/davidwendt)
- MithunR (https://github.com/mythrocks)

URL: #8374
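The distinction this commit message draws between `nullable()` and `has_nulls()` can be sketched with a small toy model. Everything below is an assumption made for illustration: `ToyStructColumn`, `ToyNullability`, and both free functions are hypothetical and only mimic the behavior described above, and the real libcudf flattening does far more than count columns.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Toy stand-in for a single-child structs column (hypothetical, not a libcudf type).
struct ToyStructColumn {
  std::vector<long> child;      // values of the only child column
  bool has_null_mask;           // true if a null mask is allocated
  std::vector<bool> null_mask;  // true = valid row; meaningful only with a mask

  // A mask is allocated, whether or not any row is actually null.
  bool nullable() const { return has_null_mask; }

  // At least one row is actually null.
  bool has_nulls() const
  {
    return has_null_mask &&
           std::find(null_mask.begin(), null_mask.end(), false) != null_mask.end();
  }
};

// Toy counterpart of the nullability switch named in the commit message.
enum class ToyNullability { MatchIncoming, Force };

// Emit a validity column only for real nulls, unless the caller forces it.
std::size_t flattened_column_count(ToyStructColumn const& col, ToyNullability mode)
{
  assert(!col.has_null_mask || col.null_mask.size() == col.child.size());
  bool const emit_validity = (mode == ToyNullability::Force) || col.has_nulls();
  return (emit_validity ? 1 : 0) + 1;  // optional validity column + the one child
}

// Force validity on both sides as soon as either side really has nulls.
ToyNullability nullability_for_comparison(ToyStructColumn const& lhs,
                                          ToyStructColumn const& rhs)
{
  return (lhs.has_nulls() || rhs.has_nulls()) ? ToyNullability::Force
                                              : ToyNullability::MatchIncoming;
}
```

In this model, a column with an all-valid mask compared against a column without a mask flattens to one column on each side, while a column with a real null forces a validity column onto both sides.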
This PR adds a simple test case for struct binary search. In this test, we do binary search for 2 structs columns in which one column has a bit mask (but no null element) while the other column does not have any bit mask.

Reference: #8374 | #8187

Authors:
- Nghia Truong (https://github.com/ttnghia)

Approvers:
- Conor Hoekstra (https://github.com/codereport)

URL: #8396
Describe the bug
In `lower_bound` and `upper_bound` for structs, it calls `structs::detail::flatten_nested_columns` for both `t` and `values`, but it does this independently for each of them. That means if we end up with a `t` that has null structs but a `values` that does not, or vice versa, we can get errors.

Steps/Code to reproduce bug
This causes the error to happen. I have not updated the expected result to what they should be, but it does expose the error.
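The mismatch described under "Describe the bug" can be modeled without libcudf. The sketch below is a simplified assumption of the behavior, not the actual implementation: `ToyStructColumn`, `flattened_column_count`, and `comparable` are hypothetical names, and flattening is reduced to counting the columns it would produce.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Toy stand-in for a single-child structs column (hypothetical, not a libcudf type).
struct ToyStructColumn {
  std::vector<long> child;      // values of the only child column
  bool has_null_mask;           // true if a null mask is allocated
  std::vector<bool> null_mask;  // per-row validity; meaningful only with a mask
};

// Toy flattening, run on one column in isolation: a validity column is
// emitted only when that particular column carries a null mask.
std::size_t flattened_column_count(ToyStructColumn const& col)
{
  assert(!col.has_null_mask || col.null_mask.size() == col.child.size());
  std::size_t const validity_columns = col.has_null_mask ? 1 : 0;
  return validity_columns + 1;  // optional validity column + the one child
}

// A row comparator over two flattened tables needs matching column counts.
bool comparable(ToyStructColumn const& t, ToyStructColumn const& values)
{
  return flattened_column_count(t) == flattened_column_count(values);
}
```

Here a `t` with a null mask flattens to two columns and a `values` without one flattens to a single column, so `comparable` returns false, which corresponds to the mismatched-columns error in the title.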