Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[BUG] mismatched columns on struct lower bound #8187

Closed
revans2 opened this issue May 7, 2021 · 5 comments · Fixed by #8188 or #8374
Closed

[BUG] mismatched columns on struct lower bound #8187

revans2 opened this issue May 7, 2021 · 5 comments · Fixed by #8188 or #8374
Assignees
Labels
bug Something isn't working libcudf Affects libcudf (C++/CUDA) code. Spark Functionality that helps Spark RAPIDS

Comments

@revans2
Copy link
Contributor

revans2 commented May 7, 2021

Describe the bug
In lower_bound and upper_bound for structs it calls structs::detail::flatten_nested_columns for both t and values but it does this independently for both of them. That means if we end up with a t that has null structs but a values that does not, or vise versa we can get errors.

Steps/Code to reproduce bug

diff --git a/cpp/tests/search/search_struct_test.cpp b/cpp/tests/search/search_struct_test.cpp
index 50c326269a..c127f6c8b6 100644
--- a/cpp/tests/search/search_struct_test.cpp
+++ b/cpp/tests/search/search_struct_test.cpp
@@ -152,8 +152,8 @@ TYPED_TEST(TypedStructSearchTest, SimpleInputWithNullsTests)
   auto const structs_values = structs_col{{child_col_values}, null_at(3)}.release();
 
   // Sorted asc, nulls first
-  auto child_col_t = col_wrapper{{XXX, null, 0, 1, 2, 2, 2, 2, 3, 3, 4}, null_at(1)};
-  auto structs_t   = structs_col{{child_col_t}, null_at(0)}.release();
+  auto child_col_t = col_wrapper{{0, 0, 0, 1, 2, 2, 2, 2, 3, 3, 4}};
+  auto structs_t   = structs_col{{child_col_t}}.release();
 
   auto results =
     search_bounds(structs_t, structs_values, {cudf::order::ASCENDING}, {cudf::null_order::BEFORE});

This causes the error to happen. I have not updated the expected result to what they should be, but it does expose the error.

[ RUN      ] TypedStructSearchTest/17.SimpleInputWithNullsTests
unknown file: Failure
C++ exception with description "cuDF failure at: ../include/cudf/table/row_operators.cuh:361: Mismatched number of columns." thrown in the test body.
[  FAILED  ] TypedStructSearchTest/17.SimpleInputWithNullsTests, where TypeParam = cudf::detail::timestamp<cuda::std::__3::chrono::duration<long, cuda::std::__3::ratio<1l, 1000l> > > (0 ms)
@revans2 revans2 added bug Something isn't working Needs Triage Need team to review and classify labels May 7, 2021
@jlowe jlowe added libcudf Affects libcudf (C++/CUDA) code. Spark Functionality that helps Spark RAPIDS and removed Needs Triage Need team to review and classify labels May 7, 2021
@ttnghia ttnghia self-assigned this May 7, 2021
@ttnghia
Copy link
Contributor

ttnghia commented May 7, 2021

Hi there! Please verify that my fix (#8188) actually fixes this.

=============
PS: Ooop, I accidentally merged the PR. If the bug is still there, I'll try again with a new PR.

rapids-bot bot pushed a commit that referenced this issue May 10, 2021
…t and values structs columns during flattening (#8188)

By default, `structs::detail::flatten_nested_columns` only generate the validity column if the input structs column has a null_mask. For comparing structs from different columns, that validity column should be generated for both sides (both struct columns) at the same time.

This fixes #8187.

Authors:
  - Nghia Truong (https://github.com/ttnghia)

Approvers:
  - Jason Lowe (https://github.com/jlowe)
  - Mike Wilson (https://github.com/hyperbolic2346)

URL: #8188
@ttnghia ttnghia reopened this May 10, 2021
@abellina
Copy link
Contributor

abellina commented May 11, 2021

We had some tests that were failing on this exception that are now fixed with @ttnghia's fix. Closing this and thanks!

@abellina abellina reopened this May 26, 2021
@abellina
Copy link
Contributor

Heads up, we are seeing this again, now with inputs that are not null at all. I am trying to write the test in c++, but wanted to reopen this.

@revans2
Copy link
Contributor Author

revans2 commented May 26, 2021

I got a repro case for the new failure.

diff --git a/cpp/tests/search/search_struct_test.cpp b/cpp/tests/search/search_struct_test.cpp
index 1c2e9b02f0..482853ae21 100644
--- a/cpp/tests/search/search_struct_test.cpp
+++ b/cpp/tests/search/search_struct_test.cpp
@@ -29,7 +29,7 @@ using int32s_col  = cudf::test::fixed_width_column_wrapper<int32_t>;
 using structs_col = cudf::test::structs_column_wrapper;
 using strings_col = cudf::test::strings_column_wrapper;
 
-constexpr bool print_all{false};  // For debugging only
+constexpr bool print_all{true};  // For debugging only
 constexpr int32_t null{0};        // Mark for null child elements
 constexpr int32_t XXX{0};         // Mark for null struct elements
 
@@ -144,6 +144,31 @@ TYPED_TEST(TypedStructSearchTest, SlicedColumnInputTests)
   CUDF_TEST_EXPECT_COLUMNS_EQUAL(expected_upper_bound, results.second->view(), print_all);
 }
 
+TYPED_TEST(TypedStructSearchTest, UpperBoundColMismatchTest)
+{
+  using col_wrapper = cudf::test::fixed_width_column_wrapper<int64_t>;
+  auto too_big = cudf::test::iterator_with_null_at(2048);
+
+  auto child_col_t = col_wrapper{{-7858725485978029677L}, too_big};
+  auto const structs_t = structs_col{{child_col_t}, too_big}.release();
+
+  auto child_col_values              = col_wrapper{{-8953497368527767583L}};
+  auto const structs_values = structs_col{{child_col_values}}.release();
+
+  cudf::test::print(*structs_t);
+  cudf::test::print(*structs_values);
+
+  std::cerr << "A" << std::endl;
+  auto results   = search_bounds(structs_t, structs_values);
+  auto expected_lower_bound = int32s_col{1};
+  auto expected_upper_bound = int32s_col{1};
+  std::cerr << "B" << std::endl;
+  CUDF_TEST_EXPECT_COLUMNS_EQUAL(expected_lower_bound, results.first->view(), print_all);
+  std::cerr << "C" << std::endl;
+  CUDF_TEST_EXPECT_COLUMNS_EQUAL(expected_upper_bound, results.second->view(), print_all);
+  std::cerr << "D" << std::endl;
+}
+
 TYPED_TEST(TypedStructSearchTest, SimpleInputWithNullsTests)
 {
   using col_wrapper = cudf::test::fixed_width_column_wrapper<TypeParam, int32_t>;

The test is not great but it does reproduce the error. It looks to be related to validity only being on one column, and not the other.

@revans2
Copy link
Contributor Author

revans2 commented May 26, 2021

This is a better test so at least it covers the types.

diff --git a/cpp/tests/search/search_struct_test.cpp b/cpp/tests/search/search_struct_test.cpp
index 1c2e9b02f0..809d055beb 100644
--- a/cpp/tests/search/search_struct_test.cpp
+++ b/cpp/tests/search/search_struct_test.cpp
@@ -29,7 +29,7 @@ using int32s_col  = cudf::test::fixed_width_column_wrapper<int32_t>;
 using structs_col = cudf::test::structs_column_wrapper;
 using strings_col = cudf::test::strings_column_wrapper;
 
-constexpr bool print_all{false};  // For debugging only
+constexpr bool print_all{true};  // For debugging only
 constexpr int32_t null{0};        // Mark for null child elements
 constexpr int32_t XXX{0};         // Mark for null struct elements
 
@@ -144,6 +144,31 @@ TYPED_TEST(TypedStructSearchTest, SlicedColumnInputTests)
   CUDF_TEST_EXPECT_COLUMNS_EQUAL(expected_upper_bound, results.second->view(), print_all);
 }
 
+TYPED_TEST(TypedStructSearchTest, UpperBoundColMismatchTest)
+{
+  using col_wrapper = cudf::test::fixed_width_column_wrapper<TypeParam, int32_t>;
+  auto too_big = cudf::test::iterator_with_null_at(2048);
+
+  auto child_col_t = col_wrapper{{100}, too_big};
+  auto const structs_t = structs_col{{child_col_t}, too_big}.release();
+
+  auto child_col_values              = col_wrapper{{0}};
+  auto const structs_values = structs_col{{child_col_values}}.release();
+
+  cudf::test::print(*structs_t);
+  cudf::test::print(*structs_values);
+
+  std::cerr << "A" << std::endl;
+  auto results   = search_bounds(structs_t, structs_values);
+  auto expected_lower_bound = int32s_col{1};
+  auto expected_upper_bound = int32s_col{1};
+  std::cerr << "B" << std::endl;
+  CUDF_TEST_EXPECT_COLUMNS_EQUAL(expected_lower_bound, results.first->view(), print_all);
+  std::cerr << "C" << std::endl;
+  CUDF_TEST_EXPECT_COLUMNS_EQUAL(expected_upper_bound, results.second->view(), print_all);
+  std::cerr << "D" << std::endl;
+}
+
 TYPED_TEST(TypedStructSearchTest, SimpleInputWithNullsTests)
 {
   using col_wrapper = cudf::test::fixed_width_column_wrapper<TypeParam, int32_t>;

rapids-bot bot pushed a commit that referenced this issue May 27, 2021
…lumn has null element (#8374)

Currently, struct flattening adds a validity column when the input column has a null mask (by calling to `nullable()`). In the situation when comparing two structs columns having no null but one column has a null mask, flattening them will result in two tables with different numbers of columns.

This PR fix that problem by using `has_nulls()` instead of `nullable()`. As a result, the validity column will be added to the flattening result only when the input structs column has null.

Note that when comparing two structs columns in which one column has null while the other doesn't, we must check for (nested) null existence and pass in `column_nullability::FORCE` for flattening both columns. This makes sure the flattening results are tables having the same number of columns.

Closes #8187.

Authors:
  - Nghia Truong (https://github.com/ttnghia)

Approvers:
  - David Wendt (https://github.com/davidwendt)
  - MithunR (https://github.com/mythrocks)

URL: #8374
rapids-bot bot pushed a commit that referenced this issue Jun 3, 2021
This PR adds a simple test case for struct binary search. In this test, we do binary search for 2 structs columns in which one column has a bit mask (but no null element) while the other column does not have any bit mask.

Reference: #8374 | #8187

Authors:
  - Nghia Truong (https://github.com/ttnghia)

Approvers:
  - Conor Hoekstra (https://github.com/codereport)

URL: #8396
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working libcudf Affects libcudf (C++/CUDA) code. Spark Functionality that helps Spark RAPIDS
Projects
None yet
4 participants