
[BUG] cudf::hash_join throws exception when computing inner join size of empty build table #9092

Closed
jlowe opened this issue Aug 23, 2021 · 1 comment · Fixed by #9128
Labels
bug Something isn't working libcudf Affects libcudf (C++/CUDA) code. Spark Functionality that helps Spark RAPIDS


jlowe commented Aug 23, 2021

Describe the bug
`cudf::hash_join` throws an exception when the instance is constructed from a table with zero rows and the `inner_join_size()` method is subsequently called.

Steps/Code to reproduce bug
Apply the following patch and run the join tests:

```diff
+TEST_F(JoinTest, HashJoinEmpty)
+{
+  CVector cols1;
+  cols1.emplace_back(column_wrapper<int32_t>{}.release());
+
+  Table t1(std::move(cols1));
+
+  cudf::hash_join hash_join(t1, cudf::null_equality::EQUAL);
+
+  {
+    CVector cols0;
+    cols0.emplace_back(column_wrapper<int32_t>{{4, 1, 2, 0, 3}}.release());
+
+    Table t0(std::move(cols0));
+
+    auto output_size                         = hash_join.inner_join_size(t0);
+    std::optional<std::size_t> optional_size = output_size;
+
+    std::size_t const size_gold = 0;
+    EXPECT_EQ(output_size, size_gold);
+
+    auto result = hash_join.inner_join(t0, cudf::null_equality::EQUAL, optional_size);
+    auto result_table =
+      cudf::table_view({cudf::column_view{cudf::data_type{cudf::type_id::INT32},
+                                          static_cast<cudf::size_type>(result.first->size()),
+                                          result.first->data()},
+                        cudf::column_view{cudf::data_type{cudf::type_id::INT32},
+                                          static_cast<cudf::size_type>(result.second->size()),
+                                          result.second->data()}});
+    auto result_sort_order = cudf::sorted_order(result_table);
+    auto sorted_result     = cudf::gather(result_table, *result_sort_order);
+
+    column_wrapper<int32_t> col_gold_0{};
+    column_wrapper<int32_t> col_gold_1{};
+
+    CVector cols_gold;
+    cols_gold.push_back(col_gold_0.release());
+    cols_gold.push_back(col_gold_1.release());
+
+    Table gold(std::move(cols_gold));
+    auto gold_sort_order = cudf::sorted_order(gold.view());
+    auto sorted_gold     = cudf::gather(gold.view(), *gold_sort_order);
+
+    CUDF_TEST_EXPECT_TABLES_EQUIVALENT(*sorted_gold, *sorted_result);
+  }
+}
```

Expected behavior
`cudf::hash_join` should return 0 from `inner_join_size()` when the table passed at instance construction was empty. Similarly, a subsequent `inner_join` call should return empty gather maps.

@jlowe jlowe added bug Something isn't working Needs Triage Need team to review and classify libcudf Affects libcudf (C++/CUDA) code. Spark Functionality that helps Spark RAPIDS labels Aug 23, 2021
@PointKernel PointKernel self-assigned this Aug 23, 2021
@PointKernel
Member

This issue is similar to #9044.

I will make a PR to fix both of them.

@jlowe jlowe removed the Needs Triage Need team to review and classify label Aug 23, 2021
rapids-bot bot pushed a commit that referenced this issue Aug 26, 2021
Closes #9092

This PR fixes a bug where `inner_join_size` would throw an exception if the build table is empty. Corresponding unit tests have been added as well.

Authors:
  - Yunsong Wang (https://github.com/PointKernel)

Approvers:
  - Jason Lowe (https://github.com/jlowe)
  - Vukasin Milovanovic (https://github.com/vuule)

URL: #9128