Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[BUG] The hashPartition API may return corrupted data when there are columns with type of DECIMAL128. #12852

Closed
firestarman opened this issue Feb 27, 2023 · 5 comments · Fixed by #12863
Assignees
Labels
bug Something isn't working

Comments

@firestarman
Copy link
Contributor

firestarman commented Feb 27, 2023

Describe the bug
The data in columns with type of DECIMAL128 may be corrupted after returned from the hashPartition API of JNI.

Steps/Code to reproduce bug
Add the test below into file "TableTest.java", and build the libcudf.

+  @Test
+  void testHashPartitionDec() {
+    long[] expDecs = new long[]{10l,20l,30l,40l,50l,60l,70l};
+    BigInteger[] bis = new BigInteger[expDecs.length];
+    for (int i = 0; i < expDecs.length; i++) {
+      bis[i] = BigInteger.valueOf(expDecs[i]);
+    }
+    try (ColumnVector key = ColumnVector.fromInts(1, 2, 3, 4, 5, 6, 7);
+         ColumnVector dec64 = ColumnVector.decimalFromLongs(0, expDecs);
+         ColumnVector dec128 = ColumnVector.decimalFromBigInt(0, bis)) {
+      try (Table input = new Table(key, dec64, dec128);
+           PartitionedTable out = input.onColumns(0).hashPartition(3)) {
+        int[] parts = out.getPartitions();
+        assertEquals(3, parts.length);
+        assertEquals(0, parts[0]);
+        try (HostColumnVector keyOut = out.getColumn(0).copyToHost();
+             HostColumnVector dec64Out = out.getColumn(1).copyToHost();
+             HostColumnVector dec128Out = out.getColumn(2).copyToHost()) {
+          for (int i = 0; i < expDecs.length; i++) {
+            int keyId = keyOut.getInt(i) - 1;
+            BigDecimal dec64Val = dec64Out.getBigDecimal(i);
+            BigDecimal dec128Val = dec128Out.getBigDecimal(i);
+            assertEquals(expDecs[keyId], dec64Val.longValue(), "Test dec64");
+            assertEquals(expDecs[keyId], dec128Val.longValue(), "Test dec128");
+          }
+        }
+      }
+    }
+  }
+

Then run the test under <cudf_root>/java.

mvn test -Dtest=ai.rapids.cudf.TableTest#testHashPartitionDec

For C++ , you can try to add an addition decimal128 column in the HashPartition-MixedColumnTypes test in file hash_partition_test.cpp, and compare output data with the input table. I do not run it because I am not familiar with how to run the c++ unit tests. But I think it can repro this bug.

diff --git a/cpp/tests/partitioning/hash_partition_test.cpp b/cpp/tests/partitioning/hash_partition_test.cpp
index 7731ad815d..73d1445b2f 100644
--- a/cpp/tests/partitioning/hash_partition_test.cpp
+++ b/cpp/tests/partitioning/hash_partition_test.cpp
@@ -118,7 +118,14 @@ TEST_F(HashPartition, MixedColumnTypes)
   fixed_width_column_wrapper<float> floats({1.f, 2.f, 3.f, 4.f, 5.f, 6.f, 7.f, 8.f});
   fixed_width_column_wrapper<int16_t> integers({1, 2, 3, 4, 5, 6, 7, 8});
   strings_column_wrapper strings({"a", "bb", "ccc", "d", "ee", "fff", "gg", "h"});
-  auto input = cudf::table_view({floats, integers, strings});
+  fixed_point_column_wrapper<__int128_t> const decimal128s(
+    {static_cast<__int128>(0),
+     static_cast<__int128>(100),
+     static_cast<__int128>(-1),
+     (static_cast<__int128>(0xFFFF'FFFF'FCC4'D1C3u) << 64 | 0x602F'7FC3'1800'0001u),
+     (static_cast<__int128>(0x0785'EE10'D5DA'46D9u) << 64 | 0x00F4'369F'FFFF'FFFFu)},
+    numeric::scale_type{-10});
+  auto input = cudf::table_view({floats, integers, strings, decimal128s});
 
   auto columns_to_hash = std::vector<cudf::size_type>({0, 2});

Expected behavior
the hashPartition should work well with DECIMAL128 data.

Additional context
I tried the Java test locally and always get

[INFO] 
[INFO] Results:
[INFO] 
[ERROR] Failures: 
[ERROR]   TableTest.testHashPartitionDec:3251 Test dec128 ==> expected: <10> but was: <0>
[INFO] 
[ERROR] Tests run: 1, Failures: 1, Errors: 0, Skipped: 0
[INFO] 
[INFO] ------------------------------------------------------------------------
[INFO] BUILD FAILURE
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 12.828s
[INFO] Finished at: Mon Feb 27 07:14:53 UTC 2023
[INFO] Final Memory: 33M/791M
[INFO] ------------------------------------------------------------------------
@revans2
Copy link
Contributor

revans2 commented Feb 28, 2023

@davidwendt the C++ test does not compile and then also fails for the wrong reasons right now. I am working on fixing it, but you might be able to fix it faster than I can.

@revans2
Copy link
Contributor

revans2 commented Feb 28, 2023

@firestarman when I fixed the C++ test it does not fail.

diff --git a/cpp/tests/partitioning/hash_partition_test.cpp b/cpp/tests/partitioning/hash_partition_test.cpp
index 7731ad815d..c4ea2b5c06 100644
--- a/cpp/tests/partitioning/hash_partition_test.cpp
+++ b/cpp/tests/partitioning/hash_partition_test.cpp
@@ -29,6 +29,7 @@
 #include <thrust/iterator/transform_iterator.h>
 
 using cudf::test::fixed_width_column_wrapper;
+using cudf::test::fixed_point_column_wrapper;
 using cudf::test::strings_column_wrapper;
 using structs_col = cudf::test::structs_column_wrapper;
 
@@ -118,7 +119,17 @@ TEST_F(HashPartition, MixedColumnTypes)
   fixed_width_column_wrapper<float> floats({1.f, 2.f, 3.f, 4.f, 5.f, 6.f, 7.f, 8.f});
   fixed_width_column_wrapper<int16_t> integers({1, 2, 3, 4, 5, 6, 7, 8});
   strings_column_wrapper strings({"a", "bb", "ccc", "d", "ee", "fff", "gg", "h"});
-  auto input = cudf::table_view({floats, integers, strings});
+  fixed_point_column_wrapper<__int128_t> const decimal128s(
+    {static_cast<__int128>(0),
+     static_cast<__int128>(100),
+     static_cast<__int128>(-1),
+     (static_cast<__int128>(0xFFFF'FFFF'FCC4'D1C3u) << 64 | 0x602F'7FC3'1800'0001u),
+     (static_cast<__int128>(0x0785'EE10'D5DA'46D9u) << 64 | 0x00F4'369F'FFFF'FFFFu),
+     static_cast<__int128>(1),
+     static_cast<__int128>(2),
+     static_cast<__int128>(3)},
+    numeric::scale_type{-10});
+  auto input = cudf::table_view({floats, integers, strings, decimal128s});
 
   auto columns_to_hash = std::vector<cudf::size_type>({0, 2});

I am going to try to replicate your java test more accurately

@davidwendt
Copy link
Contributor

I was able to piece together the issue from both examples to recreate the error. Working on a fix now.

@revans2
Copy link
Contributor

revans2 commented Feb 28, 2023

@davidwendt thanks for doing that

@firestarman
Copy link
Contributor Author

firestarman commented Mar 1, 2023

@firestarman when I fixed the C++ test it does not fail.

@revans2 Because the test does not compare the output with the input table, seems it is not easy to do this in C++. Since hash may change the rows order.
In the Java test, the keys are also used as the index to get the corresponding row for each comparison.

rapids-bot bot pushed a commit that referenced this issue Mar 7, 2023
…12863)

Fixes `cudf::hash_partition` error when using `decimal128` column types. The internal optimized path, `copy_block_partitions`, uses shared-memory for copying fixed-width type column elements. For `int128_t` type, the shared-memory needed (~64KB) is larger than the maximum size (~48KB) allowed causing a kernel launch failure. 

The optimized path is now restricted to only fixed-width types `int64_t` and below. The `int128_t` column types will fall through to the gather-map pattern instead. Accommodating this type in the existing copy-block implementation would likely penalize the performance of the other fixed-width types. If the new implementation becomes insufficient, we could explore a special optimized path in the future for the single type `int128_t`.

An existing gtest for fixed-point types was updated to include a `decimal128` column to catch this kind of error in the future.

Closes #12852

Authors:
  - David Wendt (https://github.com/davidwendt)
  - GALI PREM SAGAR (https://github.com/galipremsagar)

Approvers:
  - Nghia Truong (https://github.com/ttnghia)
  - Divye Gala (https://github.com/divyegala)
  - Bradley Dice (https://github.com/bdice)

URL: #12863
@bdice bdice removed the Needs Triage Need team to review and classify label Mar 4, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants