[BUG] The `hashPartition` API may return corrupted data when there are columns with type of `DECIMAL128`. #12852

firestarman · 2023-02-27T07:36:17Z

Describe the bug
The data in columns with type of DECIMAL128 may be corrupted after returned from the hashPartition API of JNI.

Steps/Code to reproduce bug
Add the test below into file "TableTest.java", and build the libcudf.

+  @Test
+  void testHashPartitionDec() {
+    long[] expDecs = new long[]{10l,20l,30l,40l,50l,60l,70l};
+    BigInteger[] bis = new BigInteger[expDecs.length];
+    for (int i = 0; i < expDecs.length; i++) {
+      bis[i] = BigInteger.valueOf(expDecs[i]);
+    }
+    try (ColumnVector key = ColumnVector.fromInts(1, 2, 3, 4, 5, 6, 7);
+         ColumnVector dec64 = ColumnVector.decimalFromLongs(0, expDecs);
+         ColumnVector dec128 = ColumnVector.decimalFromBigInt(0, bis)) {
+      try (Table input = new Table(key, dec64, dec128);
+           PartitionedTable out = input.onColumns(0).hashPartition(3)) {
+        int[] parts = out.getPartitions();
+        assertEquals(3, parts.length);
+        assertEquals(0, parts[0]);
+        try (HostColumnVector keyOut = out.getColumn(0).copyToHost();
+             HostColumnVector dec64Out = out.getColumn(1).copyToHost();
+             HostColumnVector dec128Out = out.getColumn(2).copyToHost()) {
+          for (int i = 0; i < expDecs.length; i++) {
+            int keyId = keyOut.getInt(i) - 1;
+            BigDecimal dec64Val = dec64Out.getBigDecimal(i);
+            BigDecimal dec128Val = dec128Out.getBigDecimal(i);
+            assertEquals(expDecs[keyId], dec64Val.longValue(), "Test dec64");
+            assertEquals(expDecs[keyId], dec128Val.longValue(), "Test dec128");
+          }
+        }
+      }
+    }
+  }
+

Then run the test under <cudf_root>/java.

mvn test -Dtest=ai.rapids.cudf.TableTest#testHashPartitionDec

For C++ , you can try to add an addition decimal128 column in the HashPartition-MixedColumnTypes test in file hash_partition_test.cpp, and compare output data with the input table. I do not run it because I am not familiar with how to run the c++ unit tests. But I think it can repro this bug.

diff --git a/cpp/tests/partitioning/hash_partition_test.cpp b/cpp/tests/partitioning/hash_partition_test.cpp
index 7731ad815d..73d1445b2f 100644
--- a/cpp/tests/partitioning/hash_partition_test.cpp
+++ b/cpp/tests/partitioning/hash_partition_test.cpp
@@ -118,7 +118,14 @@ TEST_F(HashPartition, MixedColumnTypes)
   fixed_width_column_wrapper<float> floats({1.f, 2.f, 3.f, 4.f, 5.f, 6.f, 7.f, 8.f});
   fixed_width_column_wrapper<int16_t> integers({1, 2, 3, 4, 5, 6, 7, 8});
   strings_column_wrapper strings({"a", "bb", "ccc", "d", "ee", "fff", "gg", "h"});
-  auto input = cudf::table_view({floats, integers, strings});
+  fixed_point_column_wrapper<__int128_t> const decimal128s(
+    {static_cast<__int128>(0),
+     static_cast<__int128>(100),
+     static_cast<__int128>(-1),
+     (static_cast<__int128>(0xFFFF'FFFF'FCC4'D1C3u) << 64 | 0x602F'7FC3'1800'0001u),
+     (static_cast<__int128>(0x0785'EE10'D5DA'46D9u) << 64 | 0x00F4'369F'FFFF'FFFFu)},
+    numeric::scale_type{-10});
+  auto input = cudf::table_view({floats, integers, strings, decimal128s});
 
   auto columns_to_hash = std::vector<cudf::size_type>({0, 2});

Expected behavior
the hashPartition should work well with DECIMAL128 data.

Additional context
I tried the Java test locally and always get

[INFO] 
[INFO] Results:
[INFO] 
[ERROR] Failures: 
[ERROR]   TableTest.testHashPartitionDec:3251 Test dec128 ==> expected: <10> but was: <0>
[INFO] 
[ERROR] Tests run: 1, Failures: 1, Errors: 0, Skipped: 0
[INFO] 
[INFO] ------------------------------------------------------------------------
[INFO] BUILD FAILURE
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 12.828s
[INFO] Finished at: Mon Feb 27 07:14:53 UTC 2023
[INFO] Final Memory: 33M/791M
[INFO] ------------------------------------------------------------------------

The text was updated successfully, but these errors were encountered:

revans2 · 2023-02-28T16:27:47Z

@davidwendt the C++ test does not compile and then also fails for the wrong reasons right now. I am working on fixing it, but you might be able to fix it faster than I can.

revans2 · 2023-02-28T16:31:16Z

@firestarman when I fixed the C++ test it does not fail.

diff --git a/cpp/tests/partitioning/hash_partition_test.cpp b/cpp/tests/partitioning/hash_partition_test.cpp
index 7731ad815d..c4ea2b5c06 100644
--- a/cpp/tests/partitioning/hash_partition_test.cpp
+++ b/cpp/tests/partitioning/hash_partition_test.cpp
@@ -29,6 +29,7 @@
 #include <thrust/iterator/transform_iterator.h>
 
 using cudf::test::fixed_width_column_wrapper;
+using cudf::test::fixed_point_column_wrapper;
 using cudf::test::strings_column_wrapper;
 using structs_col = cudf::test::structs_column_wrapper;
 
@@ -118,7 +119,17 @@ TEST_F(HashPartition, MixedColumnTypes)
   fixed_width_column_wrapper<float> floats({1.f, 2.f, 3.f, 4.f, 5.f, 6.f, 7.f, 8.f});
   fixed_width_column_wrapper<int16_t> integers({1, 2, 3, 4, 5, 6, 7, 8});
   strings_column_wrapper strings({"a", "bb", "ccc", "d", "ee", "fff", "gg", "h"});
-  auto input = cudf::table_view({floats, integers, strings});
+  fixed_point_column_wrapper<__int128_t> const decimal128s(
+    {static_cast<__int128>(0),
+     static_cast<__int128>(100),
+     static_cast<__int128>(-1),
+     (static_cast<__int128>(0xFFFF'FFFF'FCC4'D1C3u) << 64 | 0x602F'7FC3'1800'0001u),
+     (static_cast<__int128>(0x0785'EE10'D5DA'46D9u) << 64 | 0x00F4'369F'FFFF'FFFFu),
+     static_cast<__int128>(1),
+     static_cast<__int128>(2),
+     static_cast<__int128>(3)},
+    numeric::scale_type{-10});
+  auto input = cudf::table_view({floats, integers, strings, decimal128s});
 
   auto columns_to_hash = std::vector<cudf::size_type>({0, 2});

I am going to try to replicate your java test more accurately

davidwendt · 2023-02-28T16:32:09Z

I was able to piece together the issue from both examples to recreate the error. Working on a fix now.

revans2 · 2023-02-28T16:36:49Z

@davidwendt thanks for doing that

firestarman · 2023-03-01T01:57:47Z

@firestarman when I fixed the C++ test it does not fail.

@revans2 Because the test does not compare the output with the input table, seems it is not easy to do this in C++. Since hash may change the rows order.
In the Java test, the keys are also used as the index to get the corresponding row for each comparison.

…12863) Fixes `cudf::hash_partition` error when using `decimal128` column types. The internal optimized path, `copy_block_partitions`, uses shared-memory for copying fixed-width type column elements. For `int128_t` type, the shared-memory needed (~64KB) is larger than the maximum size (~48KB) allowed causing a kernel launch failure. The optimized path is now restricted to only fixed-width types `int64_t` and below. The `int128_t` column types will fall through to the gather-map pattern instead. Accommodating this type in the existing copy-block implementation would likely penalize the performance of the other fixed-width types. If the new implementation becomes insufficient, we could explore a special optimized path in the future for the single type `int128_t`. An existing gtest for fixed-point types was updated to include a `decimal128` column to catch this kind of error in the future. Closes #12852 Authors: - David Wendt (https://github.com/davidwendt) - GALI PREM SAGAR (https://github.com/galipremsagar) Approvers: - Nghia Truong (https://github.com/ttnghia) - Divye Gala (https://github.com/divyegala) - Bradley Dice (https://github.com/bdice) URL: #12863

firestarman added bug Something isn't working Needs Triage Need team to review and classify labels Feb 27, 2023

This was referenced Feb 27, 2023

[BUG] GPU's output differs from CPU's for big decimals when joining by sub-partitioning algorithm NVIDIA/spark-rapids#7814

Closed

Implement sub partitioning for large/skewed hash joins NVIDIA/spark-rapids#7794

Merged

davidwendt self-assigned this Feb 28, 2023

davidwendt mentioned this issue Feb 28, 2023

Fix cudf::hash_partition kernel launch error with decimal128 types #12863

Merged

3 tasks

firestarman mentioned this issue Mar 1, 2023

[FEA] Improve the sub-partitioning algorithm for large/skewed hash joins NVIDIA/spark-rapids#7832

Closed

7 tasks

rapids-bot bot closed this as completed in #12863 Mar 7, 2023

bdice removed the Needs Triage Need team to review and classify label Mar 4, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[BUG] The `hashPartition` API may return corrupted data when there are columns with type of `DECIMAL128`. #12852

[BUG] The `hashPartition` API may return corrupted data when there are columns with type of `DECIMAL128`. #12852

firestarman commented Feb 27, 2023 •

edited

Loading

revans2 commented Feb 28, 2023

revans2 commented Feb 28, 2023

davidwendt commented Feb 28, 2023

revans2 commented Feb 28, 2023

firestarman commented Mar 1, 2023 •

edited

Loading

[BUG] The hashPartition API may return corrupted data when there are columns with type of DECIMAL128. #12852

[BUG] The hashPartition API may return corrupted data when there are columns with type of DECIMAL128. #12852

Comments

firestarman commented Feb 27, 2023 • edited Loading

revans2 commented Feb 28, 2023

revans2 commented Feb 28, 2023

davidwendt commented Feb 28, 2023

revans2 commented Feb 28, 2023

firestarman commented Mar 1, 2023 • edited Loading

[BUG] The `hashPartition` API may return corrupted data when there are columns with type of `DECIMAL128`. #12852

[BUG] The `hashPartition` API may return corrupted data when there are columns with type of `DECIMAL128`. #12852

firestarman commented Feb 27, 2023 •

edited

Loading

firestarman commented Mar 1, 2023 •

edited

Loading