[BUG] The hashPartition API may return corrupted data when there are columns of type DECIMAL128. #12852

Comments
@davidwendt the C++ test does not compile and then also fails for the wrong reasons right now. I am working on fixing it, but you might be able to fix it faster than I can.
@firestarman when I fixed the C++ test, it did not fail. I am going to try to replicate your Java test more accurately.
I was able to piece together the issue from both examples to recreate the error. Working on a fix now.
@davidwendt thanks for doing that.
@revans2 The test does not compare the output with the input table; that seems hard to do in C++, since hashing may change the row order.
…12863) Fixes `cudf::hash_partition` error when using `decimal128` column types.

The internal optimized path, `copy_block_partitions`, uses shared memory for copying fixed-width-type column elements. For the `int128_t` type, the shared memory needed (~64KB) is larger than the maximum size (~48KB) allowed, causing a kernel launch failure. The optimized path is now restricted to fixed-width types of `int64_t` size and below; `int128_t` column types fall through to the gather-map pattern instead. Accommodating this type in the existing copy-block implementation would likely penalize the performance of the other fixed-width types. If the new implementation becomes insufficient, we could explore a special optimized path for the single type `int128_t` in the future.

An existing gtest for fixed-point types was updated to include a `decimal128` column to catch this kind of error in the future.

Closes #12852

Authors:
- David Wendt (https://github.com/davidwendt)
- GALI PREM SAGAR (https://github.com/galipremsagar)

Approvers:
- Nghia Truong (https://github.com/ttnghia)
- Divye Gala (https://github.com/divyegala)
- Bradley Dice (https://github.com/bdice)

URL: #12863
Describe the bug

The data in columns of type DECIMAL128 may be corrupted after being returned from the hashPartition API of JNI.

Steps/Code to reproduce bug
Add the test below into file "TableTest.java", and build the libcudf. Then run the test under <cudf_root>/java.

For C++, you can try to add an additional decimal128 column in the HashPartition-MixedColumnTypes test in file hash_partition_test.cpp, and compare the output data with the input table. I have not run it because I am not familiar with how to run the C++ unit tests, but I think it can repro this bug.

Expected behavior
The hashPartition API should work well with DECIMAL128 data.

Additional context
I tried the Java test locally and always get