[BUG] Two tables with the same key may be placed at different partitions by Table.hashPartition #13000
The type of Using |
Thanks for the info. @revans2 What's your idea about this? Are we good to use This is from the bug #4039216 by QA.
@firestarman In the short term, I think it would be good to use the same hashing code that the plugin uses for HashPartitioning, but with a different hash to avoid collisions. We know that this code works and that all of the odd corner cases have been covered.
Describe the bug
Read the two files (hosted on Google Drive; links are posted at the end of this section) into two tables individually, and you will find that all rows in both tables have the same value (100) in the first column, "byteF".
When hash partitioning the two tables by the byteF column with the same hash function, partition count (16), and seed, we expect the rows of both tables to land in the same partition, but they do not. One table ends up in partition 10, while the other ends up in partition 13. The actual partitions are as below.
I failed to find a smaller dataset that reproduces this, so I had to upload the files to Google Drive.
https://drive.google.com/file/d/1FtMlmeNDK2NM8wjjlkDuYkUR-1CyPD9p/view?usp=share_link
https://drive.google.com/file/d/1xg-JPIRgNv-a7FvIvEH4DqpjhUaZ778s/view?usp=share_link
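The expectation described above can be sketched in plain Java, without cudf. This is only an illustration of the invariant: the `HashPartitionSketch` class, its seeded FNV-1a style `hash`, the helper names, and the seed value `0` are all assumptions made for this sketch, and it is not the hash that Table.hashPartition uses. The point is that a partition index is a pure function of the key bytes, the seed, and the partition count, so identical key bytes must map to the same partition. The `asInt32` helper is a hypothetical aside showing that the byte representation of a value is part of the hash input; the report itself does not confirm that as the cause.

```java
import java.nio.ByteBuffer;

public class HashPartitionSketch {
    // Stand-in hash (seeded FNV-1a style, 32-bit). NOT the hash cudf uses.
    static int hash(byte[] keyBytes, int seed) {
        int h = 0x811C9DC5 ^ seed;
        for (byte b : keyBytes) {
            h ^= (b & 0xFF);
            h *= 0x01000193;
        }
        return h;
    }

    // Partition index: non-negative remainder of the hash by the partition count.
    static int partitionOf(byte[] keyBytes, int seed, int numPartitions) {
        return Math.floorMod(hash(keyBytes, seed), numPartitions);
    }

    // Hypothetical encodings of the same numeric value at different widths.
    static byte[] asInt8(byte v) {
        return new byte[] { v };
    }

    static byte[] asInt32(int v) {
        return ByteBuffer.allocate(4).putInt(v).array();
    }

    public static void main(String[] args) {
        int seed = 0;            // assumed seed; the report does not state the value
        int numPartitions = 16;  // partition count from the report

        // Two "tables" whose key column holds the same value with the same encoding.
        int tableA = partitionOf(asInt8((byte) 100), seed, numPartitions);
        int tableB = partitionOf(asInt8((byte) 100), seed, numPartitions);
        System.out.println("table A -> " + tableA + ", table B -> " + tableB);

        // Same numeric value, wider encoding: the hash input differs, so the
        // partition is not guaranteed to match the one above.
        int wide = partitionOf(asInt32(100), seed, numPartitions);
        System.out.println("int32 encoding -> " + wide);
    }
}
```

With identical inputs, `tableA` and `tableB` are always equal in this sketch, which is the behavior the bug report expects from the real partitioning call.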
Steps/Code to reproduce bug
Download the two parquet files above and put them under your local folder, e.g. "/data/tmp".
Add the below test into TableTest and build the native cudf library,
then run
Expected behavior
The test should pass.
Environment details
OS Information
DISTRIB_ID=Ubuntu
DISTRIB_RELEASE=20.04
DISTRIB_CODENAME=focal
DISTRIB_DESCRIPTION="Ubuntu 20.04.3 LTS"
NAME="Ubuntu"
VERSION="20.04.3 LTS (Focal Fossa)"
ID=ubuntu
ID_LIKE=debian
PRETTY_NAME="Ubuntu 20.04.3 LTS"