Support hashing different types for sub-partitioning #7937

Merged
merged 3 commits into from
Mar 27, 2023

@firestarman firestarman commented Mar 27, 2023

Relevant to rapidsai/cudf#13000

This PR switches sub-partitioning to the Plugin's own way of partitioning batches instead of the cudf hashPartition.

Left and right join keys that have the same value but different integral types (e.g. 100.toInt and 100.toByte) produce different hash values under cudf hashPartition. Matching rows can then land in different sub-partitions, which breaks the join-by-sub-partitioning algorithm.

An integer and a byte produce different hash values even when they hold the same value, but we want the same hash for both, so this PR switches to the Plugin way of partitioning batches.
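
The type sensitivity described above can be illustrated with a small, self-contained sketch. It does not use cudf or the Plugin: `rawHash` is a toy stand-in that hashes a value's raw bytes (so the result depends on the type's width), and `widenedHash` is a hypothetical normalization that widens every integral key to `Long` before hashing. Both names, and the widening approach itself, are assumptions for illustration only; the PR does not state how the Plugin computes its hashes.

```scala
import java.nio.{ByteBuffer, ByteOrder}
import scala.util.hashing.MurmurHash3

object SubPartitionHashSketch {
  // Toy type-sensitive hash: hashes the raw bytes of the value, so a Byte
  // (1 byte) and an Int (4 bytes) feed different inputs to the hash function.
  // This is NOT cudf's actual algorithm, only an illustration of the problem.
  def rawHash(v: Byte): Int = MurmurHash3.bytesHash(Array(v))

  def rawHash(v: Int): Int = MurmurHash3.bytesHash(
    ByteBuffer.allocate(4).order(ByteOrder.LITTLE_ENDIAN).putInt(v).array())

  // Hypothetical normalization: widen every integral key to Long first, so
  // equal values always hash the same 8 bytes regardless of the source type.
  def widenedHash(v: Long): Int = MurmurHash3.bytesHash(
    ByteBuffer.allocate(8).order(ByteOrder.LITTLE_ENDIAN).putLong(v).array())

  // Map a hash to a sub-partition index in [0, numPartitions).
  def partitionOf(hash: Int, numPartitions: Int): Int =
    Math.floorMod(hash, numPartitions)

  def main(args: Array[String]): Unit = {
    val n = 16
    // The two raw hashes are computed from different byte sequences.
    println(s"raw:     byte -> ${partitionOf(rawHash(100.toByte), n)}, " +
      s"int -> ${partitionOf(rawHash(100.toInt), n)}")
    // After widening, both keys always map to the same sub-partition.
    println(s"widened: byte -> ${partitionOf(widenedHash(100.toByte), n)}, " +
      s"int -> ${partitionOf(widenedHash(100.toInt), n)}")
  }
}
```

For a sub-partitioned join to be correct, both sides must send equal keys to the same sub-partition, which requires a hash that depends on the key's value rather than its physical type.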

Signed-off-by: Liangcai Li <[email protected]>
@firestarman firestarman requested a review from revans2 March 27, 2023 07:02
Signed-off-by: Firestarman <[email protected]>
@firestarman
Collaborator Author

build

@revans2 revans2 merged commit 8ed55e2 into NVIDIA:branch-23.04 Mar 27, 2023
@firestarman firestarman deleted the fix-sub-hash branch March 28, 2023 01:41
@sameerz sameerz added the performance A performance related task/issue label Mar 29, 2023