Do not add nulls to the hash table when null_equality::NOT_EQUAL is passed to left_semi_join and left_anti_join (#8277)

Fixes #7300

This is fundamentally the same issue and fix as https://github.com/rapidsai/cudf/pull/6943/files from @hyperbolic2346

When nulls are considered not equal (`null_equality::NOT_EQUAL`) there is no point in adding them to the hash table used for the join, as they will never compare as true against anything. Adding large numbers of nulls was causing huge performance issues.

Includes a fix to doxygen comments for `left_anti_join`

Performance gain is tremendous.

Before:
```
Benchmark                                                                            Time             CPU   Iterations
-----------------------------------------------------------------------------------------------------------------------
Join<int32_t, int32_t>/left_anti_join_32bit_nulls/100000/100000/manual_time       1072 ms         1072 ms            1
Join<int32_t, int32_t>/left_anti_join_32bit_nulls/200000/400000/manual_time       4253 ms         4253 ms            1
Join<int32_t, int32_t>/left_anti_join_32bit_nulls/300000/1000000/manual_time     14016 ms        14016 ms            1
Join<int32_t, int32_t>/left_semi_join_32bit_nulls/100000/100000/manual_time        932 ms          932 ms            1
Join<int32_t, int32_t>/left_semi_join_32bit_nulls/200000/400000/manual_time       4481 ms         4481 ms            1
Join<int32_t, int32_t>/left_semi_join_32bit_nulls/300000/1000000/manual_time     14172 ms        14172 ms            1
```

After:
```
-----------------------------------------------------------------------------------------------------------------------
Benchmark                                                                            Time             CPU   Iterations
-----------------------------------------------------------------------------------------------------------------------
Join<int32_t, int32_t>/left_anti_join_32bit_nulls/100000/100000/manual_time      0.143 ms        0.162 ms         4996
Join<int32_t, int32_t>/left_anti_join_32bit_nulls/200000/400000/manual_time      0.255 ms        0.275 ms         2780
Join<int32_t, int32_t>/left_anti_join_32bit_nulls/300000/1000000/manual_time     0.514 ms        0.532 ms         1368
Join<int32_t, int32_t>/left_semi_join_32bit_nulls/100000/100000/manual_time      0.135 ms        0.155 ms         5203
Join<int32_t, int32_t>/left_semi_join_32bit_nulls/200000/400000/manual_time      0.206 ms        0.224 ms         3325
Join<int32_t, int32_t>/left_semi_join_32bit_nulls/300000/1000000/manual_time     0.368 ms        0.385 ms         1903
```

Authors:
  - https://github.com/nvdbaranec

Approvers:
  - Jake Hemstad (https://github.com/jrhemstad)
  - Mike Wilson (https://github.com/hyperbolic2346)
  - Robert Maynard (https://github.com/robertmaynard)
  - Mark Harris (https://github.com/harrism)

URL: #8277
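
The idea in the commit message (skip null keys when building the hash table if nulls can never match) can be illustrated with a small host-side sketch. This is not the cudf implementation: `null_eq`, `maybe_key`, `left_semi_join_sketch`, and `left_anti_join_sketch` are hypothetical names, `std::unordered_set` stands in for the GPU hash table, and `std::nullopt` models a null row. For simplicity the sketch never inserts nulls and handles the nulls-equal case with a flag, which is a stand-in rather than what the library does.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <unordered_set>
#include <vector>

// Hypothetical stand-in for cudf::null_equality.
enum class null_eq { EQUAL, UNEQUAL };

// A key column row; std::nullopt models a null.
using maybe_key = std::optional<int>;

// Returns the indices of left rows that have at least one match in right.
std::vector<std::size_t> left_semi_join_sketch(std::vector<maybe_key> const& left,
                                               std::vector<maybe_key> const& right,
                                               null_eq nulls)
{
  // Build phase: only valid (non-null) keys go into the table.
  std::unordered_set<int> build;
  bool right_has_null = false;
  for (auto const& k : right) {
    if (k.has_value()) {
      build.insert(*k);
    } else {
      right_has_null = true;  // remembered, but never inserted
    }
  }

  // Probe phase: a null left key matches only if nulls compare equal
  // and the right side contains at least one null.
  std::vector<std::size_t> out;
  for (std::size_t i = 0; i < left.size(); ++i) {
    auto const& k    = left[i];
    bool const match = k.has_value() ? build.count(*k) > 0
                                     : (nulls == null_eq::EQUAL && right_has_null);
    if (match) { out.push_back(i); }
  }
  return out;
}

// Returns the indices of left rows that have no match in right.
std::vector<std::size_t> left_anti_join_sketch(std::vector<maybe_key> const& left,
                                               std::vector<maybe_key> const& right,
                                               null_eq nulls)
{
  auto const semi = left_semi_join_sketch(left, right, nulls);
  std::vector<std::size_t> out;
  std::size_t next = 0;  // semi is sorted ascending, so one pass suffices
  for (std::size_t i = 0; i < left.size(); ++i) {
    if (next < semi.size() && semi[next] == i) {
      ++next;
    } else {
      out.push_back(i);
    }
  }
  // Semi and anti results partition the left rows.
  assert(semi.size() + out.size() == left.size());
  return out;
}
```

With `null_eq::UNEQUAL`, null rows on the left always land in the anti-join result and never in the semi-join result, no matter how many nulls the right side holds, so the build table stays as small as the set of valid right keys.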
1 parent 691dd11, commit 7e725b5
Showing 6 changed files with 365 additions and 124 deletions.