-
Notifications
You must be signed in to change notification settings - Fork 915
Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
Add row bitmask as a
detail::hash_join
member (#10248)
When working on #8934, we observed a performance regression when nulls are unequal. One major reason is that the new hash map uses a CG-based double hashing algorithm. This algorithm is dedicated to improving hash collision handling. The existing implementation determines hash map size by the number of rows in the build table regardless of how many rows are valid. In the case of nulls being unequal, the actual map occupancy is, therefore, lower than the default 50% thus resulting in fewer hash collisions. The old scalar linear probing is more efficient in this case due to less CG-related overhead and the probe will mostly end at the first probe slot. To improve this situation, the original idea of this PR was to construct the hash map based on the number of valid rows. There are supposed to be two benefits: 1. Increases map occupancy to benefit more from CG-based double hashing thus improving runtime efficiency 2. Reduces peak memory usage: for 1'000 elements with 75% nulls, the new capacity would be 500 (1000 * 0.25 * 2) as opposed to 2000 (1000 * 2) During this work, however, we noticed the first assumption is improper since it didn't consider the performance degradation along with reduced capacity (see #10248 (comment)). Though this effort will reduce peak memory usage, it seems Python/Spark workflows would never benefit from it since they tend to drop nulls before any join operations. Finally, all changes related to map size reduction are discarded. This PR only adds `_composite_bitmask` as a `detail::hash_join` member which is a preparation step for #9151 Authors: - Yunsong Wang (https://github.com/PointKernel) Approvers: - Karthikeyan (https://github.com/karthikeyann) - Vyas Ramasubramani (https://github.com/vyasr) URL: #10248
- Loading branch information
1 parent
6128e0d
commit 0ddb3d9
Showing
4 changed files
with
20 additions
and
8 deletions.
There are no files selected for viewing
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters