Add row bitmask as a detail::hash_join member #10248
Conversation
Force-pushed from 346433b to 78e855f
Codecov Report

```diff
@@            Coverage Diff             @@
##          branch-22.06   #10248   +/- ##
==========================================
+ Coverage       86.40%   86.43%   +0.02%
==========================================
  Files             143      143
  Lines           22444    22444
==========================================
+ Hits            19393    19399       +6
+ Misses           3051     3045       -6
```

Continue to review the full report at Codecov.
@jrhemstad Here are the benchmark results:
If occupancy is fixed at 50%, double hashing outperforms linear probing by ~27%. When we increase the hash map capacity so that the actual occupancy is 25%, linear probing is 5% more efficient. This is consistent with the results we get with this PR: in hash join benchmarks where nulls are unequal and 75% of rows are null, the actual occupancy is 25%. With this PR, using …
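To make the occupancy effect concrete, here is a small illustrative calculation (the helper name and the parameterization are mine, not cuDF code): sizing the map by total row count while only a fraction of the rows are valid leaves the map mostly empty.

```cpp
#include <cstddef>

// Illustrative only: actual occupancy when the map is sized from the total
// row count but only (1 - null_fraction) of the rows are actually inserted.
double actual_occupancy(std::size_t num_rows, double null_fraction,
                        double target_occupancy)
{
  auto const capacity   = static_cast<double>(num_rows) / target_occupancy;
  auto const valid_rows = static_cast<double>(num_rows) * (1.0 - null_fraction);
  return valid_rows / capacity;  // == (1 - null_fraction) * target_occupancy
}
// e.g. with 75% nulls, the actual occupancy is one quarter of the target.
```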
Peak bytes used are about 2.5x to 3.7x lower. Unlike …
@PointKernel What is the percentage of nulls in that test?
@jrhemstad It's 75% nulls for both the probe and build tables.
I think that I'm missing some context. What motivated this change? Is there a particular reason that we're willing to increase the runtime in order to decrease the memory footprint? The code looks good and the behavior aligns with expectations, but I don't know why we want this.
See #9151. We'd actually hoped this would improve runtime. The fact that it instead degrades runtime was an unfortunate surprise, so now we have to decide what we want to do with it.
@PointKernel What (if any) is the impact on runtime when very few nulls are present? What about if no nulls are present? Is the overhead from constructing the composite row bitmask? Or is it from increasing the relative load factor on the hash map by using a smaller capacity? (Or both?)
The fewer nulls present, the smaller the performance loss; i.e., there is no performance loss if no nulls are present.
No, it's not. We construct the row bitmask anyway when nulls are present, since it's already used by …
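For reference, a composite row bitmask can be obtained by AND-ing the null masks of all columns. A minimal sketch, assuming cudf::bitmask_and from cudf/null_mask.hpp (its exact return type has varied across cuDF versions, hence the use of auto):

```cpp
#include <cudf/null_mask.hpp>
#include <cudf/table/table_view.hpp>

void build_row_bitmask(cudf::table_view const& build)
{
  // Bit i of the result is set iff row i has no nulls in any column of
  // `build`; columns without a null mask are treated as all-valid.
  auto row_bitmask = cudf::bitmask_and(build);
}
```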
Yes, that's the reason. If we want to keep the current performance, I would suggest setting the hash map size based on the number of rows in the input table, regardless of whether nulls are present. That is, the only change in this PR would be to add a new row bitmask member to hash_join so it can be used for …
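The two sizing alternatives under discussion, as an illustrative sketch (the function names and the 50% target occupancy are mine, not the actual cuDF code):

```cpp
#include <cstddef>

constexpr double target_occupancy = 0.5;  // illustrative default

// Size by total rows (keeps current performance characteristics): with many
// nulls the map is over-allocated and runs at a low actual occupancy.
std::size_t capacity_from_total_rows(std::size_t num_rows)
{
  return static_cast<std::size_t>(num_rows / target_occupancy);
}

// Size by valid rows (the discarded idea): lower peak memory, but actual
// occupancy rises back to the target, where double hashing loses its
// low-occupancy advantage over the old scalar linear probing.
std::size_t capacity_from_valid_rows(std::size_t num_valid_rows)
{
  return static_cast<std::size_t>(num_valid_rows / target_occupancy);
}
```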
@gpucibot merge
When working on #8934, we observed a performance regression when nulls are unequal. One major reason is that the new hash map uses a CG-based double hashing algorithm, which is dedicated to improving hash collision handling. The existing implementation determines the hash map size from the number of rows in the build table, regardless of how many rows are valid. When nulls are unequal, the actual map occupancy is therefore lower than the default 50%, resulting in fewer hash collisions. The old scalar linear probing is more efficient in this case due to less CG-related overhead, and the probe will mostly end at the first probed slot.
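For intuition, here is a host-side sketch of the two probing schemes (simplified; the actual cuCollections implementation is cooperative-group based and runs on the device):

```cpp
#include <cstddef>

// Linear probing: step by 1 from the initial slot. Cheap per step, but keys
// that hash together form clusters that must be scanned through.
std::size_t linear_probe(std::size_t h1, std::size_t i, std::size_t capacity)
{
  return (h1 + i) % capacity;
}

// Double hashing: step by a second, key-dependent stride. Breaks up clusters
// at the cost of computing an extra hash. h2 must be nonzero and should be
// coprime with capacity so the probe sequence visits every slot.
std::size_t double_hash_probe(std::size_t h1, std::size_t h2, std::size_t i,
                              std::size_t capacity)
{
  return (h1 + i * h2) % capacity;
}
```

At very low occupancy the first probe almost always lands on an empty or matching slot, so double hashing's better collision handling buys nothing and its extra overhead shows up as a regression.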
To improve this situation, the original idea of this PR was to size the hash map by the number of valid rows, which was expected to bring two benefits: better probing performance and a smaller peak memory footprint.
During this work, however, we noticed that the first assumption was flawed, since it didn't account for the performance degradation that comes with reduced capacity (see #10248 (comment)). Though this effort would reduce peak memory usage, Python/Spark workflows would likely never benefit from it, since they tend to drop nulls before any join operations.
Finally, all changes related to map size reduction were discarded. This PR only adds _composite_bitmask as a detail::hash_join member, which is a preparation step for #9151.
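To show why caching the bitmask is useful downstream, here is a hypothetical consumer (not the actual #9151 code, and the helper name is mine); it follows cuDF's bitmask convention of 32-bit words with bit i representing row i, and assumes CUDA compilation:

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical helper: true iff row `row` is valid, i.e. its bit is set in
// the composite row bitmask (the row contains no nulls in any column).
__host__ __device__ inline bool row_is_valid(std::uint32_t const* bitmask,
                                             std::size_t row)
{
  return (bitmask[row / 32u] >> (row % 32u)) & 1u;
}
```

Caching _composite_bitmask on the hash_join object means such checks can reuse one AND-ed mask across multiple probe calls instead of recomputing it per join.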