
Add row bitmask as a detail::hash_join member #10248

Merged: 18 commits merged into rapidsai:branch-22.06 on May 2, 2022

Conversation

@PointKernel (Member) commented Feb 8, 2022

While working on #8934, we observed a performance regression when nulls are unequal. One major reason is that the new hash map uses a cooperative-group (CG) based double hashing algorithm, which is designed to handle hash collisions efficiently. The existing implementation sizes the hash map by the number of rows in the build table, regardless of how many of those rows are valid. When nulls are unequal, the actual map occupancy is therefore lower than the default 50%, resulting in fewer hash collisions. In that regime the old scalar linear probing is more efficient: it has no CG-related overhead, and most probes terminate at the first slot.
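For reference, here is a minimal conceptual sketch of the two probing schemes being compared. It is illustrative only, not cuco's actual implementation (which probes with a cooperative group per key); the function names and signatures are hypothetical.

```cpp
// Conceptual probe sequences for a table of `capacity` slots; illustrative only.
#include <cstddef>

// Linear probing: fixed stride of 1, so colliding keys form contiguous runs.
std::size_t linear_probe(std::size_t h1, std::size_t i, std::size_t capacity)
{
  return (h1 + i) % capacity;
}

// Double hashing: a key-dependent stride h2 (nonzero) spreads colliding keys apart,
// which pays off as occupancy (and hence collision frequency) grows.
std::size_t double_hash_probe(std::size_t h1, std::size_t h2, std::size_t i, std::size_t capacity)
{
  return (h1 + i * h2) % capacity;
}
```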

To improve this situation, the original idea of this PR was to construct the hash map based on the number of valid rows. There were expected to be two benefits (see the sketch after this list):

  1. Increases map occupancy, allowing us to benefit more from CG-based double hashing and thus improving runtime efficiency
  2. Reduces peak memory usage: for 1000 elements with 75% nulls, the new capacity would be 500 (1000 * 0.25 * 2) as opposed to 2000 (1000 * 2)
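A minimal sketch of the capacity choice described above, assuming the default 50% target occupancy (capacity = 2 * number of inserted keys). The helper names are hypothetical, not libcudf code:

```cpp
#include <cstddef>

// Current behavior: size the map by all build-table rows, valid or not.
std::size_t capacity_from_all_rows(std::size_t num_build_rows)
{
  return 2 * num_build_rows;  // e.g. 1000 rows -> capacity 2000
}

// Idea explored in this PR: size the map by valid (non-null) rows only.
std::size_t capacity_from_valid_rows(std::size_t num_build_rows, std::size_t num_null_rows)
{
  return 2 * (num_build_rows - num_null_rows);  // 1000 rows, 750 nulls -> capacity 500
}
```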

During this work, however, we noticed that the first assumption does not hold, since it ignores the performance degradation that comes with the reduced capacity (see #10248 (comment)). And although this effort would reduce peak memory usage, Python/Spark workflows would likely never benefit from it since they tend to drop nulls before any join operations.

Finally, all changes related to map size reduction were discarded. This PR only adds _composite_bitmask as a detail::hash_join member, which is a preparation step for #9151.
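For context, the composite row bitmask has one bit per row, set only when the row is valid (non-null) in every participating column, i.e. the bitwise AND of the columns' null masks. A minimal host-side sketch, assuming 32-bit bitmask words; this is illustrative, not the libcudf implementation:

```cpp
// Conceptual composite row bitmask: AND together per-column validity masks.
// Hypothetical standalone code, not the libcudf implementation.
#include <cstddef>
#include <cstdint>
#include <vector>

using bitmask_word = std::uint32_t;  // 32 rows per word

std::vector<bitmask_word> composite_bitmask(
  std::vector<std::vector<bitmask_word>> const& column_masks, std::size_t num_words)
{
  std::vector<bitmask_word> result(num_words, ~bitmask_word{0});  // start all-valid
  for (auto const& mask : column_masks) {
    for (std::size_t w = 0; w < num_words; ++w) { result[w] &= mask[w]; }
  }
  return result;
}
```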

@PointKernel PointKernel added the labels libcudf (Affects libcudf (C++/CUDA) code.), 0 - Blocked (Cannot progress due to external reasons), Performance (Performance related issue), 5 - Merge After Dependencies, improvement (Improvement / enhancement to an existing function), and non-breaking (Non-breaking change) on Feb 8, 2022
@PointKernel PointKernel self-assigned this Feb 8, 2022
codecov bot commented Feb 12, 2022

Codecov Report

Merging #10248 (7b2f5f6) into branch-22.06 (84f88ce) will increase coverage by 0.02%.
The diff coverage is n/a.

@@               Coverage Diff                @@
##           branch-22.06   #10248      +/-   ##
================================================
+ Coverage         86.40%   86.43%   +0.02%     
================================================
  Files               143      143              
  Lines             22444    22444              
================================================
+ Hits              19393    19399       +6     
+ Misses             3051     3045       -6     
| Impacted Files | Coverage Δ |
|---|---|
| python/cudf/cudf/comm/gpuarrow.py | 79.76% <ø> (ø) |
| python/cudf/cudf/core/column/string.py | 89.21% <ø> (+0.12%) ⬆️ |
| python/cudf/cudf/core/frame.py | 93.41% <ø> (ø) |
| python/cudf/cudf/core/series.py | 95.16% <ø> (ø) |
| python/cudf/cudf/core/dataframe.py | 93.74% <0.00%> (+0.04%) ⬆️ |
| python/cudf/cudf/core/groupby/groupby.py | 91.79% <0.00%> (+0.22%) ⬆️ |
| python/cudf/cudf/core/tools/datetimes.py | 84.49% <0.00%> (+0.30%) ⬆️ |
| python/cudf/cudf/core/column/lists.py | 92.91% <0.00%> (+0.83%) ⬆️ |

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data

@PointKernel (Member, Author) commented Feb 18, 2022

@jrhemstad Here are benchmark results of cuco::static_multimap::pair_retrieve:

| Key | Value | Multiplicity | Scheme | NumInputs | Occupancy | Capacity | Memory (GB) | GPU Time (ms) |
|---|---|---|---|---|---|---|---|---|
| I32 | I32 | 2 | Linear Probing | 100 M | 0.5 | 200 M | 1.6 | 53.88 |
| I32 | I32 | 2 | Double Hashing | 100 M | 0.5 | 200 M | 1.6 | 40.83 |
| I32 | I32 | 2 | Linear Probing | 100 M | 0.25 | 400 M | 3.2 | 38.40 |

If occupancy is fixed at 50%, double hashing outperforms linear probing by ~27%. When we increase the hash map capacity to have an actual occupancy of 25%, linear probing is 5% more efficient. This is consistent with the result we get with this PR.

In hash join benchmarks where nulls are unequal and 75% of rows are null, the actual occupancy is 25%. With this PR, using bitmask_and to determine the hash table capacity lets us ignore null elements, improving the occupancy to 50%. However, this will not make execution faster, since double hashing at 50% occupancy cannot beat linear probing at 25% occupancy.

@PointKernel PointKernel removed the labels 0 - Blocked (Cannot progress due to external reasons) and 5 - Merge After Dependencies on Feb 18, 2022
@PointKernel PointKernel added the 3 - Ready for Review (Ready for review by team) label on Feb 22, 2022
@PointKernel PointKernel marked this pull request as ready for review February 22, 2022 21:55
@PointKernel PointKernel requested a review from a team as a code owner February 22, 2022 21:55
@PointKernel PointKernel changed the title from "Improve hash join" to "Improve hash join when nulls are unequal" on Feb 22, 2022
@PointKernel (Member, Author) commented Mar 29, 2022

> benchmark shows that this effort makes the execution 5~10% slower by requiring 25% of the memory

> Can you elaborate a bit on the memory reduction? This just means the hash table is 25% of the size it would otherwise need to be, right? So in terms of total memory needed for the join operation, the reduction is less, right?

Here are inner_join_32bit_nulls benchmark results, with a peak memory usage measurement added:

Baseline:

| Key Type | Payload Type | Nullable | Build Table Size | Probe Table Size | Samples | CPU Time | GPU Time | Peak Memory |
|---|---|---|---|---|---|---|---|---|
| I32 | I32 | 1 | 100000000 | 100000000 | 12x | 45.032 ms | 45.027 ms | 1617404728 |
| I32 | I32 | 1 | 10000000 | 240000000 | 11x | 58.151 ms | 58.146 ms | 199587032 |
| I32 | I32 | 1 | 80000000 | 240000000 | 11x | 73.723 ms | 73.718 ms | 1319627976 |
| I32 | I32 | 1 | 100000000 | 240000000 | 11x | 78.108 ms | 78.103 ms | 1638403320 |

Optimization:

| Key Type | Payload Type | Nullable | Build Table Size | Probe Table Size | Samples | CPU Time | GPU Time | Peak Memory |
|---|---|---|---|---|---|---|---|---|
| I32 | I32 | 1 | 100000000 | 100000000 | 11x | 49.535 ms | 49.529 ms | 430194360 |
| I32 | I32 | 1 | 10000000 | 240000000 | 11x | 70.833 ms | 70.828 ms | 79186840 |
| I32 | I32 | 1 | 80000000 | 240000000 | 11x | 87.539 ms | 87.534 ms | 369024712 |
| I32 | I32 | 1 | 100000000 | 240000000 | 11x | 91.814 ms | 91.809 ms | 451192952 |

Peak memory usage is about 2.5x to 3.7x lower. Unlike inner_join_64bit_nulls, however, the optimized implementation can be 20% slower when dealing with large probe tables.

@jrhemstad (Contributor)

@PointKernel what is the percentage of nulls in that test?

@PointKernel (Member, Author)

> @PointKernel what is the percentage of nulls in that test?

@jrhemstad It's 75% nulls for both probe and build tables.

@vyasr (Contributor) left a comment

I think that I'm missing some context. What motivated this change? Is there a particular reason that we're willing to increase the runtime in order to decrease the memory footprint? The code looks good and the behavior aligns with expectations, but I don't know why we want this.

@jrhemstad (Contributor)

> Is there a particular reason that we're willing to increase the runtime in order to decrease the memory footprint?

See #9151

We'd actually hoped this would improve runtime. The fact that it increases runtime was an unfortunate surprise, so now we have to decide what we want to do with it.

@jrhemstad (Contributor)
@PointKernel what (if any) is the impact on runtime when very few nulls are present? What about if no nulls are present?

Is the overhead from constructing the composite row bitmask? Or is it from increasing the relative load factor on the hash map by using a smaller capacity? (or both?)

@PointKernel (Member, Author) commented Apr 13, 2022

> what (if any) is the impact on runtime when very few nulls are present? What about if no nulls are present?

The fewer nulls are present, the smaller the performance loss; there is no performance loss when no nulls are present.

> Is the overhead from constructing the composite row bitmask?

No, it's not. We construct the row bitmask anyway when nulls are present, since it is used by insert_if.

> Or is it from increasing the relative load factor on the hash map by using a smaller capacity?

Yes, that's the reason.

If we want to keep the current performance, I would suggest setting the hash map size based on the number of rows in the input table, regardless of whether nulls are present. That is, the only change in this PR would be to add a new row bitmask member to hash join so it can be used for insert_if, and eventually count_if and retrieve_if, in a PR resolving #9151.
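To make the intended use concrete, here is a sketch of how such a row bitmask can act as a stencil for conditional insertion (the insert_if pattern mentioned above). The row_is_valid helper and the loop are hypothetical stand-ins for the CG-based cuco insert path:

```cpp
// Hypothetical sketch: skip null-containing rows when building the hash map.
// This mirrors the insert_if idea; it is not the cuco/libcudf implementation.
#include <cstdint>

using bitmask_word = std::uint32_t;

// True if bit `row` is set in the packed composite bitmask (32 rows per word).
inline bool row_is_valid(bitmask_word const* composite_bitmask, int row)
{
  return (composite_bitmask[row / 32] >> (row % 32)) & 1u;
}

// Build-phase logic, in spirit (the real path is a CG-based cuco insert):
// for (int row = 0; row < num_build_rows; ++row) {
//   if (row_is_valid(composite_bitmask, row)) { map.insert({hash(row), row}); }
// }
```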

@PointKernel PointKernel changed the title from "Improve hash join when nulls are unequal" to "Add row bitmask as a detail::hash_join member" on Apr 29, 2022
@PointKernel PointKernel requested a review from vyasr April 29, 2022 19:55
@PointKernel PointKernel added the tech debt label and removed the Performance (Performance related issue) label on Apr 29, 2022
@PointKernel (Member, Author)
@gpucibot merge

@rapids-bot rapids-bot bot merged commit 0ddb3d9 into rapidsai:branch-22.06 May 2, 2022
@PointKernel PointKernel deleted the improve-hash-join branch May 2, 2022 21:12