Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Do not add nulls to the hash table when null_equality::NOT_EQUAL is passed to left_semi_join and left_anti_join #8277

Merged
merged 7 commits into from
May 24, 2021

Conversation

nvdbaranec
Copy link
Contributor

@nvdbaranec nvdbaranec commented May 18, 2021

Fixes #7300

This is fundamentally the same issue and fix as https://github.com/rapidsai/cudf/pull/6943/files from @hyperbolic2346

When nulls are considered not equal (null_equality::NOT_EQUAL) there is no point in adding them to the hash table used for the join as they will never compare as true against anything. Adding large numbers of nulls was causing huge performance issues.

Includes a fix to doxygen comments for left_anti_join

Performance gain is tremendous.

Before:

Benchmark                                                                             Time             CPU   Iterations
-----------------------------------------------------------------------------------------------------------------------
Join<int32_t, int32_t>/left_anti_join_32bit_nulls/100000/100000/manual_time        1072 ms         1072 ms            1
Join<int32_t, int32_t>/left_anti_join_32bit_nulls/200000/400000/manual_time        4253 ms         4253 ms            1
Join<int32_t, int32_t>/left_anti_join_32bit_nulls/300000/1000000/manual_time      14016 ms        14016 ms            1
Join<int32_t, int32_t>/left_semi_join_32bit_nulls/100000/100000/manual_time         932 ms          932 ms            1
Join<int32_t, int32_t>/left_semi_join_32bit_nulls/200000/400000/manual_time        4481 ms         4481 ms            1
Join<int32_t, int32_t>/left_semi_join_32bit_nulls/300000/1000000/manual_time      14172 ms        14172 ms            1

After:

-----------------------------------------------------------------------------------------------------------------------
Benchmark                                                                             Time             CPU   Iterations
-----------------------------------------------------------------------------------------------------------------------
Join<int32_t, int32_t>/left_anti_join_32bit_nulls/100000/100000/manual_time       0.143 ms        0.162 ms         4996
Join<int32_t, int32_t>/left_anti_join_32bit_nulls/200000/400000/manual_time       0.255 ms        0.275 ms         2780
Join<int32_t, int32_t>/left_anti_join_32bit_nulls/300000/1000000/manual_time      0.514 ms        0.532 ms         1368
Join<int32_t, int32_t>/left_semi_join_32bit_nulls/100000/100000/manual_time       0.135 ms        0.155 ms         5203
Join<int32_t, int32_t>/left_semi_join_32bit_nulls/200000/400000/manual_time       0.206 ms        0.224 ms         3325
Join<int32_t, int32_t>/left_semi_join_32bit_nulls/300000/1000000/manual_time      0.368 ms        0.385 ms         1903

…assed to left_anti_join and left_semi_join. Major performance gains.

Added benchmarks.
@nvdbaranec nvdbaranec added libcudf Affects libcudf (C++/CUDA) code. Spark Functionality that helps Spark RAPIDS improvement Improvement / enhancement to an existing function non-breaking Non-breaking change labels May 18, 2021
@nvdbaranec nvdbaranec requested a review from a team as a code owner May 18, 2021 22:25
@nvdbaranec nvdbaranec requested review from robertmaynard and vuule May 18, 2021 22:25
@nvdbaranec
Copy link
Contributor Author

I noticed as I was doing this that we have no tests for left_anti_join. Will add.

Copy link
Member

@harrism harrism left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice speedup!

cpp/src/join/semi_join.cu Outdated Show resolved Hide resolved
cpp/src/join/semi_join.cu Show resolved Hide resolved
cpp/benchmarks/join/join_benchmark.cu Outdated Show resolved Hide resolved
[hash_table] __device__(size_type idx) mutable {
hash_table.insert(thrust::make_pair(idx, true));
});
// if compare_nulls == NOT_EQUAL, we can simply ignore any rows that are
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we also adjust the hash table size based on the number of valid rows? Not sure how important this is for join operations, just a thought.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Something like this seems to work. I have to do a count_unset_bits() on the aggregate bitmask though. This also feels mildly spooky to change as I'm trying to treat lightly in this unfamiliar module :)

auto const row_bitmask = (compare_nulls == null_equality::EQUAL)
                           ? rmm::device_buffer{}
                           : cudf::detail::bitmask_and(right_flattened_keys, stream);
size_type right_row_ignore_count = compare_nulls == null_equality::EQUAL ? 0 : 
  count_unset_bits(static_cast<const bitmask_type*>(row_bitmask.data()), 0, right_num_rows);
...
size_t const hash_table_size = compute_hash_table_size(right_num_rows - right_row_ignore_count);

@github-actions github-actions bot added the CMake CMake build issue label May 20, 2021
@robertmaynard robertmaynard self-requested a review May 20, 2021 16:50
@codecov
Copy link

codecov bot commented May 20, 2021

Codecov Report

❗ No coverage uploaded for pull request base (branch-21.06@8406522). Click here to learn what that means.
The diff coverage is n/a.

Impacted file tree graph

@@               Coverage Diff               @@
##             branch-21.06    #8277   +/-   ##
===============================================
  Coverage                ?   82.86%           
===============================================
  Files                   ?      105           
  Lines                   ?    17861           
  Branches                ?        0           
===============================================
  Hits                    ?    14801           
  Misses                  ?     3060           
  Partials                ?        0           

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 8406522...0c73774. Read the comment docs.

@harrism
Copy link
Member

harrism commented May 24, 2021

@gpucibot merge

@rapids-bot rapids-bot bot merged commit 7e725b5 into rapidsai:branch-21.06 May 24, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
CMake CMake build issue improvement Improvement / enhancement to an existing function libcudf Affects libcudf (C++/CUDA) code. non-breaking Non-breaking change Spark Functionality that helps Spark RAPIDS
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[BUG] Slow performance in left_anti_join and left_semi_join when nulls are in the data
6 participants