Join APIs that return gathermaps #7454

Merged on Mar 30, 2021 (183 commits)
4a4b4af
Merge branch 'branch-0.17' into branch-0.18
shwina Dec 11, 2020
223f2b5
Merge branch 'branch-0.18' of https://github.com/rapidsai/cudf into b…
shwina Dec 15, 2020
abd6ad2
Merge branch 'branch-0.18' of https://github.com/rapidsai/cudf into b…
shwina Dec 17, 2020
18863b5
Merge branch 'branch-0.18' of https://github.com/rapidsai/cudf into b…
shwina Jan 4, 2021
0fbdd31
Merge branch 'branch-0.18' of https://github.com/rapidsai/cudf into b…
shwina Jan 5, 2021
dc9b943
Merge branch 'branch-0.18' of https://github.com/rapidsai/cudf into b…
shwina Jan 5, 2021
d586aa7
Merge branch 'branch-0.18' of https://github.com/rapidsai/cudf into b…
shwina Jan 7, 2021
996fda8
Merge branch 'branch-0.18' of https://github.com/rapidsai/cudf into b…
shwina Jan 8, 2021
2808a5c
Add a compute_hash_join_indices that returns just the join indices
shwina Jan 11, 2021
ef0baee
Don't need common_columns stuff for join that returns a gathermap
shwina Jan 11, 2021
18f3074
Add hash_join_impl methods that return gathermaps
shwina Jan 11, 2021
70abf48
Add overloads to public hash_join class
shwina Jan 11, 2021
13dff67
Add top-level join APIs that return gathermaps
shwina Jan 11, 2021
3300fe1
Merge branch 'branch-0.18' of https://github.com/rapidsai/cudf into g…
shwina Jan 12, 2021
7ed694c
Use device_uvector instead of device_vector in join
shwina Jan 12, 2021
636c2ea
Undo some API changes
shwina Jan 12, 2021
b79da68
Add join_result
shwina Jan 13, 2021
380aa59
Add APIs that return join_result
shwina Jan 13, 2021
3cbb2b4
Remove column_in_common
shwina Jan 13, 2021
53ae7c9
Add an inner join API that returns gathermaps
shwina Jan 14, 2021
fde172b
Add remaining APIs to return gathermaps
shwina Jan 14, 2021
4a286dd
Add gathermap join test
shwina Jan 18, 2021
c756db9
Replace -1 with INT_MIN
shwina Jan 18, 2021
6a3d23e
Make join_result columns instead of column_views
shwina Jan 20, 2021
5dfc2a0
Replace join_result with a pair of columns
shwina Jan 20, 2021
362829b
Add gathermap test for outer join
shwina Jan 20, 2021
4e4380c
Add and pass full join gathermap test
shwina Jan 20, 2021
339a13d
Begin Python-side refactor
shwina Jan 21, 2021
2b07802
Merge branch 'branch-0.18' of https://github.com/rapidsai/cudf into g…
shwina Jan 25, 2021
0d5a19c
Merge branch 'branch-0.18' of https://github.com/rapidsai/cudf into g…
shwina Jan 28, 2021
fdbdc12
Merge branch 'branch-0.18' of https://github.com/rapidsai/cudf into g…
shwina Feb 1, 2021
5dd5d29
Merge branch 'branch-0.18' of https://github.com/rapidsai/cudf into g…
shwina Feb 5, 2021
6b20429
Merge branch 'branch-0.19' into gathermap-based-join-apis
shwina Feb 8, 2021
044eac1
Add left_semi and left_anti join APIs that return gathermaps
shwina Feb 8, 2021
555d5ec
Add Cython bindings
shwina Feb 8, 2021
56ae616
full -> outer
shwina Feb 9, 2021
dd05121
Merge branch 'branch-0.19' of https://github.com/rapidsai/cudf into g…
shwina Feb 9, 2021
d447924
Progress
shwina Feb 9, 2021
484512e
More progress on py refactor
shwina Feb 9, 2021
5227582
Remove breakpoint
shwina Feb 10, 2021
9cd870e
Fix neg index handling
shwina Feb 10, 2021
8e4f193
Use nullify gather in join
shwina Feb 10, 2021
29fe140
Handle outer joins better
shwina Feb 10, 2021
b634055
Fix index construction
shwina Feb 10, 2021
cd53d6c
Fix sorting behaviour
shwina Feb 10, 2021
75f1efd
Fix Index.join
shwina Feb 10, 2021
1f5d6ad
Progress on semi/anti joins
shwina Feb 10, 2021
de30520
Add simple join test
shwina Feb 10, 2021
66a0de5
Semi-join fix
shwina Feb 11, 2021
ca72295
Only combine key columns in outer join if they have the same name
shwina Feb 11, 2021
ee2242d
Handle when both _on and _index are provided
shwina Feb 11, 2021
e531725
Fix sorting join result
shwina Feb 11, 2021
c8b4948
Merge branch 'branch-0.19' of https://github.com/rapidsai/cudf into g…
shwina Feb 11, 2021
674095c
whitespace
shwina Feb 12, 2021
cbd9dc3
Make construct_join_output_df work with column views
shwina Feb 12, 2021
3f3c3cb
Get rid of hash_join::left_join
shwina Feb 12, 2021
01415fc
More join C++ cleanup
shwina Feb 12, 2021
6185492
Even more cleaning
shwina Feb 17, 2021
d736d1c
More join tests
shwina Feb 18, 2021
b58591d
Fix all join tests
shwina Feb 18, 2021
be560bb
Python regressions
shwina Feb 18, 2021
efb60d6
Revert
shwina Feb 18, 2021
fe6d0b8
Invalid -> Unkown
shwina Feb 18, 2021
547027c
Don't mutate lhs/rhs
shwina Feb 18, 2021
5f93d23
Fix join tests
shwina Feb 19, 2021
b7bf821
Fix semi/anti join trivial cases
shwina Feb 19, 2021
50a2fb2
When testing join results, use a helper that sorts values
shwina Feb 19, 2021
ff0ae79
Totally broken commit
shwina Feb 19, 2021
07cd052
Cleanup
shwina Feb 20, 2021
bd6bf77
Warnings
shwina Feb 20, 2021
a40063e
Cleanup
shwina Feb 22, 2021
ccef9d0
Cleanup
shwina Feb 22, 2021
210244b
Cleanup
shwina Feb 22, 2021
b57348c
Add typing for join helpers
shwina Feb 22, 2021
5c2c9b3
Typing for Join class
shwina Feb 22, 2021
558aa15
Simplify joiner API
shwina Feb 22, 2021
3184896
Example doc
shwina Feb 22, 2021
d3535dc
Refactor join APIs to return a device_uvector
shwina Feb 25, 2021
3b0a2a5
Merge tag 'branch-0.19-latest' of https://github.com/rapidsai/cudf in…
shwina Mar 1, 2021
b82181d
docs
shwina Mar 3, 2021
77d2bfd
Finish up docs?
shwina Mar 3, 2021
0bf34e8
Merge branch 'branch-0.19' of https://github.com/rapidsai/cudf into g…
shwina Mar 4, 2021
26a3fb0
Fix join tests
shwina Mar 4, 2021
8a60d62
Refactor join APIs to work with unique_ptr<rmm::device_uvector>>
shwina Mar 5, 2021
387a953
Update join Cython
shwina Mar 5, 2021
6cd6433
Need to resize the gathermap
shwina Mar 5, 2021
c67dcce
Doc
shwina Mar 5, 2021
30c22ed
Changelog
shwina Mar 5, 2021
f73199d
Add helper to convert gather_map_type->Column
shwina Mar 9, 2021
393c06a
Update python/cudf/cudf/core/frame.py
shwina Mar 9, 2021
e91f554
Cannot specify both column and index
shwina Mar 9, 2021
0185896
Vaildate how
shwina Mar 9, 2021
b232f85
Merge branch 'gathermap-based-join-apis' of github.com:shwina/cudf in…
shwina Mar 9, 2021
1eb495d
Can't use a set
shwina Mar 9, 2021
4f1f072
Avoid function local import
shwina Mar 10, 2021
4aa8fec
False -> NotImplementedError
shwina Mar 10, 2021
ae0e5f9
Update cpp/include/cudf/join.hpp
shwina Mar 10, 2021
f47cf7e
Reuse some join logic
shwina Mar 10, 2021
2a201c3
Merge branch 'gathermap-based-join-apis' of github.com:shwina/cudf in…
shwina Mar 10, 2021
230ca08
Formatting
shwina Mar 10, 2021
498a621
Update cpp/include/cudf/join.hpp
shwina Mar 11, 2021
2de26f3
Docs?
shwina Mar 11, 2021
d6f128c
Merge branch 'gathermap-based-join-apis' of github.com:shwina/cudf in…
shwina Mar 11, 2021
b7d8d8a
Use mr
shwina Mar 11, 2021
9efc761
Docs
shwina Mar 15, 2021
8779bc7
Simplify suffix handling
shwina Mar 16, 2021
4c651ac
Simplify joiner requirements
shwina Mar 17, 2021
b4f4d7c
Do less work in SemiJoin._merge_results
shwina Mar 17, 2021
d353c92
Doc
shwina Mar 17, 2021
580a346
Doc
shwina Mar 17, 2021
328dafd
Return None from semi_join
shwina Mar 17, 2021
297d20a
Init common_type
shwina Mar 17, 2021
e388dd6
Merge branch 'branch-0.19' of https://github.com/rapidsai/cudf into g…
shwina Mar 18, 2021
935648b
Move validation directly into set_by_label and use a raw dict to stor…
vyasr Mar 19, 2021
806a3ef
Remove all references to OrderedColumnDict.
vyasr Mar 19, 2021
40a7b17
Move validation to separate method and use in both set_by_label and c…
vyasr Mar 19, 2021
a1c576e
Format with black.
vyasr Mar 19, 2021
788d9d6
Expose parameter to make validation optional.
vyasr Mar 19, 2021
6a64285
Coerce constructor input to dict before calling items.
vyasr Mar 19, 2021
e7d0981
Make construction safe.
vyasr Mar 19, 2021
c39932c
Final cleanup and documentation.
vyasr Mar 19, 2021
4ff09fc
Address style issues.
vyasr Mar 19, 2021
35c63ec
Merge branch 'branch-0.19' of https://github.com/rapidsai/cudf into g…
shwina Mar 22, 2021
9433582
Merge branch 'branch-0.19' of https://github.com/rapidsai/cudf into f…
shwina Mar 22, 2021
74f2884
Merge remote-tracking branch 'origin/branch-0.19' into feature/optimi…
vyasr Mar 22, 2021
0178127
CA fix
shwina Mar 22, 2021
5c0f202
Merge branch 'feature/optimize_accessor_copy' into join-bench
shwina Mar 22, 2021
c8d2364
Don't validate on gathers
shwina Mar 22, 2021
efea63d
Prioritize numeric columns
shwina Mar 22, 2021
898a3d8
Merge branch 'feature/optimize_accessor_copy' into join-bench
shwina Mar 22, 2021
c3b6444
Lazily compute and delete column length on demand.
vyasr Mar 22, 2021
01b2cf5
Remove redundant clear cache in setitem.
vyasr Mar 22, 2021
8899258
Remove mypy annotation for column length.
vyasr Mar 22, 2021
c6cd415
Optimize casting logic
shwina Mar 22, 2021
3507785
Merge branch 'feature/optimize_accessor_copy' of github.com:vyasr/cud…
shwina Mar 22, 2021
7f8e1cd
Undo
shwina Mar 22, 2021
f2e4609
Don't validate when copying type metadata
shwina Mar 22, 2021
5d378c2
Merge branch 'feature/optimize_accessor_copy' into join-bench
shwina Mar 22, 2021
83cc407
ImportError
shwina Mar 22, 2021
72598fb
Prioritize numeric dtypes in is_numerical_dtype
shwina Mar 22, 2021
fa220b6
Add unsafe CA ctor
shwina Mar 22, 2021
6572cd3
Merge branch 'feature/optimize_accessor_copy' into join-bench
shwina Mar 22, 2021
f7dc417
Revert "Prioritize numeric dtypes in is_numerical_dtype"
shwina Mar 22, 2021
3760077
Revert "Prioritize numeric dtypes in is_numerical_dtype"
shwina Mar 22, 2021
01cdfcf
Merge branch 'feature/optimize_accessor_copy' into join-bench
shwina Mar 22, 2021
de9ca28
Change error message back so that tests pass.
vyasr Mar 23, 2021
e35d03b
Faster is_numerical_dtype
shwina Mar 23, 2021
e2fd533
Faster is_numerical_dtype
shwina Mar 23, 2021
9044d62
Merge branch 'feature/optimize_accessor_copy' into join-bench
shwina Mar 23, 2021
64ca702
Even faster is_numerical_dtype
shwina Mar 23, 2021
749edf1
Enable fast path for constructing a Buffer from a DeviceBuffer
shwina Mar 23, 2021
7526e4a
Merge branch 'feature/optimize_accessor_copy' into join-bench
shwina Mar 23, 2021
ca772b8
Small fix
shwina Mar 23, 2021
739ec57
Add validation option to insert and standardize error message.
vyasr Mar 23, 2021
498b70e
Fix style.
vyasr Mar 23, 2021
3cd012b
Merge remote-tracking branch 'vyasr/feature/optimize_accessor_copy' i…
shwina Mar 23, 2021
660afa6
Merge branch 'various-py-optimizations' into join-bench
shwina Mar 23, 2021
f8ac22f
Merge branch 'gathermap-based-join-apis' into join-bench
shwina Mar 23, 2021
c28866c
Merge branch 'branch-0.19' of https://github.com/rapidsai/cudf into v…
shwina Mar 23, 2021
01e13fa
Undo formatting change
shwina Mar 23, 2021
89a0301
Add TODO
shwina Mar 23, 2021
26f4cc8
Merge branch 'branch-0.19' of https://github.com/rapidsai/cudf into g…
shwina Mar 23, 2021
f2036eb
Merge branch 'various-py-optimizations' into join-bench
shwina Mar 23, 2021
5e73de7
init->create + doc
shwina Mar 24, 2021
e0c50b5
Merge branch 'various-py-optimizations' into gathermap-based-join-apis
shwina Mar 24, 2021
fa880c1
Merge branch 'branch-0.19' of https://github.com/rapidsai/cudf into g…
shwina Mar 24, 2021
58bdecd
Merge branch 'branch-0.19' of https://github.com/rapidsai/cudf into g…
shwina Mar 25, 2021
ed1b434
Merge branch 'join-bench' into gathermap-based-join-apis
shwina Mar 25, 2021
ca116a3
Only gather the index if necessary
shwina Mar 25, 2021
ce03918
Don't copy type metadata for the index unless we need to
shwina Mar 25, 2021
b7c6b19
Use validate=False in a few more places
shwina Mar 25, 2021
671a0e0
Import
shwina Mar 26, 2021
797087b
Review
shwina Mar 26, 2021
5ad531f
Coerce to tuple first
shwina Mar 26, 2021
f7e94fb
Replace hasattr with isinstance
shwina Mar 26, 2021
1cb9448
Handle renamed indexes
shwina Mar 26, 2021
cc89360
Fix to names setter
shwina Mar 26, 2021
4ca1238
Merge branch 'branch-0.19' of https://github.com/rapidsai/cudf into g…
shwina Mar 26, 2021
9cebf2e
Update cpp/src/join/hash_join.cu
shwina Mar 26, 2021
1584b86
Better example
shwina Mar 26, 2021
3977b79
Remove std::moves
shwina Mar 26, 2021
67919a3
Merge branch 'gathermap-based-join-apis' of github.com:shwina/cudf in…
shwina Mar 26, 2021
7bf6561
Fix formatting error
shwina Mar 26, 2021
8 changes: 2 additions & 6 deletions cpp/benchmarks/join/join_benchmark.cu
@@ -105,12 +105,8 @@ static void BM_join(benchmark::State &state)
   for (auto _ : state) {
     cuda_event_timer raii(state, true, 0);

-    auto result = cudf::inner_join(probe_table,
-                                   build_table,
-                                   columns_to_join,
-                                   columns_to_join,
-                                   {{0, 0}},
-                                   cudf::null_equality::UNEQUAL);
+    auto result = cudf::inner_join(
+      probe_table, build_table, columns_to_join, columns_to_join, cudf::null_equality::UNEQUAL);
   }
 }

440 changes: 252 additions & 188 deletions cpp/include/cudf/join.hpp

Large diffs are not rendered by default.

5 changes: 5 additions & 0 deletions cpp/include/cudf/table/table_view.hpp
@@ -126,6 +126,11 @@ class table_view_base {
    */
   size_type num_rows() const noexcept { return _num_rows; }

+  /**
+   * @brief Returns true if `num_columns()` returns zero, or false otherwise
+   */
+  size_type is_empty() const noexcept { return num_columns() == 0; }
+
   table_view_base() = default;

   ~table_view_base() = default;
4 changes: 1 addition & 3 deletions cpp/src/copying/gather.cu
@@ -43,9 +43,7 @@ std::unique_ptr<table> gather(table_view const& source_table,

   if (neg_indices == negative_index_policy::ALLOWED) {
     cudf::size_type n_rows = source_table.num_rows();
-    auto idx_converter = [n_rows] __device__(size_type in) {
-      return ((in % n_rows) + n_rows) % n_rows;
-    };
+    auto idx_converter = [n_rows] __device__(size_type in) { return in < 0 ? in + n_rows : in; };
     return gather(source_table,
                   thrust::make_transform_iterator(map_begin, idx_converter),
                   thrust::make_transform_iterator(map_end, idx_converter),
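
The `idx_converter` change above replaces a double-modulo wrap with a single conditional add. The host-only sketch below compares the two mappings; the function names `wrap_modulo` and `wrap_negative` are hypothetical and exist only for this illustration. The two agree for indices in `[-n_rows, n_rows)` and differ outside that range, so the new form appears to rely on callers passing indices that are already within bounds.

```cpp
#include <cassert>
#include <cstdint>

using size_type = std::int32_t;  // assumption: stand-in for cudf::size_type

// Mapping from the removed lines: wraps any index into [0, n_rows).
size_type wrap_modulo(size_type in, size_type n_rows)
{
  return ((in % n_rows) + n_rows) % n_rows;
}

// Mapping from the added line: shifts negative indices by n_rows and
// leaves non-negative indices untouched (no wrapping of large values).
size_type wrap_negative(size_type in, size_type n_rows)
{
  return in < 0 ? in + n_rows : in;
}
```

For example, with `n_rows = 5` both functions map `-1` to `4`, but `wrap_modulo(7, 5)` yields `2` while `wrap_negative(7, 5)` leaves `7` unchanged.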
499 changes: 131 additions & 368 deletions cpp/src/join/hash_join.cu

Large diffs are not rendered by default.

143 changes: 52 additions & 91 deletions cpp/src/join/hash_join.cuh
@@ -15,6 +15,9 @@
  */
 #pragma once

+#include <cudf/detail/concatenate.cuh>
+#include <cudf/detail/gather.cuh>
+#include <cudf/detail/gather.hpp>
 #include <join/join_common_utils.hpp>
 #include <join/join_kernels.cuh>

@@ -25,7 +28,7 @@
 #include <cudf/table/table_view.hpp>

 #include <rmm/cuda_stream_view.hpp>
-#include <rmm/device_vector.hpp>
+#include <rmm/device_uvector.hpp>
 #include <rmm/exec_policy.hpp>

 #include <thrust/sequence.h>
@@ -178,19 +181,29 @@ size_type estimate_join_output_size(table_device_view build_table,
  *
  * @param left Table of left columns to join
  * @param stream CUDA stream used for device memory operations and kernel launches
+ * @param mr Device memory resource used to allocate the result
  *
  * @return Join output indices vector pair
  */
-inline std::pair<rmm::device_vector<size_type>, rmm::device_vector<size_type>>
-get_trivial_left_join_indices(table_view const& left, rmm::cuda_stream_view stream)
+inline std::pair<std::unique_ptr<rmm::device_uvector<size_type>>,
+                 std::unique_ptr<rmm::device_uvector<size_type>>>
+get_trivial_left_join_indices(
+  table_view const& left,
+  rmm::cuda_stream_view stream,
+  rmm::mr::device_memory_resource* mr = rmm::mr::get_current_device_resource())
 {
-  rmm::device_vector<size_type> left_indices(left.num_rows());
-  thrust::sequence(rmm::exec_policy(stream), left_indices.begin(), left_indices.end(), 0);
-  rmm::device_vector<size_type> right_indices(left.num_rows());
-  thrust::fill(rmm::exec_policy(stream), right_indices.begin(), right_indices.end(), JoinNoneValue);
+  auto left_indices = std::make_unique<rmm::device_uvector<size_type>>(left.num_rows(), stream, mr);
+  thrust::sequence(rmm::exec_policy(stream), left_indices->begin(), left_indices->end(), 0);
+  auto right_indices =
+    std::make_unique<rmm::device_uvector<size_type>>(left.num_rows(), stream, mr);
+  thrust::fill(
+    rmm::exec_policy(stream), right_indices->begin(), right_indices->end(), JoinNoneValue);
   return std::make_pair(std::move(left_indices), std::move(right_indices));
 }

+std::pair<std::unique_ptr<table>, std::unique_ptr<table>> get_empty_joined_table(
+  table_view const& probe, table_view const& build);
+
 std::unique_ptr<cudf::table> combine_table_pair(std::unique_ptr<cudf::table>&& left,
                                                 std::unique_ptr<cudf::table>&& right);

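
As a rough host-side illustration of what `get_trivial_left_join_indices` produces, the sketch below builds the same pair of index vectors with `std::vector` instead of `rmm::device_uvector`. The helper name `trivial_left_join_indices` and the constant `kNoMatch` are hypothetical; the sentinel value is an assumption based on the commit titled "Replace -1 with INT_MIN", and the actual definition of `JoinNoneValue` is not shown in this diff.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>
#include <numeric>
#include <utility>
#include <vector>

using size_type = std::int32_t;  // assumption: stand-in for cudf::size_type

// Assumption: "no matching row" sentinel, chosen here as INT_MIN per commit c756db9.
constexpr size_type kNoMatch = std::numeric_limits<size_type>::min();

// Host analogue of the trivial left join: every left row appears once,
// and every right index is the "no match" sentinel.
std::pair<std::vector<size_type>, std::vector<size_type>> trivial_left_join_indices(
  size_type left_rows)
{
  std::vector<size_type> left_indices(left_rows);
  std::iota(left_indices.begin(), left_indices.end(), 0);  // 0, 1, ..., left_rows - 1
  std::vector<size_type> right_indices(left_rows, kNoMatch);
  return {std::move(left_indices), std::move(right_indices)};
}
```

A caller would then gather the left table with the first vector and produce nulls on the right side wherever the second vector holds the sentinel.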
@@ -207,106 +220,52 @@ struct hash_join::hash_join_impl {

  private:
   cudf::table_view _build;
-  cudf::table_view _build_selected;
-  std::vector<size_type> _build_on;
   std::unique_ptr<cudf::detail::multimap_type, std::function<void(cudf::detail::multimap_type*)>>
     _hash_table;

  public:
   /**
-   * @brief Constructor that internally builds the hash table based on the given `build` table and
-   * column indices specified by `build_on` for subsequent probe calls.
+   * @brief Constructor that internally builds the hash table based on the given `build` table
    *
    * @throw cudf::logic_error if the number of columns in `build` table is 0.
    * @throw cudf::logic_error if the number of rows in `build` table exceeds MAX_JOIN_SIZE.
-   * @throw std::out_of_range if elements of `build_on` exceed the number of columns in the `build`
-   * table.
    *
    * @param build The build table, from which the hash table is built.
-   * @param build_on The column indices from `build` to join on.
    * @param compare_nulls Controls whether null join-key values should match or not.
    */
   hash_join_impl(cudf::table_view const& build,
-                 std::vector<size_type> const& build_on,
                  null_equality compare_nulls,
                  rmm::cuda_stream_view stream = rmm::cuda_stream_default);

-  std::pair<std::unique_ptr<cudf::table>, std::unique_ptr<cudf::table>> inner_join(
-    cudf::table_view const& probe,
-    std::vector<size_type> const& probe_on,
-    std::vector<std::pair<cudf::size_type, cudf::size_type>> const& columns_in_common,
-    common_columns_output_side common_columns_output_side,
-    null_equality compare_nulls,
-    rmm::cuda_stream_view stream,
-    rmm::mr::device_memory_resource* mr) const;
-
-  std::unique_ptr<cudf::table> left_join(
-    cudf::table_view const& probe,
-    std::vector<size_type> const& probe_on,
-    std::vector<std::pair<cudf::size_type, cudf::size_type>> const& columns_in_common,
-    null_equality compare_nulls,
-    rmm::cuda_stream_view stream,
-    rmm::mr::device_memory_resource* mr) const;
-
-  std::unique_ptr<cudf::table> full_join(
-    cudf::table_view const& probe,
-    std::vector<size_type> const& probe_on,
-    std::vector<std::pair<cudf::size_type, cudf::size_type>> const& columns_in_common,
-    null_equality compare_nulls,
-    rmm::cuda_stream_view stream,
-    rmm::mr::device_memory_resource* mr) const;
+  std::pair<std::unique_ptr<rmm::device_uvector<size_type>>,
+            std::unique_ptr<rmm::device_uvector<size_type>>>
+  inner_join(cudf::table_view const& probe,
+             null_equality compare_nulls,
+             rmm::cuda_stream_view stream,
+             rmm::mr::device_memory_resource* mr) const;
+
+  std::pair<std::unique_ptr<rmm::device_uvector<size_type>>,
+            std::unique_ptr<rmm::device_uvector<size_type>>>
+  left_join(cudf::table_view const& probe,
+            null_equality compare_nulls,
+            rmm::cuda_stream_view stream,
+            rmm::mr::device_memory_resource* mr) const;
+
+  std::pair<std::unique_ptr<rmm::device_uvector<size_type>>,
+            std::unique_ptr<rmm::device_uvector<size_type>>>
+  full_join(cudf::table_view const& probe,
+            null_equality compare_nulls,
+            rmm::cuda_stream_view stream,
+            rmm::mr::device_memory_resource* mr) const;

  private:
-  /**
-   * @brief Performs hash join by probing the columns provided in `probe` as per
-   * the joining indices given in `probe_on` and returns a (`probe`, `_build`) table pair, which
-   * contains the probe and build portions of the logical joined table respectively.
-   *
-   * @throw cudf::logic_error if `columns_in_common` contains a pair of indices
-   * (`P`, `B`) where `P` does not exist in `probe_on` or `B` does not exist in
-   * `_build_on`.
-   * @throw cudf::logic_error if `columns_in_common` contains a pair of indices
-   * (`P`, `B`) such that the location of `P` within `probe_on` is not equal to
-   * the location of `B` within `_build_on`.
-   * @throw cudf::logic_error if the number of elements in `probe_on` and
-   * `_build_on` are not equal.
-   * @throw cudf::logic_error if the number of columns in `probe` is 0.
-   * @throw cudf::logic_error if the number of rows in `probe` table exceeds MAX_JOIN_SIZE.
-   * @throw std::out_of_range if elements of `probe_on` exceed the number of columns in the `probe`
-   * table.
-   * @throw cudf::logic_error if types do not match between joining columns.
-   *
-   * @tparam JoinKind The type of join to be performed.
-   *
-   * @param probe The probe table.
-   * @param probe_on The column's indices from `probe` to join on.
-   * Column `i` from `probe_on` will be compared against column `i` of `_build_on`.
-   * @param columns_in_common is a vector of pairs of column indices into
-   * `probe` and `_build`, respectively, that are "in common". For "common"
-   * columns, only a single output column will be produced, which is gathered
-   * from `probe_on` columns. Else, for every column in `probe_on` and `_build_on`,
-   * an output column will be produced. For each of these pairs (P, B), P
-   * should exist in `probe_on` and B should exist in `_build_on`.
-   * @param common_columns_output_side @see cudf::hash_join::common_columns_output_side.
-   * @param compare_nulls Controls whether null join-key values should match or not.
-   * @param mr Device memory resource used to allocate the returned table's device memory.
-   * @param stream CUDA stream used for device memory operations and kernel launches.
-   *
-   * @return Table pair of (`probe`, `_build`) of joining both tables on the columns
-   * specified by `probe_on` and `_build_on`. The resulting table pair will be joined columns of
-   * (`probe(including common columns)`, `_build(excluding common columns)`) if
-   * `common_columns_output_side` is `PROBE`, or (`probe(excluding common columns)`,
-   * `_build(including common columns)`) if `common_columns_output_side` is `BUILD`.
-   */
   template <cudf::detail::join_kind JoinKind>
-  std::pair<std::unique_ptr<cudf::table>, std::unique_ptr<cudf::table>> compute_hash_join(
-    cudf::table_view const& probe,
-    std::vector<size_type> const& probe_on,
-    std::vector<std::pair<cudf::size_type, cudf::size_type>> const& columns_in_common,
-    common_columns_output_side common_columns_output_side,
-    null_equality compare_nulls,
-    rmm::cuda_stream_view stream,
-    rmm::mr::device_memory_resource* mr) const;
+  std::pair<std::unique_ptr<rmm::device_uvector<size_type>>,
+            std::unique_ptr<rmm::device_uvector<size_type>>>
+  compute_hash_join(cudf::table_view const& probe,
+                    null_equality compare_nulls,
+                    rmm::cuda_stream_view stream,
+                    rmm::mr::device_memory_resource* mr) const;

   /**
    * @brief Probes the `_hash_table` built from `_build` for tuples in `probe_table`,
@@ -320,15 +279,17 @@ struct hash_join::hash_join_impl {
    * @param probe_table Table of probe side columns to join.
    * @param compare_nulls Controls whether null join-key values should match or not.
    * @param stream CUDA stream used for device memory operations and kernel launches.
+   * @param mr Device memory resource used to allocate the returned vectors.
    *
    * @return Join output indices vector pair.
    */
   template <cudf::detail::join_kind JoinKind>
-  std::enable_if_t<JoinKind != cudf::detail::join_kind::FULL_JOIN,
-                   std::pair<rmm::device_vector<size_type>, rmm::device_vector<size_type>>>
+  std::pair<std::unique_ptr<rmm::device_uvector<size_type>>,
+            std::unique_ptr<rmm::device_uvector<size_type>>>
   probe_join_indices(cudf::table_view const& probe,
                      null_equality compare_nulls,
-                     rmm::cuda_stream_view stream) const;
+                     rmm::cuda_stream_view stream,
+                     rmm::mr::device_memory_resource* mr) const;
 };

 }  // namespace cudf