List element Equality comparator #10289

Merged: 129 commits, merged on Apr 13, 2022
Changes from 94 commits

Commits
933c974
First commit
devavret Aug 26, 2021
a1636e5
testing and profiling deep single hierarchy struct
devavret Aug 27, 2021
d59f54c
Merge branch 'branch-22.02' into struct-row-comp
devavret Jan 12, 2022
765dd8d
Merge branch 'branch-22.02' into struct-row-comp
devavret Jan 12, 2022
3d21daf
Make the sandboxed test compile again
devavret Jan 14, 2022
9f32e6b
Update my row_comparator with nullate
devavret Jan 15, 2022
53d3c90
Merge branch 'branch-22.02' into struct-row-comp
devavret Jan 21, 2022
022e2a4
Basic verticalization utility and experimental namespace
devavret Jan 24, 2022
7fef643
clean up most of row operators that I didn't change.
devavret Jan 26, 2022
930d8de
Sliced column test
devavret Jan 27, 2022
0ecc4f8
column order and null precedence support
devavret Jan 28, 2022
ff36d2d
Manually managed stack
devavret Jan 28, 2022
cd0f938
New depth based method to avoid superimpose nulls
devavret Feb 2, 2022
7b8e060
Put sort2 impl in separate TU
devavret Feb 2, 2022
25eb237
Merge branch 'branch-22.04' into struct-row-comp
devavret Feb 2, 2022
c8e527e
Basic working list == comp
devavret Feb 4, 2022
eb87ed7
Merge branch 'branch-22.04' into list-row-eq
devavret Feb 4, 2022
cc1584d
deeper list test
devavret Feb 4, 2022
925481a
benchmark list ==
devavret Feb 7, 2022
b2b41c7
small cleanups
devavret Feb 7, 2022
d2937cf
Merge branch 'branch-22.04' into struct-row-comp
devavret Feb 10, 2022
d55c9c7
Move verticalization code to row_comparator.cpp
devavret Feb 15, 2022
b7cdfe0
Merge branch 'struct-row-comp' into list-row-eq
devavret Feb 15, 2022
8309151
Use regular type dispatcher with new id type map
devavret Feb 15, 2022
8717b9c
Early return from unequal leaf elements
devavret Feb 15, 2022
21df6cf
Combined struct and list equality operator
devavret Feb 17, 2022
fa35461
Add null testing to list level also
devavret Feb 18, 2022
3bd749e
Owning row lex operator
devavret Feb 22, 2022
613d664
merge fixes
devavret Feb 23, 2022
2ef3ac7
Move struct logic out of main row loop and into element_relational_co…
devavret Feb 24, 2022
5577431
pushing even more logic into element_relational_comparator
devavret Feb 24, 2022
f037bc0
More optimizations.
devavret Feb 24, 2022
8c54a85
review changes
devavret Feb 24, 2022
9d24a87
Checks to ensure tables can be compared
devavret Feb 24, 2022
4e5fe21
Merge branch 'struct-row-comp' into list-row-eq
devavret Feb 24, 2022
693dbca
Owning row eq operator
devavret Feb 24, 2022
294b0cf
Another attempt at new API
devavret Mar 2, 2022
a4c799a
Remove stack based struct comparator + cleanups
devavret Mar 7, 2022
ecb2eb0
thrust::pair -> cuda::std::pair
devavret Mar 7, 2022
34a6564
optional device spans
devavret Mar 7, 2022
fa4abb4
Prevent device comparator construction from any table_device_view
devavret Mar 7, 2022
b213210
Nullate default and fix for non nested depth
devavret Mar 7, 2022
6f9bedd
Fix an unsurfaced bug about depth passing
devavret Mar 7, 2022
be69ffa
Switch over sort impl to new comparator
devavret Mar 8, 2022
76d535a
Copyright changes to satiate ci
devavret Mar 8, 2022
78d10fc
Migrate struct sort benchmark to nvbench
devavret Mar 8, 2022
15920ee
Avoid optional::value in favor of *
devavret Mar 8, 2022
d01fc30
throw when trying to sort List
devavret Mar 8, 2022
ac2eb0d
Leftover change for struct sort nvbench
devavret Mar 8, 2022
076c4c1
struct without null pushdown test
devavret Mar 9, 2022
e8a9202
Remove temporary sort2_test
devavret Mar 9, 2022
a4b1167
Remove temporary sort2 files
devavret Mar 9, 2022
62f6914
leftover sort2 in cmake
devavret Mar 9, 2022
8f628ae
cleanup benchmark headers
devavret Mar 9, 2022
dc7d125
Docs
devavret Mar 9, 2022
fa7d940
Merge branch 'branch-22.04' into struct-row-comp
devavret Mar 10, 2022
bdc1cb6
Merge branch 'struct-row-comp' into list-row-eq
devavret Mar 10, 2022
83ba4bf
Match API with self lex comparator
devavret Mar 10, 2022
9c0060f
Guard null check code with nullate
devavret Mar 10, 2022
57fdd1e
remove redundant size check
devavret Mar 10, 2022
76c883f
Apply suggestions from code review
devavret Mar 14, 2022
5fc82a9
Docs
devavret Mar 14, 2022
82db9d6
port benchmark to nvbench
devavret Mar 14, 2022
7871c48
privatise row_equality_comparator's ctor
devavret Mar 14, 2022
c9e5dc3
List rank test cleanup and merge with reduction test
devavret Mar 14, 2022
98b253b
rmm pool in benchmark + style fixes
devavret Mar 14, 2022
38fa66f
Merge branch 'struct-row-comp' into list-row-eq
devavret Mar 14, 2022
3255dc5
Merge branch 'branch-22.04' into struct-row-comp
devavret Mar 14, 2022
44d3735
Merge branch 'struct-row-comp' into list-row-eq
devavret Mar 14, 2022
e2d4b93
run cmake-format
devavret Mar 14, 2022
52e3a35
Review changes
devavret Mar 15, 2022
9470f06
More review changes
devavret Mar 15, 2022
7c897c3
Review changes req by @vyasr
devavret Mar 17, 2022
e0467c7
add a runtime is_relationally_comparable function
devavret Mar 17, 2022
fc1e993
Review changes
devavret Mar 18, 2022
096593f
Review changes
devavret Mar 18, 2022
f539647
Avoid WAR of storing a table_device_view
devavret Mar 18, 2022
01be0bc
Rename struct_linearize to decompose_structs and Improve docs
devavret Mar 18, 2022
de95530
review changes req by @ttnghia
devavret Mar 21, 2022
6c45cd4
Namespace changes and making element comparator private
devavret Mar 21, 2022
f72ce8b
Merge branch 'struct-row-comp' into list-row-eq
devavret Mar 21, 2022
c624317
put in row namespace
devavret Mar 21, 2022
0ca2d14
Review on lex applied to equality
devavret Mar 21, 2022
f309837
create method just like struct lex
devavret Mar 21, 2022
81f9ab8
loop 0 to size -> start off to end off
devavret Mar 22, 2022
9bfd08e
Update cpp/include/cudf/table/experimental/row_operators.cuh
devavret Mar 22, 2022
70e4581
Merge branch 'struct-row-comp' into list-row-eq
devavret Mar 22, 2022
dd8650d
We also need to keep the null mask buffers around
devavret Mar 22, 2022
6cab5c4
Fix slices struct issues
devavret Mar 22, 2022
14f9d25
Handle sliced list column
devavret Mar 23, 2022
bcd6962
Merge branch 'branch-22.04' into list-row-eq
devavret Mar 23, 2022
0c12c15
Move equality comparator to experimental header
devavret Mar 23, 2022
b41b3fa
Style fixes
devavret Mar 23, 2022
4919b04
Merge branch 'branch-22.06' into list-row-eq
devavret Mar 23, 2022
3dfc133
Review changes
devavret Mar 24, 2022
9031900
Review changes requested by @hyperbolic2346
devavret Mar 24, 2022
119d830
Add an equality comparable check similar to lex comparable check
devavret Mar 24, 2022
1cefa5a
Move linked column to a common header in utilities
devavret Mar 24, 2022
34aa66b
Review changes
devavret Mar 28, 2022
e7ea7f9
Change to progressive slicing logic
devavret Mar 28, 2022
e60fbd4
pull type dispatcher out of element compare loop
devavret Mar 28, 2022
42319ad
Move slicing logic to lists_column_device_view and new structs_column…
devavret Mar 28, 2022
67c035c
Move list size iterator and make it only constructible from list_colu…
devavret Mar 28, 2022
e5fe24c
push slice logic into columns_device_view
devavret Mar 29, 2022
85861a9
Add validity safe iterator
devavret Mar 29, 2022
9db7479
Move element_range_comparator to element_comparator's private
devavret Mar 29, 2022
db3b79b
style fixes
devavret Mar 29, 2022
6ca7deb
Docs for the newly added stuff
devavret Mar 29, 2022
2fc3d3d
Merge branch 'branch-22.06' into list-row-eq
devavret Mar 29, 2022
1c3a99d
review changes
devavret Mar 29, 2022
a0e581c
review changes
devavret Mar 30, 2022
046c407
Review changes
devavret Mar 30, 2022
6a282a6
linked_column_view inherit from column_view_base
devavret Mar 30, 2022
0f768ac
spell check
devavret Mar 30, 2022
92c1ff5
Change composition to private inheritance
devavret Mar 31, 2022
4c0e7fa
Replace __host__ __device__ with macro
devavret Mar 31, 2022
75104bb
Add more null frequencies to benchmark
devavret Mar 31, 2022
1e1053b
Templatize make_validity_iterator
devavret Mar 31, 2022
bcfe91b
Increase testing for null frequency
devavret Mar 31, 2022
981438d
curr_col -> temp_col
devavret Mar 31, 2022
5bbf18e
element_range_comparator -> column_comparator
devavret Mar 31, 2022
8e18d66
cleaner column_view conversion
devavret Mar 31, 2022
75eaed4
delete copy ctor and assignment operator
devavret Apr 1, 2022
be98357
iterator docs
devavret Apr 1, 2022
f4c509a
Handle empty struct in list equality
devavret Apr 8, 2022
d1386cf
Handle empty list (without offsets)
devavret Apr 8, 2022
6aef29f
Merge branch 'branch-22.06' into list-row-eq
devavret Apr 8, 2022
3cc1159
Merge branch 'branch-22.06' into list-row-eq
devavret Apr 11, 2022
8078e3c
Column_device_view review changes
devavret Apr 12, 2022
2 changes: 1 addition & 1 deletion cpp/benchmarks/CMakeLists.txt
@@ -181,7 +181,7 @@ ConfigureBench(
REDUCTION_BENCH reduction/anyall.cpp reduction/dictionary.cpp reduction/minmax.cpp
reduction/reduce.cpp reduction/scan.cpp
)
-ConfigureNVBench(REDUCTION_NVBENCH reduction/segment_reduce.cu)
+ConfigureNVBench(REDUCTION_NVBENCH reduction/segment_reduce.cu reduction/rank.cpp)

# ##################################################################################################
# * reduction benchmark ---------------------------------------------------------------------------
64 changes: 64 additions & 0 deletions cpp/benchmarks/reduction/rank.cpp
@@ -0,0 +1,64 @@
/*
* Copyright (c) 2022, NVIDIA CORPORATION.
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/

#include <benchmarks/common/generate_input.hpp>
#include <benchmarks/fixture/rmm_pool_raii.hpp>

#include <cudf/detail/scan.hpp>
#include <cudf/filling.hpp>
#include <cudf/lists/list_view.cuh>

#include <nvbench/nvbench.cuh>

template <typename type>
static void nvbench_reduction_scan(nvbench::state& state, nvbench::type_list<type>)
{
cudf::rmm_pool_raii pool_raii;

auto const dtype = cudf::type_to_id<type>();

bool const include_nulls = state.get_int64("include_nulls");
size_t const size = state.get_int64("data_size");

data_profile table_data_profile;
table_data_profile.set_distribution_params(dtype, distribution_id::UNIFORM, 0, 5);
table_data_profile.set_null_frequency((include_nulls) ? 0.1 : 0.0);
Contributor:

It might balloon benchmark times too much, so feel free to decline, but this seems like a benchmark where it might be quite interesting to see how performance changes with different amounts of nulls (i.e. parametrizing this).

Contributor Author:

Changed the axis from include_nulls to null_frequency. Here are the results:
[two benchmark result images]
There's a big jump from no nulls to 10% nulls because, with 0 nulls, a portion of the code is inactive. Adding nulls seems to help the flat column because it doesn't have to load and check the actual value of a null element.
For lists, I think we don't see the benefits of the early return until the null frequency exceeds 70%.

Contributor:

Right, the 0 nulls case is qualitatively different. For the rest, I'm guessing it's some balance between less work and more divergence? The cost of a thread idling for a list is worse than for scalars, and it should get worse as lists get larger, since in theory the idle time is potentially bounded only by the largest list. I don't think there's anything actionable here, but it's good to see.
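The author's explanation (with 0 nulls the null-handling branch never runs, while a null element can return early without loading its value) can be illustrated with a small host-only sketch. This is not cudf code: `nullable_column`, `elements_equal`, the `HasNulls` template flag, and the local `null_equality` enum are hypothetical stand-ins for the column view, `element_comparator`, `cudf::nullate`, and `cudf::null_equality`.

```cpp
#include <cassert>
#include <optional>
#include <vector>

// Hypothetical host stand-in for a nullable column: std::nullopt marks a null row.
using nullable_column = std::vector<std::optional<int>>;

enum class null_equality { EQUAL, UNEQUAL };  // local stand-in, not cudf::null_equality

// HasNulls imitates a compile-time nullate: when false, the null checks are compiled out
// and the caller promises that no element is null.
template <bool HasNulls>
bool elements_equal(nullable_column const& lhs,
                    int i,
                    nullable_column const& rhs,
                    int j,
                    [[maybe_unused]] null_equality nulls_are_equal)
{
  if constexpr (HasNulls) {
    bool const lhs_is_null = !lhs[i].has_value();
    bool const rhs_is_null = !rhs[j].has_value();
    // Early returns: a null element never requires loading a value.
    if (lhs_is_null && rhs_is_null) { return nulls_are_equal == null_equality::EQUAL; }
    if (lhs_is_null != rhs_is_null) { return false; }
  }
  // Only reached for two valid elements.
  return *lhs[i] == *rhs[j];
}
```

In the flat-column benchmark, a larger share of nulls means more rows take one of the early returns; for lists, the early return only skips the comparison at that level, which is consistent with the benefit showing up later.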


auto const table = create_random_table({dtype}, table_size_bytes{size / 2}, table_data_profile);

auto const new_tbl = cudf::repeat(table->view(), 2);
cudf::column_view input(new_tbl->view().column(0));

state.exec(nvbench::exec_tag::sync, [&](nvbench::launch& launch) {
rmm::cuda_stream_view stream_view{launch.get_stream()};
auto result = cudf::detail::inclusive_dense_rank_scan(
input, stream_view, rmm::mr::get_current_device_resource());
});
}

using data_type = nvbench::type_list<int32_t, cudf::list_view>;

NVBENCH_BENCH_TYPES(nvbench_reduction_scan, NVBENCH_TYPE_AXES(data_type))
.set_name("rank_scan")
.add_int64_axis("include_nulls", {0, 1})
.add_int64_axis("data_size",
{
10000, // 10k
100000, // 100k
1000000, // 1M
10000000, // 10M
100000000, // 100M
});
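The benchmark creates a random table of `size / 2` bytes and then calls `cudf::repeat(table->view(), 2)`, so every row is immediately followed by an identical copy and the rank scan sees runs of equal adjacent rows. Assuming, as that setup suggests, that an inclusive dense rank scan over sorted input only needs to know whether each row equals its predecessor, the idea can be sketched on the host. `repeat_each` and `inclusive_dense_rank_scan` below are hypothetical helpers, not the `cudf::detail` functions, and rows are modeled as `std::vector<int>` to stand in for list rows.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Repeats every row `count` times consecutively, e.g. {a, b} -> {a, a, b, b}.
template <typename Row>
std::vector<Row> repeat_each(std::vector<Row> const& rows, std::size_t count)
{
  std::vector<Row> out;
  out.reserve(rows.size() * count);
  for (auto const& row : rows) {
    for (std::size_t k = 0; k < count; ++k) { out.push_back(row); }
  }
  return out;
}

// Inclusive dense rank scan over already-sorted rows: the first row gets rank 1 and the
// rank grows by one each time a row differs from the previous row. Only equality is used.
template <typename Row, typename RowEqual>
std::vector<int> inclusive_dense_rank_scan(std::vector<Row> const& rows, RowEqual row_equal)
{
  std::vector<int> ranks(rows.size());
  int rank = 0;
  for (std::size_t i = 0; i < rows.size(); ++i) {
    if (i == 0 || !row_equal(rows[i - 1], rows[i])) { ++rank; }
    ranks[i] = rank;
  }
  return ranks;
}
```

Because the scan needs only an equality predicate, a list equality comparator is enough to make the `cudf::list_view` type axis of this benchmark meaningful.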
20 changes: 18 additions & 2 deletions cpp/include/cudf/lists/lists_column_device_view.cuh
@@ -1,5 +1,5 @@
/*
-* Copyright (c) 2020, NVIDIA CORPORATION.
+* Copyright (c) 2020-2022, NVIDIA CORPORATION.
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
@@ -37,10 +37,16 @@ class lists_column_device_view {
lists_column_device_view& operator=(lists_column_device_view const&) = default;
lists_column_device_view& operator=(lists_column_device_view&&) = default;

-lists_column_device_view(column_device_view const& underlying_) : underlying(underlying_)
+CUDF_HOST_DEVICE lists_column_device_view(column_device_view const& underlying_)
+: underlying(underlying_)
{
+#ifdef __CUDACC__
cudf_assert(underlying.type().id() == type_id::LIST and
"lists_column_device_view only supports lists");
devavret marked this conversation as resolved.
+#else
+CUDF_EXPECTS(underlying_.type().id() == type_id::LIST,
+"lists_column_device_view only supports lists");
+#endif
}

/**
@@ -56,6 +62,16 @@ class lists_column_device_view {
return underlying.child(lists_column_view::offsets_column_index);
}

/**
* @brief Fetches the list offset value at a given row index while taking column offset into
* account.
*/
[[nodiscard]] __device__ inline size_type offset_at(size_type idx) const
{
return underlying.child(lists_column_view::offsets_column_index)
.element<size_type>(offset() + idx);
}

/**
* @brief Fetches the child column of the underlying list column.
*/
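`offset_at` adds the view's own `offset()` before indexing the offsets child. In cudf's layout, slicing a lists column adjusts only the parent's offset, so row `idx` of a sliced view sits at position `offset() + idx` of the unsliced offsets child. The host-only sketch below shows that arithmetic with a hypothetical `host_lists_view` struct; it is not a cudf type.

```cpp
#include <cassert>
#include <vector>

// Hypothetical host analogue of a (possibly sliced) lists column view; not a cudf type.
struct host_lists_view {
  std::vector<int> const* offsets;  // offsets child of the full, unsliced column (rows + 1 entries)
  std::vector<int> const* values;   // leaf values child
  int offset;                       // first row of this view within the full column
  int size;                         // number of rows in this view

  // Mirrors offset_at(): shift by the view's own offset before indexing the offsets child.
  int offset_at(int idx) const { return (*offsets)[offset + idx]; }

  // Number of elements in row `idx` of this view.
  int list_size(int idx) const { return offset_at(idx + 1) - offset_at(idx); }
};
```

For the lists `[10, 11], [], [12, 13, 14], [15]` viewed from row 1, indexing with `idx` alone would return the offsets of the unsliced column's first rows instead of the view's.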
259 changes: 256 additions & 3 deletions cpp/include/cudf/table/experimental/row_operators.cuh
@@ -19,6 +19,7 @@
#include <cudf/column/column_device_view.cuh>
#include <cudf/detail/utilities/assert.cuh>
#include <cudf/detail/utilities/hash_functions.cuh>
#include <cudf/lists/lists_column_device_view.cuh>
#include <cudf/sorting.hpp>
#include <cudf/table/row_operators.cuh>
#include <cudf/table/table_device_view.cuh>
@@ -176,9 +177,7 @@ class device_row_comparator {
__device__ cuda::std::pair<weak_ordering, int> operator()(size_type const lhs_element_index,
size_type const rhs_element_index)
{
-// TODO: make this CUDF_UNREACHABLE
-cudf_assert(false && "Attempted to compare elements of uncomparable types.");
-return cuda::std::make_pair(weak_ordering::LESS, std::numeric_limits<int>::max());
+CUDF_UNREACHABLE("Attempted to compare elements of uncomparable types.");
}

template <typename Element, CUDF_ENABLE_IF(std::is_same_v<Element, cudf::struct_view>)>
@@ -420,6 +419,260 @@ class self_comparator {
};

} // namespace lexicographic

namespace equality_hashing {
devavret marked this conversation as resolved.

template <typename Nullate>
class device_row_comparator {
friend class self_eq_comparator;

/**
* @brief Construct a function object for performing equality comparison between the rows of two
* tables.
*
* @param has_nulls Indicates if either input table contains columns with nulls.
* @param lhs The first table
* @param rhs The second table (may be the same table as `lhs`)
* @param nulls_are_equal Indicates if two null elements are treated as equivalent
*/
device_row_comparator(Nullate has_nulls,
table_device_view lhs,
table_device_view rhs,
null_equality nulls_are_equal = null_equality::EQUAL)
devavret marked this conversation as resolved.
: lhs{lhs}, rhs{rhs}, nulls{has_nulls}, nulls_are_equal{nulls_are_equal}
{
}

/**
* @brief Performs an equality comparison between two elements in two columns.
*
* @tparam Nullate A cudf::nullate type describing how to check for nulls.
*/
class element_comparator {
public:
/**
* @brief Construct type-dispatched function object for comparing equality
* between two elements.
*
* @note `lhs` and `rhs` may be the same.
*
* @param has_nulls Indicates if either input column contains nulls.
* @param lhs The column containing the first element
* @param rhs The column containing the second element (may be the same as lhs)
* @param nulls_are_equal Indicates if two null elements are treated as equivalent
*/
__device__ element_comparator(Nullate has_nulls,
column_device_view lhs,
column_device_view rhs,
null_equality nulls_are_equal = null_equality::EQUAL)
devavret marked this conversation as resolved.
: lhs{lhs}, rhs{rhs}, nulls{has_nulls}, nulls_are_equal{nulls_are_equal}
{
}

/**
* @brief Compares the specified elements for equality.
*
* @param lhs_element_index The index of the first element
* @param rhs_element_index The index of the second element
* @return True if lhs and rhs are equal or if both lhs and rhs are null and nulls are
* configured to be considered equal (`nulls_are_equal` == `null_equality::EQUAL`)
*/
template <typename Element, CUDF_ENABLE_IF(cudf::is_equality_comparable<Element, Element>())>
__device__ bool operator()(size_type const lhs_element_index,
size_type const rhs_element_index) const noexcept
{
if (nulls) {
bool const lhs_is_null{lhs.is_null(lhs_element_index)};
bool const rhs_is_null{rhs.is_null(rhs_element_index)};
if (lhs_is_null and rhs_is_null) {
return nulls_are_equal == null_equality::EQUAL;
} else if (lhs_is_null != rhs_is_null) {
return false;
}
jrhemstad marked this conversation as resolved.
}

return equality_compare(lhs.element<Element>(lhs_element_index),
rhs.element<Element>(rhs_element_index));
}

template <typename Element,
CUDF_ENABLE_IF(not cudf::is_equality_comparable<Element, Element>() and
not cudf::is_nested<Element>())>
__device__ bool operator()(size_type const lhs_element_index, size_type const rhs_element_index)
{
CUDF_UNREACHABLE("Attempted to compare elements of uncomparable types.");
}

template <typename Element, CUDF_ENABLE_IF(cudf::is_nested<Element>())>
__device__ bool operator()(size_type const lhs_element_index,
size_type const rhs_element_index) const noexcept
{
column_device_view lcol = lhs;
column_device_view rcol = rhs;
devavret marked this conversation as resolved.
int l_start_off = lhs_element_index;
int r_start_off = rhs_element_index;
int l_end_off = lhs_element_index + 1;
int r_end_off = rhs_element_index + 1;
while (is_nested(lcol.type())) {
if (nulls) {
vyasr marked this conversation as resolved.
for (int i = l_start_off, j = r_start_off; i < l_end_off; ++i, ++j) {
bool const lhs_is_null{lcol.is_null(i)};
bool const rhs_is_null{rcol.is_null(j)};

if (lhs_is_null and rhs_is_null) {
if (nulls_are_equal == null_equality::UNEQUAL) { return false; }
} else if (lhs_is_null != rhs_is_null) {
return false;
}
}
}
if (lcol.type().id() == type_id::STRUCT) {
lcol = lcol.child(0);
rcol = rcol.child(0);
} else if (lcol.type().id() == type_id::LIST) {
auto l_list_col = detail::lists_column_device_view(lcol);
auto r_list_col = detail::lists_column_device_view(rcol);
for (int i = l_start_off, j = r_start_off; i < l_end_off; ++i, ++j) {
if (l_list_col.offset_at(i + 1) - l_list_col.offset_at(i) !=
r_list_col.offset_at(j + 1) - r_list_col.offset_at(j))
return false;
}
lcol = l_list_col.child();
rcol = r_list_col.child();
l_start_off = l_list_col.offset_at(l_start_off);
r_start_off = r_list_col.offset_at(r_start_off);
l_end_off = l_list_col.offset_at(l_end_off);
r_end_off = r_list_col.offset_at(r_end_off);
if (l_end_off - l_start_off != r_end_off - r_start_off) { return false; }
}
}

for (int i = l_start_off, j = r_start_off; i < l_end_off; ++i, ++j) {
bool equal = type_dispatcher<dispatch_void_if_nested>(
lcol.type(), element_comparator{nulls, lcol, rcol, nulls_are_equal}, i, j);
if (not equal) { return false; }
}
return true;
}
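The nested overload above narrows a pair of `[start, end)` ranges one nesting level at a time: it checks that every sub-list in the current range has the same size on both sides, then replaces the range with the child rows it spans, and finally compares the leaf range. The host-only sketch below reproduces that narrowing for a non-null `list<list<int>>` column. `nested_lists` and `rows_equal` are hypothetical, nulls and struct levels are omitted, and both columns are assumed to share the same nesting depth.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical host model of a non-null list<list<...<int>>> column:
// offsets[d] holds the offsets of nesting level d; leaves holds the innermost values.
struct nested_lists {
  std::vector<std::vector<int>> offsets;
  std::vector<int> leaves;
};

// Compares row i of lhs with row j of rhs by narrowing a [start, end) range at each level.
bool rows_equal(nested_lists const& lhs, int i, nested_lists const& rhs, int j)
{
  int l_start = i, l_end = i + 1;
  int r_start = j, r_end = j + 1;
  for (std::size_t d = 0; d < lhs.offsets.size(); ++d) {
    auto const& lo = lhs.offsets[d];
    auto const& ro = rhs.offsets[d];
    // Every sub-list in the current range must have the same size on both sides.
    for (int a = l_start, b = r_start; a < l_end; ++a, ++b) {
      if (lo[a + 1] - lo[a] != ro[b + 1] - ro[b]) { return false; }
    }
    // Zoom in: the child rows spanned by the current range become the new range.
    l_start = lo[l_start];
    l_end   = lo[l_end];
    r_start = ro[r_start];
    r_end   = ro[r_end];
  }
  // Equal sizes at every level imply equally long leaf ranges; compare them pairwise.
  for (int a = l_start, b = r_start; a < l_end; ++a, ++b) {
    if (lhs.leaves[a] != rhs.leaves[b]) { return false; }
  }
  return true;
}
```

The variables play the same role as `l_start_off`/`l_end_off` and `r_start_off`/`r_end_off` in the device code.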
@jrhemstad (Contributor), Mar 24, 2022:

This isn't meant to be 100% functional as it requires a few new pieces of machinery, but I wanted to convey what I think should be possible here.

At present, my observation is that this logic is effectively zooming in on successive slices of child columns by carrying state through the l_start_off/l_end_off and r_start_off/r_end_off.

My intuition is to make this logic more natural by using the slicing mechanism inherent in column_device_view, and to instead carry state by updating column_device_view objects that represent the successive slices as you iterate through the nested structure.

Assumptions:

  • column_device_view::sliced_child would just copy the parent's offset to the child (it doesn't modify any data or offsets).
  • lists_column_device_view::sliced_child would apply the parent's offset plus offset_at to the offset of the returned sliced child.
Suggested change
{
column_device_view left_slice = lhs;
column_device_view right_slice = rhs;
while (is_nested(left_slice.type())) {
if (nulls) {
auto similar_nulls = thrust::equal(thrust::seq,
left_slice.null_begin(), left_slice.null_end(),
right_slice.null_begin(),
[nulls_are_equal = nulls_are_equal](bool lhs_is_null, bool rhs_is_null){
return null_compare(lhs_is_null, rhs_is_null, nulls_are_equal);
});
if(not similar_nulls){ return false; }
}
if (left_slice.type().id() == type_id::STRUCT) {
left_slice = left_slice.sliced_child(0);
right_slice = right_slice.sliced_child(0);
} else if (left_slice.type().id() == type_id::LIST) {
auto left_list = detail::lists_column_device_view(left_slice);
auto right_list = detail::lists_column_device_view(right_slice);
if(not thrust::equal(thrust::seq,
left_list.begin_list_sizes(),
left_list.end_list_sizes(),
right_list.begin_list_sizes())){
// If the size of each sub-list isn't the same, then parent lists cannot be equal
return false;
}
left_slice = left_list.sliced_child();
right_slice = right_list.sliced_child();
// If the size of the parent lists is different, they cannot be equal
if(left_slice.size() != right_slice.size()){
return false;
}
}
}
auto compare = [&](auto i, auto j){
auto comparator = element_comparator{nulls, left_slice, right_slice, nulls_are_equal};
return type_dispatcher<dispatch_void_if_nested>(left_slice.type(), comparator, i, j);
};
return thrust::equal(thrust::seq,
thrust::counting_iterator<size_type>{0}, thrust::counting_iterator<size_type>{left_slice.size()},
thrust::counting_iterator<size_type>{0},
compare);
}

Contributor Author:

In the line `column_device_view left_slice = lhs;`, it should be `column_device_view left_slice = slice(lhs, {lhs_idx, lhs_idx + 1});`, because otherwise we're comparing the entire column in a single thread.
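The slice-based variant, starting from a one-row slice as suggested here, can also be sketched on the host. `slice`, `sliced_child`, `list_sizes`, and `rows_equal_sliced` are hypothetical stand-ins for the proposed `sliced_child` and `begin_list_sizes`/`end_list_sizes` machinery, which the reviewer notes would have to be added. The sketch covers only non-null nested lists that share a schema; the point is that the traversal state lives in slice objects rather than in loose offset variables.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical window [offset, offset + size) over the rows of one nesting level.
struct slice {
  int offset;
  int size;
};

// Analogue of the proposed lists sliced_child(): the child rows covered by `parent`.
slice sliced_child(std::vector<int> const& offsets, slice parent)
{
  int const begin = offsets[parent.offset];
  int const end   = offsets[parent.offset + parent.size];
  return {begin, end - begin};
}

// Sizes of the sub-lists inside a window (analogue of begin_list_sizes/end_list_sizes).
std::vector<int> list_sizes(std::vector<int> const& offsets, slice s)
{
  std::vector<int> sizes;
  for (int r = s.offset; r < s.offset + s.size; ++r) {
    sizes.push_back(offsets[r + 1] - offsets[r]);
  }
  return sizes;
}

// Row i of lhs vs row j of rhs. Each side starts from a one-row slice, not the whole column.
bool rows_equal_sliced(std::vector<std::vector<int>> const& l_offsets,
                       std::vector<int> const& l_leaves,
                       int i,
                       std::vector<std::vector<int>> const& r_offsets,
                       std::vector<int> const& r_leaves,
                       int j)
{
  slice l{i, 1};
  slice r{j, 1};
  for (std::size_t d = 0; d < l_offsets.size(); ++d) {
    if (list_sizes(l_offsets[d], l) != list_sizes(r_offsets[d], r)) { return false; }
    l = sliced_child(l_offsets[d], l);
    r = sliced_child(r_offsets[d], r);
  }
  // Equal sub-list sizes at every level guarantee equally long leaf slices.
  return std::equal(l_leaves.begin() + l.offset,
                    l_leaves.begin() + l.offset + l.size,
                    r_leaves.begin() + r.offset);
}
```

Starting from `slice{i, 1}` is what keeps each thread's work proportional to one row; starting from the whole column would compare every row in a single call.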

Contributor Author:

I think we can take the nulls_are_equal check out of the tight loop and make it a separate check.

if (nulls_are_equal == null_equality::UNEQUAL) {
  // Any null on either side makes the rows unequal
  if (thrust::any_of(thrust::seq, left_slice.null_begin(), left_slice.null_end(), thrust::identity<bool>{}) or
      thrust::any_of(thrust::seq, right_slice.null_begin(), right_slice.null_end(), thrust::identity<bool>{}))
  { return false; }
} else {
  // xor lhs_is_null and rhs_is_null across the slice
}
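Hoisting the `nulls_are_equal` decision out of the per-element loop can be sketched on the host with plain null masks. `nulls_compatible` and the local `null_equality` enum are hypothetical, and the null masks of the two slices are assumed to be `std::vector<bool>` of equal length.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

enum class null_equality { EQUAL, UNEQUAL };  // local stand-in for cudf::null_equality

// Checks whether the null masks of two equally long slices are compatible. The
// nulls_are_equal decision is made once, outside the per-element scan.
bool nulls_compatible(std::vector<bool> const& lhs_is_null,
                      std::vector<bool> const& rhs_is_null,
                      null_equality nulls_are_equal)
{
  auto const is_null = [](bool b) { return b; };
  if (nulls_are_equal == null_equality::UNEQUAL) {
    // Any null on either side makes the rows unequal.
    return std::none_of(lhs_is_null.begin(), lhs_is_null.end(), is_null) &&
           std::none_of(rhs_is_null.begin(), rhs_is_null.end(), is_null);
  }
  // Nulls compare equal: only a position where exactly one side is null (xor) fails.
  return std::equal(lhs_is_null.begin(), lhs_is_null.end(), rhs_is_null.begin());
}
```

Each branch is now a single scan with a trivial predicate, instead of a loop that re-tests `nulls_are_equal` for every pair of elements.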

Contributor Author:

I also have a small fear that slicing and creating new column_device_view objects might degrade performance. I think the current code avoids this because the compiler knows it doesn't have to create an object and can use one directly from the table_device_view memory.

@jrhemstad (Contributor), Mar 24, 2022:

> I also have a small fear that slicing and creating new column_device_view objects might degrade performance.

I share that concern. The compiler should optimize that all away, but this is some complex code. No way to tell other than to try :)

Contributor:

Also, I realized we can lift the type_dispatch up a level. There's no need to type dispatch for every element in the slice. We can just type dispatch once and compare all the elements in the slice.
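Lifting the dispatch means resolving the runtime type once per leaf slice and then running a loop compiled for the concrete element type. The host-only sketch below uses a hypothetical two-type `dispatch` function in place of `cudf::type_dispatcher`, a hypothetical type-erased `column` struct, and a counter showing that comparing a whole slice dispatches only once.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

enum class type_id { INT32, FLOAT64 };  // hypothetical two-type subset

// Hypothetical non-owning, type-erased column: a runtime type tag plus a data pointer.
struct column {
  type_id type;
  void const* data;
};

inline int dispatch_calls = 0;  // counts runtime dispatches, for illustration only

// Minimal stand-in for a type dispatcher: maps the runtime tag to a template argument.
template <typename Functor, typename... Args>
bool dispatch(type_id id, Functor const& f, Args... args)
{
  ++dispatch_calls;
  switch (id) {
    case type_id::INT32: return f.template operator()<std::int32_t>(args...);
    case type_id::FLOAT64: return f.template operator()<double>(args...);
  }
  return false;  // not reached for the two modeled types
}

// Compares `count` consecutive elements; the loop is compiled per concrete type T.
struct range_comparator {
  column lhs;
  column rhs;

  template <typename T>
  bool operator()(int l_begin, int r_begin, int count) const
  {
    auto const* l = static_cast<T const*>(lhs.data);
    auto const* r = static_cast<T const*>(rhs.data);
    for (int k = 0; k < count; ++k) {
      if (!(l[l_begin + k] == r[r_begin + k])) { return false; }
    }
    return true;
  }
};

// One dispatch for the whole leaf slice rather than one per element.
bool slices_equal(column lhs, int l_begin, column rhs, int r_begin, int count)
{
  return dispatch(lhs.type, range_comparator{lhs, rhs}, l_begin, r_begin, count);
}
```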

@devavret (Contributor Author), Mar 28, 2022:

Good news. The changes had only a 1% performance impact. Now I just need to figure out where the individual pieces should be placed.

Contributor:

Is that before or after changing the type dispatch to only be one per slice instead of per-element? I would hope that improves perf.

Contributor Author:

7% improvement after pulling the type dispatcher out.


private:
column_device_view const lhs;
column_device_view const rhs;
Nullate const nulls;
null_equality const nulls_are_equal;
};

public:
/**
* @brief Checks whether the row at `lhs_index` in the `lhs` table is equal to the row at
* `rhs_index` in the `rhs` table.
*
* @param lhs_index The index of row in the `lhs` table to examine
* @param rhs_index The index of the row in the `rhs` table to examine
* @return `true` if row from the `lhs` table is equal to the row in the `rhs` table
*/
__device__ bool operator()(size_type const lhs_index, size_type const rhs_index) const noexcept
{
auto equal_elements = [=](column_device_view l, column_device_view r) {
return cudf::type_dispatcher(
l.type(), element_comparator{nulls, l, r, nulls_are_equal}, lhs_index, rhs_index);
};

return thrust::equal(thrust::seq, lhs.begin(), lhs.end(), rhs.begin(), equal_elements);
}
devavret marked this conversation as resolved.

private:
table_device_view const lhs;
table_device_view const rhs;
Nullate const nulls;
null_equality const nulls_are_equal;
};

struct preprocessed_table {
/**
* @brief Preprocess table for use with row equality comparison or row hashing
*
* Sets up the table for use with row equality comparison or row hashing. The resulting
* preprocessed table can be passed to the constructor of `equality_hashing::self_eq_comparator` to
* avoid preprocessing again.
*
* @param table The table to preprocess
* @param stream The CUDA stream to use while preprocessing
*/
static std::shared_ptr<preprocessed_table> create(table_view const& table,
rmm::cuda_stream_view stream);

private:
friend class self_eq_comparator;

using table_device_view_owner =
std::invoke_result_t<decltype(table_device_view::create), table_view, rmm::cuda_stream_view>;

preprocessed_table(table_device_view_owner&& table,
std::vector<rmm::device_buffer>&& null_buffers)
: _t(std::move(table)), _null_buffers(std::move(null_buffers))
{
}

/**
* @brief Implicit conversion operator to a `table_device_view` of the preprocessed table.
*
* @return table_device_view
*/
operator table_device_view() { return *_t; }

private:
devavret marked this conversation as resolved.
table_device_view_owner _t;
std::vector<rmm::device_buffer> _null_buffers;
};

class self_eq_comparator {
public:
/**
* @brief Construct an owning object for performing equality comparisons between two rows of the
* same table.
*
* @param t The table to compare
* @param stream The stream to construct this object on. Not the stream that will be used for
* comparisons using this object.
*/
self_eq_comparator(table_view const& t, rmm::cuda_stream_view stream)
: d_t(preprocessed_table::create(t, stream))
{
}

/**
* @brief Construct an owning object for performing equality comparisons between two rows of the
* same table.
*
* This constructor allows independently constructing a `preprocessed_table` and sharing it among
* multiple comparators.
*
* @param t A table preprocessed for equality comparison
*/
self_eq_comparator(std::shared_ptr<preprocessed_table> t) : d_t{std::move(t)} {}

/**
* @brief Get the comparison operator to use on the device
*
* Returns a binary callable, `F`, with signature `bool F(size_type, size_type)`.
*
* `F(i,j)` returns true if and only if row `i` compares equal to row `j`.
*
* @tparam Nullate Optional. A cudf::nullate type describing how to check for nulls.
*/
template <typename Nullate>
device_row_comparator<Nullate> device_comparator(Nullate nullate = {}) const
{
return device_row_comparator(nullate, *d_t, *d_t);
}

private:
std::shared_ptr<preprocessed_table> d_t;
};

} // namespace equality_hashing

} // namespace row
} // namespace experimental
} // namespace cudf
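The ownership split in this namespace (a `preprocessed_table` created once and held through `std::shared_ptr`, a `self_eq_comparator` that either builds it or shares an existing one, and a lightweight non-owning comparator handed to device code) can be sketched with host types. Everything below is a hypothetical stand-in: `row`, `table`, `row_comparator`, and the `comparator()` accessor replace the device views, null buffers, and `device_comparator()`.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <utility>
#include <vector>

using row   = std::vector<int>;  // hypothetical host row
using table = std::vector<row>;  // hypothetical host table

// Stand-in for preprocessed_table: expensive setup runs once and the result is shared.
class preprocessed_table {
 public:
  static std::shared_ptr<preprocessed_table> create(table const& t)
  {
    // The constructor is private, so std::make_shared cannot be used here.
    return std::shared_ptr<preprocessed_table>(new preprocessed_table(t));
  }
  table const& rows() const { return _rows; }

 private:
  explicit preprocessed_table(table t) : _rows(std::move(t)) {}
  table _rows;
};

// Stand-in for device_row_comparator: non-owning and cheap to copy.
class row_comparator {
 public:
  explicit row_comparator(table const& t) : _t(&t) {}
  bool operator()(std::size_t i, std::size_t j) const { return (*_t)[i] == (*_t)[j]; }

 private:
  table const* _t;
};

// Stand-in for self_eq_comparator: owns or shares the preprocessed table.
class self_eq_comparator {
 public:
  explicit self_eq_comparator(table const& t) : _d_t(preprocessed_table::create(t)) {}
  explicit self_eq_comparator(std::shared_ptr<preprocessed_table> t) : _d_t(std::move(t)) {}

  // The returned functor must not outlive the shared preprocessed table.
  row_comparator comparator() const { return row_comparator(_d_t->rows()); }

 private:
  std::shared_ptr<preprocessed_table> _d_t;
};
```

Sharing the `std::shared_ptr` lets several comparators reuse one preprocessing pass, while the functor itself stays trivially copyable, which is what a kernel argument needs.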