-
Notifications
You must be signed in to change notification settings - Fork 933
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix performance issue and add a new code path to cudf::detail::contains
#11330
Fix performance issue and add a new code path to cudf::detail::contains
#11330
Conversation
This comment was marked as off-topic.
This comment was marked as off-topic.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ack looks like I refreshed during an update and my comments vanished. Will start again.
cpp/src/search/contains_table.cu
Outdated
template <typename Hasher> | ||
struct strong_index_hasher_adapter { | ||
strong_index_hasher_adapter(Hasher const& hasher) : _hasher{hasher} {} | ||
|
||
template <typename T> | ||
__device__ inline auto operator()(T const idx) const noexcept | ||
{ | ||
return _hasher(static_cast<size_type>(idx)); | ||
} | ||
|
||
private: | ||
Hasher _hasher; | ||
}; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
tl;dr you have two options that allow you to switch to a struct with public members:
- Define a constructor
- Explicitly specify the dependent type when you construct the adapter, e.g.
strong_index_hasher_adapter<HasherT> h{...}
.
Removing the constructor requires C++20 support for improved CTAD. CTAD in C++17 does not include the aggregate initializer that exists by default for aggregate types such as this one, so doing strong_index_hasher_adapter h{HasherT{}}
will not infer the dependent type for the strong_index_hasher_adapter
instance. This behavior is improved in C++20 (see the C++20 note in the implicitly-generated deduction guides section of the CTAD documentation). Here's an example that shows that supplying the template parameters explicitly always works, but deduction only works if you set the compiler flag to -std=c++20
.
cpp/src/search/contains_table.cu
Outdated
template <typename Hasher> | ||
struct strong_index_hasher_adapter { | ||
strong_index_hasher_adapter(Hasher const& hasher) : _hasher{hasher} {} | ||
|
||
template <typename T> | ||
__device__ inline auto operator()(T const idx) const noexcept | ||
{ | ||
return _hasher(static_cast<size_type>(idx)); | ||
} | ||
|
||
private: | ||
Hasher _hasher; | ||
}; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also yes, the hasher
should be made const
if it's public.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some very minor suggestions, but this looks good to me. Thanks for putting together a solution so quickly @ttnghia and @PointKernel! Fantastic work here.
Overall I see this PR as having very minor effect on the time for the NDS benchmark where it is 4 seconds slower (or less than 1% impact). I am comparing this patch (applied on top of the original refactor PR) abellina@6919828 vs a baseline based on commit: abellina@9034b1b, which was before the original refactor work. One of the queries, query80 was statistically significant with a difference in the means of ~700ms worse. I have taken an nsys trace of the baseline and the patched up runs of this query and it's not clear from this trace what the differences are. It is something that we should look at more closely, but we feel is minor enough. |
template <typename T, | ||
typename U, | ||
CUDF_ENABLE_IF(is_strong_index_type<T>() && is_strong_index_type<U>())> | ||
__device__ constexpr auto operator()(T const lhs_index, U const rhs_index) const noexcept |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@ttnghia Can you explain what types this functor expects to get? lhs_index_type, rhs_index_type
and rhs_index_type, lhs_index_type
or also things like lhs_index_type, lhs_index_type
, rhs_index_type, rhs_index_type
, and size_type, size_type
? I want to simplify this but I don't fully understand the requirements you're aiming for here because the templates are fairly broad.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(size_type, size_type)
will never be instantiated, because only strong index types are used in the code. The template function will not be enabled for this case.
Only (lhs_index_type, lhs_index_type)
and (lhs_index_type, rhs_index_type)
will be instantiated in this file, at this time and potentially (rhs_index_type, lhs_index_type)
may be instantiated anytime in the future:
(lhs_index_type, lhs_index_type)
will be used for self-comparing the haystack table, when inserting row indices of that table into the map.(lhs_index_type, rhs_index_type)
will be used for checking contains usingcuco::static_map::contains
. The internal implementation ofcuco::static_map::contains
at this moment only calls(lhs_index_type, rhs_index_type)
, but it doesn't guarantee to not call(rhs_index_type, lhs_index_type)
in the future. Nevertheless, this function will flip the order to make sure the right indices will be passed into the underlying comparator.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could we use three explicit operator()
s, one that addresses the self-comparison lhs_index_type, lhs_index_type
case, and two that address the left/right (right/left) pairings? This template logic is quite complicated. I propose:
/**
* @brief An adapter functor to support strong index type for table row comparator that must be
* operating on `cudf::size_type`.
*/
template <typename Comparator>
struct strong_index_comparator_adapter {
strong_index_comparator_adapter(Comparator const& comparator) : _comparator{comparator} {}
__device__ constexpr auto operator()(lhs_index_type const left, lhs_index_type const right) const noexcept
{
return _comparator(static_cast<size_type>(left), static_cast<size_type>(right);
}
__device__ constexpr auto operator()(lhs_index_type const left, rhs_index_type const right) const noexcept
{
return _comparator(static_cast<size_type>(left), static_cast<size_type>(right);
}
__device__ constexpr auto operator()(rhs_index_type const right, lhs_index_type const left) const noexcept
{
// Reverse the order to left, right.
return this->operator()(left, right);
}
private:
Comparator const _comparator;
};
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sounds good.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Wait, no. I have tried that. This will cause compiler warning: operator() defined but never referred.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That sounds like a good explanation to me! I couldn’t quite piece that together myself but I think it sounds accurate.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry that is just half correct. Here we have two separate code paths:
- A code path uses exprimental row operators (
two_table_comparator
) that doesn't need this adapter (here), and - Another code path uses the traditional
cudf::row_equality
for comparing rows from two tables (here). Thiscudf::row_equality
doesn't support strong index types thus it must be wrapped in this adapter as in the link.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The reason why there are warnings is that: this adapter is a template struct, which means the compiler creates a separate version of when it is used. In this file, it is used 3 times thus there will be 3 different versions of this template struct will be created. For each version, only one overload of operator()
is called (out of 3), causing warnings.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Okay, that is helpful to know. Perhaps that indicates that we should split this struct into multiple structs which only have the functions used in each case.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah good catch @ttnghia, thanks for the correction.
@gpucibot merge |
The current implementation of
cudf::detail::contains
can process input with arbitrary nested types. However, it was reported to have severe performance issue when the input tables have many duplicate rows (#11299). In order to fix the issue, #11310 and #11325 was created.Unfortunately, #11310 is separating semi-anti-join from
cudf::detail::contains
, causing duplicate implementation. On the other hand, #11325 can address the issue #11299 but semi-anti-join using it still performs worse than the previous semi-anti-join implementation.The changes in this PR include the following:
semi_anti_join
refactor #11299 for the currentcudf::detail::contains
implementation that support nested types.cudf::detail::contains
such that:Closes #11299.