Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve distinct_count with cuco::static_set #13343

Merged
merged 12 commits into from
May 15, 2023
35 changes: 18 additions & 17 deletions cpp/src/stream_compaction/distinct_count.cu
Original file line number Diff line number Diff line change
Expand Up @@ -34,6 +34,8 @@
#include <rmm/cuda_stream_view.hpp>
#include <rmm/exec_policy.hpp>

#include <cuco/static_set.cuh>
Copy link
Contributor

@ttnghia ttnghia May 12, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I expect to see some header removed from this but don't see any?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not for now, some common utilities will no longer be needed but I will leave it to the distinct refactoring with cuco::static_reduction_map to avoid back-and-forth effort.


#include <thrust/count.h>
#include <thrust/execution_policy.h>
#include <thrust/iterator/counting_iterator.h>
Expand Down Expand Up @@ -127,40 +129,39 @@ cudf::size_type distinct_count(table_view const& keys,
null_equality nulls_equal,
rmm::cuda_stream_view stream)
{
auto const num_rows = keys.num_rows();
auto const num_rows = keys.num_rows();
if (num_rows == 0) { return 0; } // early exit for empty input
auto const has_nulls = nullate::DYNAMIC{cudf::has_nested_nulls(keys)};

hash_map_type key_map{compute_hash_table_size(num_rows),
cuco::empty_key{COMPACTION_EMPTY_KEY_SENTINEL},
cuco::empty_value{COMPACTION_EMPTY_VALUE_SENTINEL},
detail::hash_table_allocator_type{default_allocator<char>{}, stream},
stream.value()};

auto const preprocessed_input =
cudf::experimental::row::hash::preprocessed_table::create(keys, stream);

auto const row_hasher = cudf::experimental::row::hash::row_hasher(preprocessed_input);
auto const hash_key = experimental::compaction_hash(row_hasher.device_hasher(has_nulls));

auto const row_comp = cudf::experimental::row::equality::self_comparator(preprocessed_input);

auto iter = cudf::detail::make_counting_transform_iterator(
0, [] __device__(size_type i) { return cuco::make_pair(i, i); });
auto const row_comp = cudf::experimental::row::equality::self_comparator(preprocessed_input);

auto const comparator_helper = [&](auto const row_equal) {
using hasher_type = decltype(hash_key);
auto key_set = cuco::experimental::static_set{
cuco::experimental::extent{compute_hash_table_size(num_rows)},
cuco::empty_key<cudf::size_type>{COMPACTION_EMPTY_KEY_SENTINEL},
row_equal,
Copy link
Contributor

@ttnghia ttnghia May 12, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So the static_set is now constructed with a comparator? Is this a requirement? And insert_if doesn't require a comparator?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So the static_set is now constructed with a comparator?

Yes, we are trying to align with std::unordered_set when possible. Similar to allocator and hasher, equality should be passed only once during construction (see reference).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see potential issue with adopting static_set having such requirement into cudf contains APIs. In such places:

  • The haystack table is firstly fed into a static_set, when we need to use self_comparator for key_equal.
  • Then, a needles table is used to check if its rows exist in the set. This time, we need to use two_table_comparator for key_equal.

Copy link
Contributor

@ttnghia ttnghia May 15, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry to steal the show. Here is my suggestion to overcome the aforementioned issue: Combine both comparators into a thin proxy wrapper:

template<typename SelfComparator, typename TwoTableComparator>
struct comparator {
  SelfComparator self_comp;
  TwoTableComparator two_table_comp;

  bool operator()(lhs_index_type i, lhs_index_type j) const { return self_comp(static_cast<size_type>(i), static_cast<size_type>(j)); } 
  bool operator()(lhs_index_type i, rhs_index_type j) const { return two_table_comp(i, j); } 
  bool operator()(rhs_index_type i, lhs_index_type j) const { return two_table_comp(i, j); } 
};

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How do we make sure the desired operator will be used with the equal wrapper?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When inserting, we should insert all keys of type lhs_index_type while when checking contains we will check keys of type rhs_index_type against the lhs_index_type keys in the set.

Theoretically, my proposal should work but that depends on the detail implementation of static_set. If there is compile error with it then we may also need additional modification to static_set.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Theoretically, my proposal should work but that depends on the detail implementation of static_set.

Good point. cuco has an additional indirection layer to facilitate sentinel comparison thus the wrapper may not work as it is. To be tested.

cuco::experimental::linear_probing<1, hasher_type>{hash_key},
detail::hash_table_allocator_type{default_allocator<char>{}, stream},
stream.value()};

auto const iter = thrust::counting_iterator<cudf::size_type>(0);
// when nulls are equal, insert non-null rows only to improve efficiency
if (nulls_equal == null_equality::EQUAL and has_nulls) {
thrust::counting_iterator<size_type> stencil(0);
auto const [row_bitmask, null_count] =
cudf::detail::bitmask_or(keys, stream, rmm::mr::get_current_device_resource());
row_validity pred{static_cast<bitmask_type const*>(row_bitmask.data())};

key_map.insert_if(iter, iter + num_rows, stencil, pred, hash_key, row_equal, stream.value());
return key_map.get_size() + static_cast<std::size_t>(null_count > 0);
return key_set.insert_if(iter, iter + num_rows, stencil, pred, stream.value()) +
static_cast<cudf::size_type>(null_count > 0);
}
// otherwise, insert all
key_map.insert(iter, iter + num_rows, hash_key, row_equal, stream.value());
return key_map.get_size();
return key_set.insert(iter, iter + num_rows, stream.value());
};

if (cudf::detail::has_nested_columns(keys)) {
Expand Down