Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Temporarily reverse semi-anti-join implementation #11310

Closed
wants to merge 1 commit into from
Closed
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
83 changes: 71 additions & 12 deletions cpp/src/join/semi_join.cu
Original file line number Diff line number Diff line change
Expand Up @@ -14,12 +14,14 @@
* limitations under the License.
*/

#include <join/join_common_utils.cuh>
hyperbolic2346 marked this conversation as resolved.
Show resolved Hide resolved
#include <join/join_common_utils.hpp>

#include <cudf/detail/gather.hpp>
#include <cudf/detail/iterator.cuh>
#include <cudf/detail/null_mask.hpp>
#include <cudf/detail/nvtx/ranges.hpp>
#include <cudf/detail/search.hpp>
#include <cudf/detail/structs/utilities.hpp>
#include <cudf/dictionary/detail/update_keys.hpp>
#include <cudf/join.hpp>
#include <cudf/table/table.hpp>
Expand All @@ -34,7 +36,6 @@
#include <thrust/distance.h>
#include <thrust/iterator/counting_iterator.h>
#include <thrust/sequence.h>
#include <thrust/transform.h>

namespace cudf {
namespace detail {
Expand All @@ -60,26 +61,84 @@ std::unique_ptr<rmm::device_uvector<cudf::size_type>> left_semi_anti_join(
return result;
}

// Materialize a `flagged` boolean array to generate a gather map.
// Previously, the gather map was generated directly without this array but by calling to
// `map.contains` inside the `thrust::copy_if` kernel. However, that led to increasing register
// usage and reducing performance, as reported here: https://github.com/rapidsai/cudf/pull/10511.
auto const flagged =
cudf::detail::contains(right_keys, left_keys, compare_nulls, nan_equality::ALL_EQUAL, stream);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this issue affect other uses of cudf::detail::contains? Should we be changing that function's implementation instead of just the semi-join implementation? (I haven't formed an opinion on this question yet, need to read more code first.)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right question. Changing the implementation will completely fix this, but requires new FEA from cuco, which is under way: NVIDIA/cuCollections#191

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This concerns me as well. It seems an issue with lots of duplicate keys in general and we just found this instance to be a problem so far.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Currently, the only use case of cudf::detail::contains is in lists operations.

Copy link
Contributor

@bdice bdice Jul 20, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One of the goals of #11100 was to reduce the number of unique functions implementing the same (or similar) logic. Is it possible to change cudf::detail::contains and leave semi_join.cu untouched?

Copy link
Contributor Author

@ttnghia ttnghia Jul 20, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sound reasonable. But I'll address that concern in another separate PR. I would like to keep cudf::detail::contains separated from semi-anti-join for 22.08 to prevent any last-minute surprising performance issue.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can ask Jake for a quick review since the change is basically the same as NVIDIA/cuCollections#175. The issue must affect other use cases of detail::contains but just not unveiled by the existing benchmarks yet.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's great. If we have #cuco/191 merged quickly then I can have a complete fix up for detail::contains without a temp fix.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Working on it now, should be ready very quickly.

auto const left_num_rows = left_keys.num_rows();
auto const right_num_rows = right_keys.num_rows();

// flatten structs for the right and left and use that for the hash table
auto const right_flattened_tables = structs::detail::flatten_nested_columns(
right_keys, {}, {}, structs::detail::column_nullability::FORCE);
auto const left_flattened_tables = structs::detail::flatten_nested_columns(
left_keys, {}, {}, structs::detail::column_nullability::FORCE);
auto const right_flattened_keys = right_flattened_tables.flattened_columns();
auto const left_flattened_keys = left_flattened_tables.flattened_columns();

// Create hash table.
semi_map_type hash_table{compute_hash_table_size(right_num_rows),
cuco::sentinel::empty_key{std::numeric_limits<hash_value_type>::max()},
cuco::sentinel::empty_value{cudf::detail::JoinNoneValue},
hash_table_allocator_type{default_allocator<char>{}, stream},
stream.value()};

// Create hash table containing all keys found in right table
auto const right_rows_d = table_device_view::create(right_flattened_keys, stream);
auto const right_nulls = cudf::nullate::DYNAMIC{cudf::has_nulls(right_flattened_keys)};
row_hash const hash_build{right_nulls, *right_rows_d};
row_equality equality_build{right_nulls, *right_rows_d, *right_rows_d, compare_nulls};

auto iter = cudf::detail::make_counting_transform_iterator(
0, [] __device__(auto const i) { return cuco::make_pair(static_cast<hash_value_type>(i), 0); });

// skip rows that are null here.
if ((compare_nulls == null_equality::EQUAL) or (not nullable(right_keys))) {
hash_table.insert(iter, iter + right_num_rows, hash_build, equality_build, stream.value());
} else {
thrust::counting_iterator<size_type> stencil(0);
auto const [row_bitmask, _] = cudf::detail::bitmask_and(right_flattened_keys, stream);
row_is_valid pred{static_cast<bitmask_type const*>(row_bitmask.data())};

// insert valid rows
hash_table.insert_if(
iter, iter + right_num_rows, stencil, pred, hash_build, equality_build, stream.value());
}

// Now we have a hash table, we need to iterate over the rows of the left table
// and check to see if they are contained in the hash table
auto const left_rows_d = table_device_view::create(left_flattened_keys, stream);
auto const left_nulls = cudf::nullate::DYNAMIC{cudf::has_nulls(left_flattened_keys)};
row_hash hash_probe{left_nulls, *left_rows_d};
// Note: This equality comparator violates symmetry of equality and is
// therefore relying on the implementation detail of the order in which its
// operator is invoked. If cuco makes no promises about the order of
// invocation this seems a bit unsafe.
row_equality equality_probe{left_nulls, *right_rows_d, *left_rows_d, compare_nulls};

// For semi join we want contains to be true, for anti join we want contains to be false
bool const join_type_boolean = (kind == join_kind::LEFT_SEMI_JOIN);

auto const left_num_rows = left_keys.num_rows();
auto gather_map =
std::make_unique<rmm::device_uvector<cudf::size_type>>(left_num_rows, stream, mr);

rmm::device_uvector<bool> flagged(left_num_rows, stream, mr);
auto flagged_d = flagged.data();

auto hash_table_view = hash_table.get_device_view();
thrust::for_each(
rmm::exec_policy(stream),
thrust::counting_iterator<size_type>(0),
thrust::counting_iterator<size_type>(left_num_rows),
[flagged_d, hash_table_view, join_type_boolean, hash_probe, equality_probe] __device__(
const size_type idx) {
flagged_d[idx] =
hash_table_view.contains(idx, hash_probe, equality_probe) == join_type_boolean;
});

// gather_map_end will be the end of valid data in gather_map
auto gather_map_end =
thrust::copy_if(rmm::exec_policy(stream),
thrust::counting_iterator<size_type>(0),
thrust::counting_iterator<size_type>(left_num_rows),
gather_map->begin(),
[kind, d_flagged = flagged.begin()] __device__(size_type const idx) {
return *(d_flagged + idx) == (kind == join_kind::LEFT_SEMI_JOIN);
});
[flagged_d] __device__(size_type const idx) { return flagged_d[idx]; });

gather_map->resize(thrust::distance(gather_map->begin(), gather_map_end), stream);
return gather_map;
Expand Down