Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[REVIEW] Remove bounds check for cudf::gather #6875

Merged
merged 45 commits into from
Dec 2, 2020
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
Show all changes
45 commits
Select commit Hold shift + click to select a range
e286317
Initial
Oct 14, 2020
176da20
changelog
Oct 14, 2020
0e39bb1
Wrap around control
Oct 15, 2020
006a729
Revert unsigned type gather map change
Oct 15, 2020
962efe3
Document update, passes build
Oct 21, 2020
eff09da
Update docs
Oct 21, 2020
db19402
Doc updates
Oct 22, 2020
3bbc7a6
Changing to NULLIFY defaults for compatibility
Oct 22, 2020
bf78dff
Reducing failed test cases
Oct 23, 2020
2382ee9
General replacement; redcuce failed cases
Oct 30, 2020
b606411
Update cpp/include/cudf/copying.hpp
isVoid Oct 23, 2020
d709136
Retaining code logic before change
Nov 11, 2020
527372d
Minmax binding, cython gather bound check
Nov 11, 2020
39a73a4
style
Nov 11, 2020
174350a
cpp style
Nov 11, 2020
446b809
detail::gather doc update
Nov 11, 2020
4b135bb
remove stale couts
Nov 11, 2020
3216887
Apply suggestions from code review
isVoid Nov 13, 2020
dc6f5cd
style and better error msg
Nov 13, 2020
a244bf5
Address review: Refactor detail::gather API
Nov 16, 2020
1f4d4e2
Apply doc review offline; style
Nov 16, 2020
be5e953
Applying Docs Suggestions By Mark
isVoid Nov 16, 2020
a401cc2
Address review:
Nov 16, 2020
5fd81a6
Review: change out_of_bounds default policy
Nov 17, 2020
3188f67
Partially undoing EQUAL switch; one test failing
Nov 18, 2020
060719c
stale print
Nov 18, 2020
48fca22
Multiple changes:
Nov 18, 2020
b0e7c98
Undoing more EQUIVALENT uses
Nov 18, 2020
9ee83ca
Adapt to DeviceScalar
Nov 18, 2020
9827d41
Confirms using NULLIFY with david
Nov 18, 2020
e573375
Address review comments: minmax comments
Nov 19, 2020
95d5153
Move ternary op outside of function call
Nov 19, 2020
1796db7
Unsetting nullmasks on `gather` result
Nov 19, 2020
abc09b1
style
Nov 20, 2020
27bd251
Update gather.cuh to address review comments
isVoid Nov 26, 2020
2b140f4
Address review comments
Nov 29, 2020
b496734
Update CHANGELOG
Dec 2, 2020
0074c07
Update Changelog
Dec 2, 2020
3d986a5
Fixing rebase bug
Dec 2, 2020
0a9737e
Refactor bug
Dec 2, 2020
ddb49de
gather.cuh docstring bug
Dec 2, 2020
027c64a
Review change: c++ exception for minmax
Dec 2, 2020
cac5325
Review change: `ob_policy` underlying bool type
Dec 2, 2020
5882acf
Merge branch 'branch-0.17' of https://github.com/rapidsai/cudf into i…
Dec 2, 2020
78f38e3
style
Dec 2, 2020
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
1 change: 1 addition & 0 deletions CHANGELOG.md
Original file line number Diff line number Diff line change
Expand Up @@ -52,6 +52,7 @@
- PR #6471 Replace index type-dispatch call with indexalator in cudf::strings::substring
- PR #6485 Add File IO to cuIO benchmarks
- PR #6504 Update Java bindings version to 0.17-SNAPSHOT
- PR #6875 Remove bounds check for `cudf::gather`
- PR #6489 Add `AVRO` fuzz tests with varying function parameters
- PR #6540 Add dictionary support to `cudf::unary_operation`
- PR #6537 Refactor ORC timezone
Expand Down
26 changes: 20 additions & 6 deletions cpp/include/cudf/copying.hpp
Original file line number Diff line number Diff line change
Expand Up @@ -33,6 +33,18 @@ namespace cudf {
* @brief Column APIs for gather, scatter, split, slice, etc.
*/

/**
* @brief Policy to account for possible out-of-bounds indices
*
* `NULLIFY` means to nullify output values corresponding to out-of-bounds gather_map values.
* `DONT_CHECK` means do not check whether the indices are out-of-bounds, for better performance.
*/

enum class out_of_bounds_policy : bool {
NULLIFY, /// Output values corresponding to out-of-bounds indices are null
DONT_CHECK /// No bounds checking is performed, better performance
};

/**
* @brief Gathers the specified rows (including null values) of a set of columns.
*
Expand All @@ -49,22 +61,24 @@ namespace cudf {
* For dictionary columns, the keys column component is copied and not trimmed
* if the gather results in abandoned key elements.
*
* @throws cudf::logic_error if `check_bounds == true` and an index exists in
* `gather_map` outside the range `[-n, n)`, where `n` is the number of rows in
* the source table. If `check_bounds == false`, the behavior is undefined.
* @throws cudf::logic_error if gather_map contains null values.
*
* @param[in] source_table The input columns whose rows will be gathered
* @param[in] gather_map View into a non-nullable column of integral indices that maps the
* rows in the source columns to rows in the destination columns.
* @param[in] check_bounds Optionally perform bounds checking on the values
* of `gather_map` and throw an error if any of its values are out of bounds.
* @param[in] bounds_policy Policy to apply to account for possible out-of-bounds indices
* `DONT_CHECK` skips all bounds checking for gather map values. `NULLIFY` coerces rows that
* corresponds to out-of-bounds indices in the gather map to be null elements. Callers should
* use `DONT_CHECK` when they are certain that the gather_map contains only valid indices for
* better performance. If `policy` is set to `DONT_CHECK` and there are out-of-bounds indices
* in the gather map, the behavior is undefined. Defaults to `DONT_CHECK`.
* @param[in] mr Device memory resource used to allocate the returned table's device memory
* @return std::unique_ptr<table> Result of the gather
*/
std::unique_ptr<table> gather(
table_view const& source_table,
column_view const& gather_map,
bool check_bounds = false,
out_of_bounds_policy bounds_policy = out_of_bounds_policy::DONT_CHECK,
rmm::mr::device_memory_resource* mr = rmm::mr::get_current_device_resource());

/**
Expand Down
8 changes: 6 additions & 2 deletions cpp/include/cudf/detail/copy_if.cuh
Original file line number Diff line number Diff line change
Expand Up @@ -268,8 +268,12 @@ struct scatter_gather_functor {
indices.begin(),
filter);

auto output_table = cudf::detail::gather(
cudf::table_view{{input}}, indices.begin(), indices.end(), false, stream, mr);
auto output_table = cudf::detail::gather(cudf::table_view{{input}},
indices.begin(),
indices.end(),
cudf::out_of_bounds_policy::DONT_CHECK,
stream,
mr);

// There will be only one column
return std::make_unique<cudf::column>(std::move(output_table->get_column(0)));
Expand Down
32 changes: 20 additions & 12 deletions cpp/include/cudf/detail/gather.cuh
Original file line number Diff line number Diff line change
Expand Up @@ -610,7 +610,12 @@ struct column_gatherer_impl<struct_view, MapItRoot> {
* the source columns to rows in the destination columns
* @param[in] gather_map_end End of iterator range of integer indices that map the rows in the
* source columns to rows in the destination columns
* @param[in] nullify_out_of_bounds Nullify values in `gather_map` that are out of bounds.
* @param[in] bounds_policy Policy to apply to account for possible out-of-bound indices
* `DONT_CHECK` skips all bound checking for gather map values. `NULLIFY` coerces rows that
* corresponds to out-of-bound indices in the gather map to be null elements. Callers should
* use `DONT_CHECK` when they are certain that the gather_map contains only valid indices for
* better performance. In case there are out-of-bound indices in the gather map, the behavior
* is undefined. Defaults to `DONT_CHECK`.
* @param[in] mr Device memory resource used to allocate the returned table's device memory
* @param[in] stream CUDA stream used for device memory operations and kernel launches.
* @return cudf::table Result of the gather
Expand All @@ -620,7 +625,7 @@ std::unique_ptr<table> gather(
table_view const& source_table,
MapIterator gather_map_begin,
MapIterator gather_map_end,
bool nullify_out_of_bounds = false,
out_of_bounds_policy bounds_policy = out_of_bounds_policy::DONT_CHECK,
rmm::cuda_stream_view stream = rmm::cuda_stream_default,
rmm::mr::device_memory_resource* mr = rmm::mr::get_current_device_resource())
{
Expand All @@ -630,18 +635,21 @@ std::unique_ptr<table> gather(

for (auto const& source_column : source_table) {
// The data gather for n columns will be put on the first n streams
destination_columns.push_back(cudf::type_dispatcher(source_column.type(),
column_gatherer{},
source_column,
gather_map_begin,
gather_map_end,
nullify_out_of_bounds,
stream,
mr));
destination_columns.push_back(
cudf::type_dispatcher(source_column.type(),
column_gatherer{},
source_column,
gather_map_begin,
gather_map_end,
bounds_policy == out_of_bounds_policy::NULLIFY,
stream,
mr));
}

auto const op =
nullify_out_of_bounds ? gather_bitmask_op::NULLIFY : gather_bitmask_op::DONT_CHECK;
gather_bitmask_op const op = bounds_policy == out_of_bounds_policy::NULLIFY
? gather_bitmask_op::NULLIFY
: gather_bitmask_op::DONT_CHECK;

gather_bitmask(source_table, gather_map_begin, destination_columns, op, stream, mr);

return std::make_unique<table>(std::move(destination_columns));
Expand Down
16 changes: 8 additions & 8 deletions cpp/include/cudf/detail/gather.hpp
Original file line number Diff line number Diff line change
Expand Up @@ -18,16 +18,16 @@
#include <cudf/column/column_view.hpp>
#include <cudf/table/table_view.hpp>

#include <cudf/copying.hpp>
#include <cudf/table/table.hpp>

#include <rmm/cuda_stream_view.hpp>

#include <memory>

namespace cudf {
namespace detail {

enum class out_of_bounds_policy : int8_t { FAIL, NULLIFY, IGNORE };
namespace detail {

enum class negative_index_policy : bool { ALLOWED, NOT_ALLOWED };

Expand All @@ -49,11 +49,11 @@ enum class negative_index_policy : bool { ALLOWED, NOT_ALLOWED };
* @param[in] source_table The input columns whose rows will be gathered
* @param[in] gather_map View into a non-nullable column of integral indices that maps the
* rows in the source columns to rows in the destination columns.
* @param[in] out_of_bounds_policy How to treat out of bounds indices. FAIL: check `gather_map`
* values and throw an exception if any are out of bounds. `NULLIFY` means to nullify output values
* corresponding to out-of-bounds gather_map values. `IGNORE` means to ignore values in
* `gather_map` that are out of bounds. `IGNORE` is incompatible with `negative_index_policy ==
* ALLOW`.
* @param[in] bounds_policy How to treat out-of-bounds indices. `NULLIFY` coerces rows that
* correspond to out-of-bounds indices in the gather map to be null elements. For better
* performance, use `DONT_CHECK` when the `gather_map` is known to contain only valid
* indices. If `policy` is set to `DONT_CHECK` and there are out-of-bounds indices in `gather_map`,
* the behavior is undefined.
* @param[in] negative_index_policy Interpret each negative index `i` in the
* gathermap as the positive index `i+num_source_rows`.
* @param[in] mr Device memory resource used to allocate the returned table's device memory
Expand All @@ -63,7 +63,7 @@ enum class negative_index_policy : bool { ALLOWED, NOT_ALLOWED };
std::unique_ptr<table> gather(
table_view const& source_table,
column_view const& gather_map,
out_of_bounds_policy bounds,
out_of_bounds_policy bounds_policy,
negative_index_policy neg_indices,
rmm::cuda_stream_view stream = rmm::cuda_stream_default,
rmm::mr::device_memory_resource* mr = rmm::mr::get_current_device_resource());
Expand Down
29 changes: 5 additions & 24 deletions cpp/src/copying/gather.cu
Original file line number Diff line number Diff line change
Expand Up @@ -32,7 +32,7 @@ namespace detail {

std::unique_ptr<table> gather(table_view const& source_table,
column_view const& gather_map,
out_of_bounds_policy bounds,
out_of_bounds_policy bounds_policy,
negative_index_policy neg_indices,
rmm::cuda_stream_view stream,
rmm::mr::device_memory_resource* mr)
Expand All @@ -43,19 +43,6 @@ std::unique_ptr<table> gather(table_view const& source_table,
auto map_begin = indexalator_factory::make_input_iterator(gather_map);
auto map_end = map_begin + gather_map.size();

if (bounds == out_of_bounds_policy::FAIL) {
cudf::size_type begin =
neg_indices == negative_index_policy::ALLOWED ? -source_table.num_rows() : 0;
cudf::size_type end = source_table.num_rows();
CUDF_EXPECTS(gather_map.size() == thrust::count_if(rmm::exec_policy(stream)->on(stream.value()),
map_begin,
map_end,
[begin, end] __device__(size_type index) {
return ((index >= begin) && (index < end));
}),
"Index out of bounds.");
}

if (neg_indices == negative_index_policy::ALLOWED) {
cudf::size_type n_rows = source_table.num_rows();
auto idx_converter = [n_rows] __device__(size_type in) {
Expand All @@ -64,19 +51,18 @@ std::unique_ptr<table> gather(table_view const& source_table,
return gather(source_table,
thrust::make_transform_iterator(map_begin, idx_converter),
thrust::make_transform_iterator(map_end, idx_converter),
bounds == out_of_bounds_policy::IGNORE,
bounds_policy,
stream,
mr);
}
return gather(
source_table, map_begin, map_end, bounds == out_of_bounds_policy::IGNORE, stream, mr);
return gather(source_table, map_begin, map_end, bounds_policy, stream, mr);
}

} // namespace detail

std::unique_ptr<table> gather(table_view const& source_table,
column_view const& gather_map,
bool check_bounds,
out_of_bounds_policy bounds_policy,
rmm::mr::device_memory_resource* mr)
{
CUDF_FUNC_RANGE();
Expand All @@ -85,12 +71,7 @@ std::unique_ptr<table> gather(table_view const& source_table,
: detail::negative_index_policy::ALLOWED;

return detail::gather(
source_table,
gather_map,
check_bounds ? detail::out_of_bounds_policy::FAIL : detail::out_of_bounds_policy::NULLIFY,
index_policy,
rmm::cuda_stream_default,
mr);
source_table, gather_map, bounds_policy, index_policy, rmm::cuda_stream_default, mr);
}

} // namespace cudf
4 changes: 2 additions & 2 deletions cpp/src/copying/sample.cu
Original file line number Diff line number Diff line change
Expand Up @@ -60,7 +60,7 @@ std::unique_ptr<table> sample(table_view const& input,
thrust::make_transform_iterator(thrust::counting_iterator<size_type>(0), RandomGen);
auto end = thrust::make_transform_iterator(thrust::counting_iterator<size_type>(n), RandomGen);

return detail::gather(input, begin, end, false, stream, mr);
return detail::gather(input, begin, end, out_of_bounds_policy::DONT_CHECK, stream, mr);
} else {
auto gather_map = make_numeric_column(
data_type{type_id::INT32}, num_rows, mask_state::UNALLOCATED, stream.value());
Expand All @@ -77,7 +77,7 @@ std::unique_ptr<table> sample(table_view const& input,
return detail::gather(input,
gather_map_view.begin<size_type>(),
gather_map_view.end<size_type>(),
false,
out_of_bounds_policy::DONT_CHECK,
stream,
mr);
}
Expand Down
2 changes: 1 addition & 1 deletion cpp/src/dictionary/add_keys.cu
Original file line number Diff line number Diff line change
Expand Up @@ -90,7 +90,7 @@ std::unique_ptr<column> add_keys(
// and the corresponding index is therefore invalid/undefined
auto table_indices = cudf::detail::gather(table_view{{map_indices->view()}},
indices_view,
cudf::detail::out_of_bounds_policy::IGNORE,
cudf::out_of_bounds_policy::NULLIFY,
cudf::detail::negative_index_policy::NOT_ALLOWED,
stream,
mr)
Expand Down
2 changes: 1 addition & 1 deletion cpp/src/dictionary/decode.cu
Original file line number Diff line number Diff line change
Expand Up @@ -47,7 +47,7 @@ std::unique_ptr<column> decode(dictionary_column_view const& source,
// use gather to create the output column -- use ignore_out_of_bounds=true
auto table_column = cudf::detail::gather(table_view{{source.keys()}},
indices,
cudf::detail::out_of_bounds_policy::IGNORE,
cudf::out_of_bounds_policy::NULLIFY,
cudf::detail::negative_index_policy::NOT_ALLOWED,
stream,
mr)
Expand Down
3 changes: 2 additions & 1 deletion cpp/src/dictionary/remove_keys.cu
Original file line number Diff line number Diff line change
Expand Up @@ -114,12 +114,13 @@ std::unique_ptr<column> remove_keys_fn(
// Example: gather([0,max,1,max,2],[4,0,3,1,2,2,2,4,0]) => [2,0,max,max,1,1,1,2,0]
auto table_indices = cudf::detail::gather(table_view{{map_indices->view()}},
indices_view,
cudf::detail::out_of_bounds_policy::NULLIFY,
cudf::out_of_bounds_policy::NULLIFY,
cudf::detail::negative_index_policy::NOT_ALLOWED,
stream,
mr)
->release();
std::unique_ptr<column> indices_column(std::move(table_indices.front()));
indices_column->set_null_mask(rmm::device_buffer{}, 0);

// compute new nulls -- merge the existing nulls with the newly created ones (value<0)
auto const offset = dictionary_column.offset();
Expand Down
5 changes: 3 additions & 2 deletions cpp/src/filling/repeat.cu
Original file line number Diff line number Diff line change
Expand Up @@ -135,7 +135,8 @@ std::unique_ptr<table> repeat(table_view const& input_table,
thrust::make_counting_iterator(output_size),
indices.begin());

return gather(input_table, indices.begin(), indices.end(), false, stream, mr);
return gather(
input_table, indices.begin(), indices.end(), out_of_bounds_policy::DONT_CHECK, stream, mr);
}

std::unique_ptr<table> repeat(table_view const& input_table,
Expand All @@ -155,7 +156,7 @@ std::unique_ptr<table> repeat(table_view const& input_table,
thrust::make_counting_iterator(0), [count] __device__(auto i) { return i / count; });
auto map_end = map_begin + output_size;

return gather(input_table, map_begin, map_end, false, stream, mr);
return gather(input_table, map_begin, map_end, out_of_bounds_policy::DONT_CHECK, stream, mr);
}

} // namespace detail
Expand Down
2 changes: 1 addition & 1 deletion cpp/src/groupby/groupby.cu
Original file line number Diff line number Diff line change
Expand Up @@ -149,7 +149,7 @@ groupby::groups groupby::get_groups(table_view values, rmm::mr::device_memory_re
if (values.num_columns()) {
grouped_values = cudf::detail::gather(values,
helper().key_sort_order(),
cudf::detail::out_of_bounds_policy::NULLIFY,
cudf::out_of_bounds_policy::DONT_CHECK,
cudf::detail::negative_index_policy::NOT_ALLOWED,
rmm::cuda_stream_default,
mr);
Expand Down
21 changes: 15 additions & 6 deletions cpp/src/groupby/hash/groupby.cu
Original file line number Diff line number Diff line change
Expand Up @@ -21,6 +21,7 @@
#include <cudf/column/column.hpp>
#include <cudf/column/column_factories.hpp>
#include <cudf/column/column_view.hpp>
#include <cudf/copying.hpp>
#include <cudf/detail/aggregation/aggregation.cuh>
#include <cudf/detail/aggregation/aggregation.hpp>
#include <cudf/detail/aggregation/result_cache.hpp>
Expand Down Expand Up @@ -139,8 +140,12 @@ class hash_compound_agg_finalizer final : public cudf::detail::aggregation_final
auto to_dense_agg_result(cudf::aggregation const& agg)
{
auto s = sparse_results->get_result(col_idx, agg);
auto dense_result_table = cudf::detail::gather(
table_view({s}), gather_map.begin(), gather_map.begin() + map_size, false, stream, mr);
auto dense_result_table = cudf::detail::gather(table_view({s}),
gather_map.begin(),
gather_map.begin() + map_size,
out_of_bounds_policy::DONT_CHECK,
stream,
mr);
return std::move(dense_result_table->release()[0]);
}

Expand All @@ -161,8 +166,8 @@ class hash_compound_agg_finalizer final : public cudf::detail::aggregation_final
auto gather_argminmax =
cudf::detail::gather(table_view({col}),
null_removed_map,
arg_result->nullable() ? cudf::detail::out_of_bounds_policy::IGNORE
: cudf::detail::out_of_bounds_policy::NULLIFY,
arg_result->nullable() ? cudf::out_of_bounds_policy::NULLIFY
: cudf::out_of_bounds_policy::DONT_CHECK,
cudf::detail::negative_index_policy::NOT_ALLOWED,
stream,
mr);
Expand Down Expand Up @@ -538,8 +543,12 @@ std::unique_ptr<table> groupby_null_templated(table_view const& keys,
stream,
mr);

return cudf::detail::gather(
keys, gather_map.begin(), gather_map.begin() + map_size, false, stream, mr);
return cudf::detail::gather(keys,
gather_map.begin(),
gather_map.begin() + map_size,
out_of_bounds_policy::DONT_CHECK,
stream,
mr);
}

} // namespace
Expand Down
4 changes: 2 additions & 2 deletions cpp/src/groupby/sort/group_argmax.cu
Original file line number Diff line number Diff line change
Expand Up @@ -54,8 +54,8 @@ std::unique_ptr<column> group_argmax(column_view const& values,
auto result_table =
cudf::detail::gather(table_view({key_sort_order}),
null_removed_indices,
indices->nullable() ? cudf::detail::out_of_bounds_policy::IGNORE
: cudf::detail::out_of_bounds_policy::NULLIFY,
indices->nullable() ? cudf::out_of_bounds_policy::NULLIFY
: cudf::out_of_bounds_policy::DONT_CHECK,
cudf::detail::negative_index_policy::NOT_ALLOWED,
stream,
mr);
Expand Down
Loading