
Optimize compaction operations #10030

Merged
merged 55 commits into from
Feb 2, 2022
Changes from 41 commits
Commits
55 commits
2f33e04
Rename existing compaction APIs
PointKernel Jan 12, 2022
3212706
Merge remote-tracking branch 'upstream/branch-22.02' into optimize-co…
PointKernel Jan 12, 2022
7ce9549
Update cython code to accommodate renaming
PointKernel Jan 12, 2022
5c5a415
Update copyrights
PointKernel Jan 12, 2022
0a62ade
Refactor unordered_distinct_count with hash-based algorithms
PointKernel Jan 12, 2022
e6acd8f
Merge remote-tracking branch 'upstream/branch-22.02' into optimize-co…
PointKernel Jan 12, 2022
fba851c
Refactor unordered_drop_duplicates with hash-based algorithms
PointKernel Jan 13, 2022
05ee85f
Update cython code
PointKernel Jan 13, 2022
8ab22a4
Optimize distinct count: insert valid rows only if nulls are equal
PointKernel Jan 13, 2022
bba7b57
Fill column via mutable view + update comments
PointKernel Jan 13, 2022
6746f28
Minor corrections
PointKernel Jan 13, 2022
70292bc
Update benchmarks and unit tests
PointKernel Jan 13, 2022
46d83b9
Add reminder for further optimization in distinct count
PointKernel Jan 13, 2022
f07d3d0
Fix transform test failure
PointKernel Jan 13, 2022
8fcfbae
Fix dictionary test failures
PointKernel Jan 13, 2022
a8fe478
Merge remote-tracking branch 'upstream/branch-22.04' into optimize-co…
PointKernel Jan 13, 2022
f2ac25d
Add sort-based implementations back to the repo
PointKernel Jan 14, 2022
0ed5712
Update copyright
PointKernel Jan 14, 2022
e372144
Add consecutive distinct_count
PointKernel Jan 14, 2022
fc57b29
Remove nan control in distinct_count
PointKernel Jan 15, 2022
d810e0b
Update unit tests
PointKernel Jan 15, 2022
40cc410
Add nan handling to distinct_count + update unit tests
PointKernel Jan 17, 2022
dd91e64
Rename drop_duplicates as sort_and_drop_duplicates
PointKernel Jan 17, 2022
d80911c
Add consecutive drop_duplicates
PointKernel Jan 17, 2022
7ced995
Optimize unordered_distinct_count: insert non-null rows only to impro…
PointKernel Jan 17, 2022
2ea5d8e
Update cuco git tag
PointKernel Jan 17, 2022
4bb7b16
Silence unused argument warning via function prototyping
PointKernel Jan 18, 2022
012ca8b
Refactor compaction benchmark with nvbench
PointKernel Jan 18, 2022
d489e2e
Update copyright
PointKernel Jan 18, 2022
3e47ffd
Get rid of nvbench primitive types
PointKernel Jan 19, 2022
a0a10e5
Update docs & comments
PointKernel Jan 19, 2022
5fb92c7
Address review comments
PointKernel Jan 19, 2022
3af4fd0
Address more review comments
PointKernel Jan 19, 2022
a587511
Split tests
PointKernel Jan 19, 2022
b062eb5
Use null masks in tests
PointKernel Jan 19, 2022
a5f881f
Split benchmarks
PointKernel Jan 19, 2022
df36e77
Fix a bug + update tests
PointKernel Jan 21, 2022
20ed6ea
Update docs
PointKernel Jan 21, 2022
e401690
Merge remote-tracking branch 'upstream/branch-22.04' into optimize-co…
PointKernel Jan 21, 2022
ecc1d7e
Add should_check_nan predicate to avoid unnecessary type-dispatching
PointKernel Jan 21, 2022
a151443
Rename benchmark according to benchmarking guide
PointKernel Jan 21, 2022
024d7e0
Remove std::unique-like drop_duplicates
PointKernel Jan 24, 2022
b6c1634
Style fixing
PointKernel Jan 24, 2022
3ad0f76
Fix test failures: sort the output
PointKernel Jan 24, 2022
58f6cb6
Minor cleanups
PointKernel Jan 24, 2022
e381815
Minor cleanup
PointKernel Jan 24, 2022
fa796aa
Address review comments
PointKernel Jan 24, 2022
3915134
Merge remote-tracking branch 'upstream/branch-22.04' into optimize-co…
PointKernel Jan 25, 2022
118468e
Address review comments
PointKernel Jan 27, 2022
d1535d5
Simplify if logic
PointKernel Jan 27, 2022
0b0d015
Minor updates
PointKernel Jan 27, 2022
906f469
Add early exit
PointKernel Jan 27, 2022
c8a3e87
Fix cuco pair issues with the latest cuco tag
PointKernel Jan 28, 2022
070d5ce
Address review comments
PointKernel Feb 2, 2022
a60c128
Address review + update comments
PointKernel Feb 2, 2022
4 changes: 2 additions & 2 deletions cpp/benchmarks/CMakeLists.txt
@@ -1,5 +1,5 @@
# =============================================================================
# Copyright (c) 2018-2021, NVIDIA CORPORATION.
# Copyright (c) 2018-2022, NVIDIA CORPORATION.
#
# Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except
# in compliance with the License. You may obtain a copy of the License at
@@ -123,7 +123,7 @@ ConfigureBench(APPLY_BOOLEAN_MASK_BENCH stream_compaction/apply_boolean_mask_ben

# ##################################################################################################
# * stream_compaction benchmark -------------------------------------------------------------------
ConfigureBench(STREAM_COMPACTION_BENCH stream_compaction/drop_duplicates_benchmark.cpp)
ConfigureNVBench(STREAM_COMPACTION_BENCH stream_compaction/drop_duplicates_benchmark.cpp)

# ##################################################################################################
# * join benchmark --------------------------------------------------------------------------------
145 changes: 105 additions & 40 deletions cpp/benchmarks/stream_compaction/drop_duplicates_benchmark.cpp
@@ -1,5 +1,5 @@
/*
* Copyright (c) 2020, NVIDIA CORPORATION.
* Copyright (c) 2022, NVIDIA CORPORATION.
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
@@ -15,64 +15,129 @@
*/

#include <cudf/column/column_view.hpp>
#include <cudf/stream_compaction.hpp>
#include <cudf/detail/stream_compaction.hpp>
#include <cudf/types.hpp>
#include <cudf_test/base_fixture.hpp>
#include <cudf_test/column_wrapper.hpp>
#include <fixture/benchmark_fixture.hpp>
#include <synchronization/synchronization.hpp>

#include <fixture/rmm_pool_raii.hpp>

#include <nvbench/nvbench.cuh>

#include <memory>
#include <random>

class Compaction : public cudf::benchmark {
};
enum class algorithm { SORT_BASED, HASH_BASED };

// mandatory for enum types
NVBENCH_DECLARE_ENUM_TYPE_STRINGS(
// Enum type:
algorithm,
// Callable to generate input strings:
// Short identifier used for tables, command-line args, etc.
// Used when context is available to figure out the enum type.
[](algorithm algo) {
switch (algo) {
case algorithm::SORT_BASED: return "SORT_BASED";
case algorithm::HASH_BASED: return "HASH_BASED";
default: return "ERROR";
}
},
// Callable to generate descriptions:
// If non-empty, these are used in `--list` to describe values.
// Used when context may not be available to figure out the type from the
// input string.
// Just use `[](auto) { return std::string{}; }` if you don't want these.
[](auto) { return std::string{}; })

NVBENCH_DECLARE_ENUM_TYPE_STRINGS(
// Enum type:
cudf::duplicate_keep_option,
// Callable to generate input strings:
// Short identifier used for tables, command-line args, etc.
// Used when context is available to figure out the enum type.
[](cudf::duplicate_keep_option option) {
switch (option) {
case cudf::duplicate_keep_option::KEEP_FIRST: return "KEEP_FIRST";
case cudf::duplicate_keep_option::KEEP_LAST: return "KEEP_LAST";
case cudf::duplicate_keep_option::KEEP_NONE: return "KEEP_NONE";
default: return "ERROR";
}
},
// Callable to generate descriptions:
// If non-empty, these are used in `--list` to describe values.
// Used when context may not be available to figure out the type from the
// input string.
// Just use `[](auto) { return std::string{}; }` if you don't want these.
[](auto) { return std::string{}; })

NVBENCH_DECLARE_TYPE_STRINGS(cudf::timestamp_ms, "cudf::timestamp_ms", "cudf::timestamp_ms");

template <typename Type, cudf::duplicate_keep_option Keep>
void nvbench_sort_and_drop_duplicates(nvbench::state& state,
nvbench::type_list<Type, nvbench::enum_type<Keep>>)
{
if constexpr (not std::is_same_v<Type, int32_t> and
Keep != cudf::duplicate_keep_option::KEEP_FIRST) {
state.skip("Skip unwanted benchmarks.");
}

cudf::rmm_pool_raii pool_raii;

auto const num_rows = state.get_int64("NumRows");

cudf::test::UniformRandomGenerator<long> rand_gen(0, 100);
auto elements = cudf::detail::make_counting_transform_iterator(
0, [&rand_gen](auto row) { return rand_gen.generate(); });
auto valids = cudf::detail::make_counting_transform_iterator(
0, [](auto i) { return i % 100 == 0 ? false : true; });
cudf::test::fixed_width_column_wrapper<Type, long> values(elements, elements + num_rows, valids);

auto input_column = cudf::column_view(values);
auto input_table = cudf::table_view({input_column, input_column, input_column, input_column});

state.exec(nvbench::exec_tag::sync, [&](nvbench::launch& launch) {
rmm::cuda_stream_view stream_view{launch.get_stream()};
auto result = cudf::detail::sort_and_drop_duplicates(
input_table, {0}, Keep, cudf::null_equality::EQUAL, cudf::null_order::BEFORE, stream_view);
});
}

template <typename Type>
void BM_compaction(benchmark::State& state, cudf::duplicate_keep_option keep)
void nvbench_unordered_drop_duplicates(nvbench::state& state, nvbench::type_list<Type>)
{
auto const n_rows = static_cast<cudf::size_type>(state.range(0));
cudf::rmm_pool_raii pool_raii;

auto const num_rows = state.get_int64("NumRows");

cudf::test::UniformRandomGenerator<long> rand_gen(0, 100);
auto elements = cudf::detail::make_counting_transform_iterator(
0, [&rand_gen](auto row) { return rand_gen.generate(); });
auto valids = cudf::detail::make_counting_transform_iterator(
0, [](auto i) { return i % 100 == 0 ? false : true; });
cudf::test::fixed_width_column_wrapper<Type, long> values(elements, elements + n_rows, valids);
cudf::test::fixed_width_column_wrapper<Type, long> values(elements, elements + num_rows, valids);

auto input_column = cudf::column_view(values);
auto input_table = cudf::table_view({input_column, input_column, input_column, input_column});

for (auto _ : state) {
cuda_event_timer timer(state, true);
auto result = cudf::drop_duplicates(input_table, {0}, keep);
}
state.exec(nvbench::exec_tag::sync, [&](nvbench::launch& launch) {
rmm::cuda_stream_view stream_view{launch.get_stream()};
auto result = cudf::detail::unordered_drop_duplicates(
input_table, {0}, cudf::null_equality::EQUAL, stream_view);
});
}

#define concat(a, b, c) a##b##c
#define get_keep(op) cudf::duplicate_keep_option::KEEP_##op

// TYPE, OP
#define RBM_BENCHMARK_DEFINE(name, type, keep) \
BENCHMARK_DEFINE_F(Compaction, name)(::benchmark::State & state) \
{ \
BM_compaction<type>(state, get_keep(keep)); \
} \
BENCHMARK_REGISTER_F(Compaction, name) \
->UseManualTime() \
->Arg(10000) /* 10k */ \
->Arg(100000) /* 100k */ \
->Arg(1000000) /* 1M */ \
->Arg(10000000) /* 10M */

#define COMPACTION_BENCHMARK_DEFINE(type, keep) \
RBM_BENCHMARK_DEFINE(concat(type, _, keep), type, keep)

COMPACTION_BENCHMARK_DEFINE(bool, NONE);
COMPACTION_BENCHMARK_DEFINE(int8_t, NONE);
COMPACTION_BENCHMARK_DEFINE(int32_t, NONE);
COMPACTION_BENCHMARK_DEFINE(int32_t, FIRST);
COMPACTION_BENCHMARK_DEFINE(int32_t, LAST);
using cudf::timestamp_ms;
COMPACTION_BENCHMARK_DEFINE(timestamp_ms, NONE);
COMPACTION_BENCHMARK_DEFINE(float, NONE);
using data_type = nvbench::type_list<bool, int8_t, int32_t, int64_t, float, cudf::timestamp_ms>;
using keep_option = nvbench::enum_type_list<cudf::duplicate_keep_option::KEEP_FIRST,
cudf::duplicate_keep_option::KEEP_LAST,
cudf::duplicate_keep_option::KEEP_NONE>;

NVBENCH_BENCH_TYPES(nvbench_sort_and_drop_duplicates, NVBENCH_TYPE_AXES(data_type, keep_option))
.set_name("sort_and_drop_duplicates")
.set_type_axes_names({"Type", "KeepOption"})
.add_int64_axis("NumRows", {10'000, 100'000, 1'000'000, 10'000'000});

NVBENCH_BENCH_TYPES(nvbench_unordered_drop_duplicates, NVBENCH_TYPE_AXES(data_type))
.set_name("unordered_drop_duplicates")
.set_type_axes_names({"Type"})
.add_int64_axis("NumRows", {10'000, 100'000, 1'000'000, 10'000'000});
2 changes: 1 addition & 1 deletion cpp/cmake/thirdparty/get_cucollections.cmake
@@ -1,5 +1,5 @@
# =============================================================================
# Copyright (c) 2021, NVIDIA CORPORATION.
# Copyright (c) 2021-2022, NVIDIA CORPORATION.
#
# Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except
# in compliance with the License. You may obtain a copy of the License at
46 changes: 45 additions & 1 deletion cpp/include/cudf/detail/stream_compaction.hpp
@@ -1,5 +1,5 @@
/*
* Copyright (c) 2019-2020, NVIDIA CORPORATION.
* Copyright (c) 2019-2022, NVIDIA CORPORATION.
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
@@ -67,6 +67,19 @@ std::unique_ptr<table> apply_boolean_mask(
* @param[in] stream CUDA stream used for device memory operations and kernel launches.
*/
std::unique_ptr<table> drop_duplicates(
table_view const& input,
std::vector<size_type> const& keys,
duplicate_keep_option keep,
null_equality nulls_equal = null_equality::EQUAL,
rmm::cuda_stream_view stream = rmm::cuda_stream_default,
rmm::mr::device_memory_resource* mr = rmm::mr::get_current_device_resource());

/**
* @copydoc cudf::sort_and_drop_duplicates
*
* @param[in] stream CUDA stream used for device memory operations and kernel launches.
*/
std::unique_ptr<table> sort_and_drop_duplicates(
table_view const& input,
std::vector<size_type> const& keys,
duplicate_keep_option keep,
@@ -75,6 +88,18 @@ std::unique_ptr<table> drop_duplicates(
rmm::cuda_stream_view stream = rmm::cuda_stream_default,
rmm::mr::device_memory_resource* mr = rmm::mr::get_current_device_resource());

/**
* @copydoc cudf::unordered_drop_duplicates
*
* @param[in] stream CUDA stream used for device memory operations and kernel launches.
*/
std::unique_ptr<table> unordered_drop_duplicates(
table_view const& input,
std::vector<size_type> const& keys,
null_equality nulls_equal = null_equality::EQUAL,
rmm::cuda_stream_view stream = rmm::cuda_stream_default,
rmm::mr::device_memory_resource* mr = rmm::mr::get_current_device_resource());

/**
* @copydoc cudf::distinct_count(column_view const&, null_policy, nan_policy)
*
@@ -94,5 +119,24 @@ cudf::size_type distinct_count(table_view const& input,
null_equality nulls_equal = null_equality::EQUAL,
rmm::cuda_stream_view stream = rmm::cuda_stream_default);

/**
* @copydoc cudf::unordered_distinct_count(column_view const&, null_policy, nan_policy)
*
* @param[in] stream CUDA stream used for device memory operations and kernel launches.
*/
cudf::size_type unordered_distinct_count(column_view const& input,
null_policy null_handling,
nan_policy nan_handling,
rmm::cuda_stream_view stream = rmm::cuda_stream_default);

/**
* @copydoc cudf::unordered_distinct_count(table_view const&, null_equality)
*
* @param[in] stream CUDA stream used for device memory operations and kernel launches.
*/
cudf::size_type unordered_distinct_count(table_view const& input,
null_equality nulls_equal = null_equality::EQUAL,
rmm::cuda_stream_view stream = rmm::cuda_stream_default);

} // namespace detail
} // namespace cudf