Make all headers self-contained #141

Merged on May 19, 2022 (10 commits)
9 changes: 6 additions & 3 deletions benchmarks/hash_table/dynamic_map_bench.cu
@@ -1,5 +1,5 @@
/*
* Copyright (c) 2020, NVIDIA CORPORATION.
* Copyright (c) 2020-2022, NVIDIA CORPORATION.
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
@@ -14,11 +14,14 @@
* limitations under the License.
*/

#include <benchmark/benchmark.h>
#include <synchronization.hpp>

#include <cuco/dynamic_map.cuh>

#include <benchmark/benchmark.h>

#include <iostream>
#include <random>
Comment on lines +17 to 24
Collaborator

This looks like a permutation of the same includes. I think clang-format already sorts includes lexicographically when they sit in one contiguous block, which is nice and consistent. However, lexicographic order is not always correct or desired. Maybe we can customize the .clang-format file so that it always produces the ordering we need, and then let CI take care of it.

@PointKernel (Member, Author), May 18, 2022

The idea here is to separate headers into groups and order the groups from "near" to "far". For example, synchronization.hpp is a bench-local header, so it is placed before the library header cuco/dynamic_map.cuh. The gbench header is further away still, but is considered closer than the std headers.

This grouping looks a bit awkward with only one file in each group, but its advantage shows once more headers are involved.

Collaborator

Ah ok, this makes sense. I wonder, however, whether we should add a CI script for that: extract all includes, check where in the include tree each header originates, then group and reorder them.

#include <synchronization.hpp>

enum class dist_type { UNIQUE, UNIFORM, GAUSSIAN };

64 changes: 57 additions & 7 deletions benchmarks/hash_table/static_map_bench.cu
@@ -1,5 +1,5 @@
/*
* Copyright (c) 2020, NVIDIA CORPORATION.
* Copyright (c) 2020-2022, NVIDIA CORPORATION.
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
@@ -14,13 +14,16 @@
* limitations under the License.
*/

#include "cuco/static_map.cuh"
#include <cuco/static_map.cuh>

#include <thrust/device_vector.h>
#include <thrust/for_each.h>

#include <benchmark/benchmark.h>

#include <fstream>
#include <iostream>
#include <random>
#include <thrust/device_vector.h>
#include <thrust/for_each.h>

enum class dist_type { UNIQUE, UNIFORM, GAUSSIAN };

@@ -145,6 +148,9 @@ static void BM_static_map_search_all(::benchmark::State& state)

for (auto _ : state) {
map.find(d_keys.begin(), d_keys.end(), d_results.begin());
// TODO: get rid of sync and rewrite the benchmark with `nvbench`
// once https://github.com/NVIDIA/nvbench/pull/80 is merged
cudaDeviceSynchronize();
}

state.SetBytesProcessed((sizeof(Key) + sizeof(Value)) * int64_t(state.iterations()) *
@@ -202,11 +208,55 @@ BENCHMARK_TEMPLATE(BM_static_map_insert, int32_t, int32_t, dist_type::UNIQUE)
->Apply(generate_size_and_occupancy)
->UseManualTime();

BENCHMARK_TEMPLATE(BM_static_map_erase_all, int32_t, int32_t, dist_type::UNIQUE)
BENCHMARK_TEMPLATE(BM_static_map_search_all, int32_t, int32_t, dist_type::UNIQUE)
->Unit(benchmark::kMillisecond)
->Apply(generate_size_and_occupancy);

BENCHMARK_TEMPLATE(BM_static_map_insert, int32_t, int32_t, dist_type::UNIQUE)
BENCHMARK_TEMPLATE(BM_static_map_insert, int32_t, int32_t, dist_type::UNIFORM)
->Unit(benchmark::kMillisecond)
->Apply(generate_size_and_occupancy)
->UseManualTime();

BENCHMARK_TEMPLATE(BM_static_map_search_all, int32_t, int32_t, dist_type::UNIFORM)
->Unit(benchmark::kMillisecond)
->Apply(generate_size_and_occupancy);

BENCHMARK_TEMPLATE(BM_static_map_insert, int32_t, int32_t, dist_type::GAUSSIAN)
->Unit(benchmark::kMillisecond)
->Apply(generate_size_and_occupancy)
->UseManualTime();

BENCHMARK_TEMPLATE(BM_static_map_search_all, int32_t, int32_t, dist_type::GAUSSIAN)
->Unit(benchmark::kMillisecond)
->Apply(generate_size_and_occupancy);

BENCHMARK_TEMPLATE(BM_static_map_insert, int64_t, int64_t, dist_type::UNIQUE)
->Unit(benchmark::kMillisecond)
->Apply(generate_size_and_occupancy)
->UseManualTime();
->UseManualTime();

BENCHMARK_TEMPLATE(BM_static_map_search_all, int64_t, int64_t, dist_type::UNIQUE)
->Unit(benchmark::kMillisecond)
->Apply(generate_size_and_occupancy);

BENCHMARK_TEMPLATE(BM_static_map_insert, int64_t, int64_t, dist_type::UNIFORM)
->Unit(benchmark::kMillisecond)
->Apply(generate_size_and_occupancy)
->UseManualTime();

BENCHMARK_TEMPLATE(BM_static_map_search_all, int64_t, int64_t, dist_type::UNIFORM)
->Unit(benchmark::kMillisecond)
->Apply(generate_size_and_occupancy);

BENCHMARK_TEMPLATE(BM_static_map_insert, int64_t, int64_t, dist_type::GAUSSIAN)
->Unit(benchmark::kMillisecond)
->Apply(generate_size_and_occupancy)
->UseManualTime();

BENCHMARK_TEMPLATE(BM_static_map_search_all, int64_t, int64_t, dist_type::GAUSSIAN)
->Unit(benchmark::kMillisecond)
->Apply(generate_size_and_occupancy);

BENCHMARK_TEMPLATE(BM_static_map_erase_all, int32_t, int32_t, dist_type::UNIQUE)
->Unit(benchmark::kMillisecond)
->Apply(generate_size_and_occupancy);
10 changes: 5 additions & 5 deletions benchmarks/hash_table/static_multimap/count_bench.cu
@@ -1,5 +1,5 @@
/*
* Copyright (c) 2021, NVIDIA CORPORATION.
* Copyright (c) 2021-2022, NVIDIA CORPORATION.
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
@@ -14,13 +14,13 @@
* limitations under the License.
*/

#include <random>
#include <key_generator.hpp>

#include <cuco/static_multimap.cuh>

#include <nvbench/nvbench.cuh>
#include <thrust/device_vector.h>

#include <cuco/static_multimap.cuh>
#include <key_generator.hpp>
#include <thrust/device_vector.h>

/**
* @brief A benchmark evaluating multi-value `count` performance:
10 changes: 5 additions & 5 deletions benchmarks/hash_table/static_multimap/insert_bench.cu
@@ -1,5 +1,5 @@
/*
* Copyright (c) 2021, NVIDIA CORPORATION.
* Copyright (c) 2021-2022, NVIDIA CORPORATION.
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
@@ -14,13 +14,13 @@
* limitations under the License.
*/

#include <random>
#include <key_generator.hpp>

#include <cuco/static_multimap.cuh>

#include <nvbench/nvbench.cuh>
#include <thrust/device_vector.h>

#include <cuco/static_multimap.cuh>
#include <key_generator.hpp>
#include <thrust/device_vector.h>

/**
* @brief A benchmark evaluating multi-value `insert` performance:
@@ -1,5 +1,5 @@
/*
* Copyright (c) 2021, NVIDIA CORPORATION.
* Copyright (c) 2021-2022, NVIDIA CORPORATION.
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
@@ -14,13 +14,12 @@
* limitations under the License.
*/

#include <cuco/static_multimap.cuh>

#include <nvbench/nvbench.cuh>

#include <random>
#include <thrust/device_vector.h>

#include "cuco/static_multimap.cuh"

/**
* @brief Generates input keys by a given number of repetitions per key.
*
10 changes: 5 additions & 5 deletions benchmarks/hash_table/static_multimap/pair_retrieve_bench.cu
@@ -1,5 +1,5 @@
/*
* Copyright (c) 2021, NVIDIA CORPORATION.
* Copyright (c) 2021-2022, NVIDIA CORPORATION.
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
@@ -14,15 +14,15 @@
* limitations under the License.
*/

#include <random>
#include <key_generator.hpp>

#include <cuco/static_multimap.cuh>

#include <nvbench/nvbench.cuh>

#include <thrust/device_vector.h>
#include <thrust/iterator/discard_iterator.h>

#include <cuco/static_multimap.cuh>
#include <key_generator.hpp>

namespace {
// Custom pair equal
template <typename Key, typename Value>
10 changes: 5 additions & 5 deletions benchmarks/hash_table/static_multimap/query_bench.cu
@@ -1,5 +1,5 @@
/*
* Copyright (c) 2021, NVIDIA CORPORATION.
* Copyright (c) 2021-2022, NVIDIA CORPORATION.
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
@@ -14,13 +14,13 @@
* limitations under the License.
*/

#include <random>
#include <key_generator.hpp>

#include <cuco/static_multimap.cuh>

#include <nvbench/nvbench.cuh>
#include <thrust/device_vector.h>

#include <cuco/static_multimap.cuh>
#include <key_generator.hpp>
#include <thrust/device_vector.h>

/**
* @brief A benchmark evaluating multi-value query (`count` + `retrieve`) performance:
7 changes: 3 additions & 4 deletions benchmarks/hash_table/static_multimap/retrieve_bench.cu
@@ -1,5 +1,5 @@
/*
* Copyright (c) 2021, NVIDIA CORPORATION.
* Copyright (c) 2021-2022, NVIDIA CORPORATION.
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
@@ -14,14 +14,13 @@
* limitations under the License.
*/

#include <cuco/static_multimap.cuh>
#include <key_generator.hpp>

#include <thrust/device_vector.h>
#include <cuco/static_multimap.cuh>

#include <nvbench/nvbench.cuh>

#include <random>
#include <thrust/device_vector.h>

/**
* @brief A benchmark evaluating multi-value `retrieve` performance:
5 changes: 3 additions & 2 deletions benchmarks/synchronization.hpp
@@ -1,5 +1,5 @@
/*
* Copyright (c) 2020, NVIDIA CORPORATION.
* Copyright (c) 2020-2022, NVIDIA CORPORATION.
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
@@ -18,6 +18,7 @@

// Google Benchmark library
#include <benchmark/benchmark.h>

#include <cuda_runtime_api.h>

#define BENCH_CUDA_TRY(call) \
@@ -129,4 +130,4 @@ class cuda_event_timer {
cudaEvent_t stop_;
cudaStream_t stream_;
benchmark::State* p_state;
};
};
6 changes: 3 additions & 3 deletions examples/static_map/custom_type_example.cu
@@ -1,5 +1,5 @@
/*
* Copyright (c) 2021, NVIDIA CORPORATION.
* Copyright (c) 2021-2022, NVIDIA CORPORATION.
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
@@ -14,12 +14,12 @@
* limitations under the License.
*/

#include <cuco/static_map.cuh>

#include <thrust/device_vector.h>
#include <thrust/logical.h>
#include <thrust/transform.h>

#include <cuco/static_map.cuh>

// User-defined key type
#ifdef CUCO_NO_INDEPENDENT_THREADS
struct custom_key_type {
6 changes: 3 additions & 3 deletions examples/static_map/static_map_example.cu
@@ -1,5 +1,5 @@
/*
* Copyright (c) 2020, NVIDIA CORPORATION.
* Copyright (c) 2020-2022, NVIDIA CORPORATION.
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
@@ -14,13 +14,13 @@
* limitations under the License.
*/

#include <limits>
#include <cuco/static_map.cuh>

#include <thrust/device_vector.h>
#include <thrust/sequence.h>
#include <thrust/transform.h>

#include <cuco/static_map.cuh>
#include <limits>

int main(void)
{
6 changes: 3 additions & 3 deletions examples/static_multimap/static_multimap_example.cu
@@ -1,5 +1,5 @@
/*
* Copyright (c) 2021, NVIDIA CORPORATION.
* Copyright (c) 2021-2022, NVIDIA CORPORATION.
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
@@ -14,13 +14,13 @@
* limitations under the License.
*/

#include <limits>
#include <cuco/static_multimap.cuh>

#include <thrust/device_vector.h>
#include <thrust/sequence.h>
#include <thrust/transform.h>

#include <cuco/static_multimap.cuh>
#include <limits>

int main(void)
{
10 changes: 8 additions & 2 deletions include/cuco/detail/dynamic_map_kernels.cuh
@@ -1,5 +1,5 @@
/*
* Copyright (c) 2020, NVIDIA CORPORATION.
* Copyright (c) 2020-2022, NVIDIA CORPORATION.
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
@@ -14,6 +14,12 @@
* limitations under the License.
*/

#include <cub/cub.cuh>

#include <cuda/std/atomic>

#include <cooperative_groups.h>

namespace cuco {
namespace detail {
namespace cg = cooperative_groups;
@@ -457,4 +463,4 @@ __global__ void contains(InputIt first,
}
}
} // namespace detail
} // namespace cuco
} // namespace cuco
1 change: 1 addition & 0 deletions include/cuco/detail/error.hpp
@@ -17,6 +17,7 @@
#pragma once

#include <cuda_runtime_api.h>

#include <stdexcept>
#include <string>
