Add cuco::static_multimap #85

Merged
merged 271 commits into from
Nov 1, 2021
271 commits
901b180
Add host count function
PointKernel Mar 5, 2021
0a92baf
Take begin_ as a member variable instead of the entire view
PointKernel Mar 7, 2021
c03b768
Add host count unit test and count global function + minor updates
PointKernel Mar 7, 2021
dacf459
Add a new copy constructor to cuco::pair
PointKernel Mar 10, 2021
a1531e1
Add bulk find_all function and its unit test
PointKernel Mar 10, 2021
77fb776
Add nvbench package in CMake
PointKernel Mar 10, 2021
af20d23
Minor cleanups in static-map-bench
PointKernel Mar 10, 2021
bf842e1
Add static multimap nvbenchmark
PointKernel Mar 12, 2021
c610e11
Fix the occupancy bug in static multimap benchmark
PointKernel Mar 12, 2021
0d1af85
Add int64_t into static multimap benchmark
PointKernel Mar 12, 2021
4303e26
Update static multimap benchmark
PointKernel Mar 16, 2021
5d96e9a
Add a simple notebook for performance analysis
Mar 16, 2021
c1c0bc9
Clean up multimap single insertion benchmark
PointKernel Mar 17, 2021
272efe5
Add multi-value insertion benchmark
PointKernel Mar 17, 2021
4116b1a
Add bandwidth plot in notebook + multi-value insertion analysis
PointKernel Mar 17, 2021
cdd07ac
Add find & find_all benchmarks for staic multimap
PointKernel Mar 19, 2021
7a2921d
Fix the warp illegal address bug in find_all benchmark + adjust find_…
PointKernel Mar 19, 2021
0e815c2
Minor correction: adjust find_all API in static multimap test
PointKernel Mar 19, 2021
a928796
Add complete static multimap benchmark data + update notebook
PointKernel Mar 22, 2021
c1c1d96
Fix an index-related bug + update benchmark & unit test for static mu…
PointKernel Mar 22, 2021
d82a132
Remove data folder from git repo
PointKernel Mar 23, 2021
41eedb1
Add multi-insertion CG size benchmark
PointKernel Mar 24, 2021
3889d19
Update notebook: add multimap CG size analysis + update comments
PointKernel Mar 24, 2021
59c2342
Use per block output cache instead of global atomic writing
PointKernel Mar 29, 2021
8d8ab41
Pass output buffer size as template parameter
PointKernel Mar 29, 2021
cc1e862
Create a folder for static multimap benchmarks + add find_all-specifi…
PointKernel Mar 29, 2021
a5a3374
Fix memory leak bugs
PointKernel Mar 29, 2021
07b25ff
Fix the illegal memory access bug related to cache flushing
PointKernel Mar 30, 2021
d8244e9
Parallel retrieval in find_all
PointKernel Apr 6, 2021
161f77d
Use reinterpret_cast instead of atomic loads (temporary unsafe solution)
PointKernel Apr 7, 2021
1f79fbe
Fix a bug: guard key_equal invocation against the empty key sentinel
PointKernel Apr 8, 2021
a3bfca3
Add compute_prime function for double hashing
PointKernel Apr 8, 2021
2db4cb8
Use double hashing in multimap instead of linear probing
PointKernel Apr 8, 2021
77b952c
Add per CG shared memory buffer for find_all
PointKernel Apr 9, 2021
8386d94
Fix a bug: add an offset to hash2 function
PointKernel Apr 12, 2021
aeeb88b
Update multimap count function without using fancy iterator
PointKernel Apr 12, 2021
565d784
Remove unused variable
PointKernel Apr 12, 2021
bf33f38
Remove device count tests in find_all
PointKernel Apr 12, 2021
2e06410
Update multimap benchmarks in accordance with double hashing
PointKernel Apr 12, 2021
7a6d60f
Fix a bug in multimap count: add end condition for key iterations
PointKernel Apr 12, 2021
0e2f5de
Fix a bug: use adjusted capacity for slots allocation
PointKernel Apr 13, 2021
8659e2a
Fix a bug in per CG cache writing: shfl offset properly
PointKernel Apr 13, 2021
b9e06cf
Use CG memcpy_async to write output
PointKernel Apr 13, 2021
730ad03
Use alignof to get rid of misaligned memory access error
PointKernel Apr 14, 2021
19170bc
Use reinterpret_case instead of atomic loads in multimap count
PointKernel Apr 22, 2021
91b600a
Add stream arguments to multimap bulk functions
PointKernel Apr 22, 2021
6e07991
Add multimap count benchmark + update default benchmarking parameters
PointKernel Apr 22, 2021
44c0fc1
Use is_contiguous_iterator to detect when to use cg::memcpy_async
PointKernel Apr 22, 2021
e540a37
Optimize multimap count: getting rid of cg.any, ballot, popc + using …
PointKernel Apr 23, 2021
d8c7ce1
Minor improvement: end key iteration whenever an empty slot shows up
PointKernel Apr 23, 2021
3c1e01d
Remove one unnessary cg sync in multimap find_all
PointKernel Apr 23, 2021
36f6cff
Use cg::any in multimap insert for simplicity
PointKernel Apr 26, 2021
0aac02a
Fix a bug in multimap constructor: using the adjusted capcacity
PointKernel Apr 26, 2021
7f83783
Use vector load to optimize multimap count and find_all
PointKernel Apr 28, 2021
bceedcd
Update multimap benchmarks: remove uninterested benchmarks, evaluate …
PointKernel May 3, 2021
d84d8b3
Use vector load for 64-bit variables
PointKernel May 3, 2021
349126d
Update benchmark CMake: using c++17
PointKernel May 3, 2021
789ca20
Updates:
PointKernel May 3, 2021
9afd723
Update performance analysis jupyter notebook
PointKernel May 4, 2021
b43bfbd
Benchmark updates: remove insert CG benchmark, add count-varying-matc…
PointKernel May 6, 2021
ac5655f
Customize benchmark name
PointKernel May 6, 2021
a94d2c5
Add in-depth count analysis in notebook
PointKernel May 6, 2021
3cd4336
Minor updates to notebook: change file names
PointKernel May 6, 2021
7e89eff
Per-block shared memory buffer with CG-level parallel retrieval
PointKernel May 6, 2021
6dbf36f
Flush when next iteration won't fit into cache
PointKernel May 7, 2021
07091b8
Add key_generator header file
PointKernel May 10, 2021
57b4f92
Add distribution map: mapping string to enum
PointKernel May 10, 2021
7b0727a
A comprehensive multimap benchmark
PointKernel May 11, 2021
542cb3c
Fix a silly bug in key generator: from begin to end, not the opposite!
PointKernel May 11, 2021
e7bf44b
Update benchmark: solid control of key multiplicities; binomial inste…
PointKernel May 11, 2021
d67b145
Fix sync bug in the very last block in find_all
PointKernel May 12, 2021
e806f26
Fix the last block sync bug in find_all
PointKernel May 12, 2021
0212697
Update benchmarks: revert back to gaussian due to the incredibly high…
PointKernel May 12, 2021
1f88244
Minor cleanups
PointKernel May 13, 2021
26c0eff
Use relaxed atomic add
PointKernel May 13, 2021
e1dc700
Update benchmark: use count to determine output size
PointKernel May 13, 2021
827e8fc
Set geometric sampling range to [0, INT32_MAX] to avoid the left bias…
PointKernel May 13, 2021
8c58ac7
Update notebooks for the comprehensive benchmark
PointKernel May 14, 2021
67a9880
Minor code cleanups: remove debugging code path
PointKernel May 21, 2021
7577d3b
Use template integer block size
PointKernel May 21, 2021
94697b0
Remove view::count functions
PointKernel May 21, 2021
22e6b5e
Cleanups: remove unused stuff
PointKernel May 21, 2021
28fb29d
Remove multimap find functions
PointKernel May 25, 2021
f6c3837
Update multimap example: replace find with count and find_all
PointKernel May 26, 2021
0518095
Remove naive insert global function
PointKernel May 26, 2021
7bdad3b
Update contains to use vector load
PointKernel May 26, 2021
0f90d54
prime.hpp instead of prime.cuh
PointKernel May 26, 2021
0148769
Add probe sequence header: DoubleHashing class
PointKernel May 27, 2021
ea7fe78
Passing ProbeSequence as template parameter instead of CGSize
PointKernel May 27, 2021
14ca3fd
Minor corrections: follow naming convention
PointKernel May 27, 2021
f16c153
Add inner/outer functions for count and retrieve
PointKernel May 31, 2021
08a9d00
Update benchmarks, examples and tests for inner/outer functions
PointKernel May 31, 2021
be4bf73
Add pair_count_inner/outer functions
PointKernel May 31, 2021
e6c98eb
Add linear probing class + next_slot cleanups
PointKernel May 31, 2021
3868b10
Create & use probe_sequence_base class
PointKernel May 31, 2021
414bd33
Renaming: remove _inner suffix
PointKernel Jun 1, 2021
5344bdd
Add static is_vector_load in probe sequence class
PointKernel Jun 1, 2021
e568706
Use is_vector_load & is_outer template booleans to simplify probing m…
PointKernel Jun 1, 2021
cd6566d
Fix bugs: wrong logic in count_outer + use this pointer
PointKernel Jun 2, 2021
a5c83e3
Update multimap unit tests
PointKernel Jun 2, 2021
451b0bb
Pass pair_equal callables to pair_count + update comments
PointKernel Jun 2, 2021
8e3d2ad
Add pair_count unit tests
PointKernel Jun 2, 2021
87e0050
Change the default nvbench repo in CMake
PointKernel Jun 3, 2021
476f8fe
Add trait for is_bitwise_comparable.
jrhemstad Jun 2, 2021
228c153
Update docs.
jrhemstad Jun 2, 2021
cf96639
fix typo.
jrhemstad Jun 2, 2021
c619954
Missing angle bracket.
jrhemstad Jun 2, 2021
fb7660f
Revert dynamic map and static map implementations to their main dev v…
PointKernel Jun 7, 2021
32cb584
Revert dynamic map and static map benchmarks to their main dev version
PointKernel Jun 7, 2021
f65b61f
Revert reduce_by_key to its main dev version
PointKernel Jun 7, 2021
47c2624
Revert static map and dynamic map tests to their main dev version
PointKernel Jun 7, 2021
dce0997
Move is_bitwise_comparable to traits.hpp and update map/multimap acco…
PointKernel Jun 7, 2021
6229de5
Per-warp buffer by default for vector loads
PointKernel Jun 8, 2021
8af4d1d
Use aggregate CAS for 32-bit integers during insertions
PointKernel Jun 8, 2021
b223ec9
Update performance analysis notebook
PointKernel Jun 9, 2021
cdcbbd4
Rename num_items to num_matches
PointKernel Jun 9, 2021
e4ece4f
Create and use view::count functions
PointKernel Jun 9, 2021
4dea589
Refactor global pair_count to use view::pair_count functions
PointKernel Jun 9, 2021
95c2ac9
Cleanups: remove redundant template specializations
PointKernel Jun 9, 2021
e21e037
Refactor retrieve kernel to use view-based retrieve functions
PointKernel Jun 9, 2021
cddc185
Correction: update copyright to 2021
PointKernel Jun 16, 2021
d61cbad
Add TODO comments: use cuda::atomic when it's ready
PointKernel Jun 17, 2021
80bd325
Update docs
PointKernel Jun 21, 2021
68bf47d
Use lightweight stream sync instead of device sync
PointKernel Jun 21, 2021
b817c34
Fix a cuda barrier bug: use CUCO_HAS_CUDA_BARRIER logic
PointKernel Jun 21, 2021
a75b4ff
Remove different signedness warnings
PointKernel Jun 21, 2021
86b76eb
Merge remote-tracking branch 'upstream/dev' into static-multi-map
PointKernel Jul 5, 2021
df8f1ba
Remove redundant codes
PointKernel Jul 6, 2021
ed4ee25
Use uses_vector_load instead of is_vector_load
PointKernel Jul 7, 2021
e978f15
Update copyrights
PointKernel Jul 7, 2021
cec1145
Use is_packable logic to determine uses_vector_load
PointKernel Jul 7, 2021
662a643
Get rid of enable_if in public APIs
PointKernel Jul 8, 2021
b8b523c
Fix a key comparison bug
PointKernel Jul 9, 2021
b4b6c93
Simplify multimap insert_impl
PointKernel Jul 9, 2021
ff49d42
Use bitwise compare instead of naive =
PointKernel Jul 9, 2021
f125ef3
Cleanups: use load_pair_array function
PointKernel Jul 9, 2021
6acf148
Rename flush buffer functions
PointKernel Jul 12, 2021
52f3a3a
Cleanups: remove duplcates by using maybe_unused
PointKernel Jul 13, 2021
9163acd
Replace warp logic with cooperative group
PointKernel Jul 15, 2021
f2262c4
Refactor flush_output_buffer function
PointKernel Jul 15, 2021
ee649eb
Fix the build error related to CUDA 11.0
PointKernel Jul 15, 2021
f23c9e1
Update docs + move uses_vector_load to private
PointKernel Jul 19, 2021
14be940
Add tests for 4B/8B pairs
PointKernel Jul 19, 2021
ace1273
Use vector_width logic
PointKernel Jul 19, 2021
843d83f
Fix empty key/value sentinel bugs
PointKernel Jul 20, 2021
af10fca
Cleanup: move self-defined enum types to the key generator header file
PointKernel Jul 26, 2021
c65a034
Use distinct names instead of SFINAE
PointKernel Jul 26, 2021
7a66714
Move is_bitwise_comparable logic to bitwise_compare header file
PointKernel Jul 26, 2021
f95cc83
Prefetch the dedicated stream
PointKernel Jul 26, 2021
f0943bc
Use counter allocator instead of unified memory
PointKernel Jul 26, 2021
88fa92b
Add insert_if bulk function
PointKernel Jul 27, 2021
b1adea6
Add insert_if unit tests
PointKernel Jul 27, 2021
12dc7a1
Make all immutable functions constant
PointKernel Jul 28, 2021
8cb913b
Add pair_retrieve API in multimap header
PointKernel Jul 29, 2021
ef249a5
Fix a typo
PointKernel Jul 29, 2021
9631ea0
Add pair_retrieve member functions
PointKernel Jul 29, 2021
0d46747
Fix typos
PointKernel Jul 29, 2021
6d2025d
Add a new flush_output_buffer API + minor doc corrections
PointKernel Jul 29, 2021
652cbe1
Add pair_retrieve device functions
PointKernel Jul 30, 2021
f1b94a9
Use zip iterator for pair_retrieve functions
PointKernel Jul 30, 2021
62dc120
Include thrust tuple header
PointKernel Jul 30, 2021
b149744
Dereference zip iterator
PointKernel Jul 30, 2021
438f813
Minor cleanups
PointKernel Aug 4, 2021
4909e79
Fix a bug in pair retrieve: offset for second equals
PointKernel Aug 5, 2021
fe309f5
Add unit tests for pair retrieve function + fix a typo
PointKernel Aug 5, 2021
32e551b
Add unit tests for small test cases
PointKernel Aug 5, 2021
b1e2cd9
Fix a warp cg synchronization bug
PointKernel Aug 5, 2021
ec28dd0
Add stream argument to multimap constructor
PointKernel Aug 9, 2021
04343da
Add optional precomputed hash argument
PointKernel Aug 11, 2021
12c3701
Cleanups: add d_counter as a memeber variable
PointKernel Aug 13, 2021
be58a9a
Cleanups: remove unused template parameters
PointKernel Aug 17, 2021
0735c19
Add stream as allocator argument + cleanups
PointKernel Aug 17, 2021
7476735
Member variable stream: const value instead of const reference
PointKernel Aug 17, 2021
0b4eeeb
Use cudaOccupancyMaxActiveBlocksPerMultiprocessor to determine grid size
PointKernel Aug 23, 2021
d9d92ee
Fix a minor bug: use bitwise_compare to compare against sentinel
PointKernel Aug 23, 2021
e152871
Remove unused key_equal in insert functions
PointKernel Aug 23, 2021
72e95b4
Remove precomputed_hash logic + fix a bug in vectorized linear probing
PointKernel Aug 23, 2021
f4565c0
Use precomputed prime instead of on-the-fly calculation
PointKernel Aug 23, 2021
081e82b
Merge remote-tracking branch 'upstream/dev' into static-multi-map
PointKernel Aug 25, 2021
da10eba
Fix several typos
PointKernel Aug 28, 2021
f249c44
Fix a bug: initialize output_idx before ballot
PointKernel Aug 31, 2021
1fec3cb
Refactor insert_if to take a stencil sequence
PointKernel Sep 3, 2021
e917e3d
Cleanups: get rid of atomic loads + use value_type
PointKernel Sep 3, 2021
5752e34
Refactor count/pair_count APIs
PointKernel Sep 3, 2021
ed21556
Refactor host bulk insert_if API
PointKernel Sep 3, 2021
fa1a8e6
Fix a bug: cuda::atomic taking Scope template parameter
PointKernel Sep 7, 2021
60c28e6
Add multimap move constructor
PointKernel Sep 9, 2021
d741b99
Use default move ctor & assignment operator
PointKernel Sep 10, 2021
62bee53
Update benchmarks: remove unused timer, rename variables and benchmar…
PointKernel Sep 13, 2021
4174391
Update multimap example: add comments, show outer use case
PointKernel Sep 13, 2021
65c32ed
Move probe sequence classes into cuco::detail namespace
PointKernel Sep 13, 2021
c663723
Update probe sequence constructors: taking hash object instead of def…
PointKernel Sep 13, 2021
93dc645
Minor doc updates
PointKernel Sep 13, 2021
ee19136
Move is_bitwise_comparable to traits header + formatting
PointKernel Sep 13, 2021
42532f4
Update docs
PointKernel Sep 13, 2021
ac10e12
Updates: allocated memory using unique pointer + cleanups
PointKernel Sep 15, 2021
27aafc3
Cleanups using util header
PointKernel Sep 15, 2021
08c6cea
Update docs + code cleanups
PointKernel Sep 16, 2021
c2af6fd
Add manual default constructor
PointKernel Sep 16, 2021
3d7a6c1
Default constructed unique pointers
PointKernel Sep 16, 2021
aedc828
Add move assignment for custom deleters
PointKernel Sep 16, 2021
14384d7
Code cleanups: use CUCO_HAS_CG_MEMCPY_ASYNC macro
PointKernel Sep 16, 2021
d8b3036
CMake file cleanups
PointKernel Sep 16, 2021
a6fec80
Add linear probing unit tests
PointKernel Sep 16, 2021
3db6522
Use thrust algorithms instead of naive for loops
PointKernel Sep 16, 2021
be9def1
Minor correction in probe sequence
PointKernel Sep 16, 2021
f7adf96
Fix a bug: retrieve_impl takes key_equal callable
PointKernel Sep 17, 2021
4ec7f69
Minor correction in double hashing: avoid using key + operator
PointKernel Sep 17, 2021
c4799cc
Add tests for custom key and value types
PointKernel Sep 17, 2021
9ec49c3
Update docs
PointKernel Sep 17, 2021
17bdffe
Updates: docs, examples and probe sequence taking map's Scope
PointKernel Sep 17, 2021
c26366a
Add thorough tests for retrieve functions
PointKernel Sep 17, 2021
b1ff422
Update size computation & add related unit tests
PointKernel Sep 20, 2021
0a3e137
Use non-default seed to get rid of the potential secondary clustering…
PointKernel Sep 22, 2021
ed6b253
Include necessary headers
PointKernel Oct 7, 2021
0dd290e
Add and use is_bitwise_comparable_v
PointKernel Oct 8, 2021
f84216f
Update docs
PointKernel Oct 8, 2021
294ffc1
pair_retrieve returns pair of iterators instead of size
PointKernel Oct 8, 2021
5082f8f
Remove default constructor
PointKernel Oct 8, 2021
bf09697
Remove move assignment operator for deleters
PointKernel Oct 8, 2021
da04045
Add docs for flush_output_buffer
PointKernel Oct 13, 2021
927b2b4
Minor cleanups
PointKernel Oct 13, 2021
0be35b7
Doc cleanups
PointKernel Oct 13, 2021
de86036
Test cleanups
PointKernel Oct 15, 2021
99d26ce
Move _impl functions to a new file
PointKernel Oct 15, 2021
0980737
Updates: move _impl functions out of the public header
PointKernel Oct 15, 2021
8c9efe8
Minor cleanup: remove unnecessary include
PointKernel Oct 15, 2021
6835774
Move implementations into detail/static_multimap folder
PointKernel Oct 25, 2021
092eca6
Add static assert for ProbeSequence type check
PointKernel Oct 25, 2021
a513ecb
Cleanups: use flushing_cg instead of warp for retrieve
PointKernel Oct 25, 2021
a4cfcbe
Cleanups: use flushing CG logic for pair_retrieve
PointKernel Oct 25, 2021
772d74c
Forinline device functions
PointKernel Oct 26, 2021
6143741
Change template parameter order: allocator before probe sequence
PointKernel Oct 26, 2021
14621e1
Move linear probing and double hashing to a new public header
PointKernel Oct 26, 2021
7d5b8e8
Update comment
PointKernel Oct 26, 2021
a0427a4
Merge remote-tracking branch 'upstream/dev' into static-multi-map
PointKernel Oct 27, 2021
6fa4850
Avoid public access to probe sequence key, value and scope types
PointKernel Oct 27, 2021
9c80eb6
Cleanups: rename classes, create probe_sequence_base class, move doub…
PointKernel Oct 27, 2021
1b62dc1
Create probe_sequence_impl_base class
PointKernel Oct 27, 2021
f1ff196
Update comments + minor cleanups
PointKernel Oct 27, 2021
cf365df
Merge branch 'static-multi-map' of github.com:PointKernel/cuCollectio…
PointKernel Oct 27, 2021
689a9c3
Minor update
PointKernel Oct 27, 2021
c6c3aee
Move probe_sequence_base to detail namespace
PointKernel Oct 27, 2021
32ff8ad
Set default CG size to 8
PointKernel Oct 29, 2021
cdba480
Revert back to manual grid size calculation
PointKernel Oct 29, 2021
9bf100b
Use get_grid_size for retrieve for better runtime performance
PointKernel Oct 30, 2021
7e17741
Rename multimap benchmarks
PointKernel Oct 31, 2021
ff407f7
Add pair_retrieve benchmark + function renaming
PointKernel Oct 31, 2021
0e80e14
Update performance notebook
PointKernel Oct 31, 2021
f7d286b
Restrict CG type for public device view APIs
PointKernel Nov 1, 2021
38 changes: 33 additions & 5 deletions benchmarks/CMakeLists.txt
@@ -12,14 +12,18 @@ CPMAddPackage(
"RUN_HAVE_STD_REGEX 0" #
)

if (benchmark_ADDED)
# patch google benchmark target
set_target_properties(benchmark PROPERTIES CXX_STANDARD 14)
endif()
CPMAddPackage(
NAME nvbench
GITHUB_REPOSITORY NVIDIA/nvbench
GIT_TAG main
GIT_SHALLOW TRUE
)

###################################################################################################
# - compiler function -----------------------------------------------------------------------------
### compiler function #############################################################################
###################################################################################################

###################################################################################################
function(ConfigureBench BENCH_NAME BENCH_SRC)
add_executable(${BENCH_NAME} "${BENCH_SRC}")
set_target_properties(${BENCH_NAME} PROPERTIES
@@ -35,6 +39,22 @@ function(ConfigureBench BENCH_NAME BENCH_SRC)
CUDA::cudart)
endfunction(ConfigureBench)

###################################################################################################
function(ConfigureNVBench BENCH_NAME BENCH_SRC)
add_executable(${BENCH_NAME} "${BENCH_SRC}")
set_target_properties(${BENCH_NAME} PROPERTIES
POSITION_INDEPENDENT_CODE ON
RUNTIME_OUTPUT_DIRECTORY "${CMAKE_BINARY_DIR}/nvbenchmarks")
target_include_directories(${BENCH_NAME} PRIVATE
"${CMAKE_CURRENT_SOURCE_DIR}")
#"${NVBench_SOURCE_DIR}")
target_compile_options(${BENCH_NAME} PRIVATE --expt-extended-lambda --expt-relaxed-constexpr)
target_link_libraries(${BENCH_NAME} PRIVATE
nvbench::main
pthread
cuco)
endfunction(ConfigureNVBench)

###################################################################################################
### test sources ##################################################################################
###################################################################################################
@@ -47,6 +67,14 @@ ConfigureBench(DYNAMIC_MAP_BENCH "${DYNAMIC_MAP_BENCH_SRC}")
set(STATIC_MAP_BENCH_SRC "${CMAKE_CURRENT_SOURCE_DIR}/hash_table/static_map_bench.cu")
ConfigureBench(STATIC_MAP_BENCH "${STATIC_MAP_BENCH_SRC}")

###################################################################################################
set(STATIC_MULTIMAP_BENCH_SRC "${CMAKE_CURRENT_SOURCE_DIR}/hash_table/static_multimap/static_multimap_bench.cu")
ConfigureNVBench(STATIC_MULTIMAP_BENCH "${STATIC_MULTIMAP_BENCH_SRC}")

###################################################################################################
set(FIND_ALL_BENCH_SRC "${CMAKE_CURRENT_SOURCE_DIR}/hash_table/static_multimap/find_all_bench.cu")
ConfigureNVBench(FIND_ALL_BENCH "${FIND_ALL_BENCH_SRC}")

###################################################################################################
set(RBK_BENCH_SRC "${CMAKE_CURRENT_SOURCE_DIR}/reduce_by_key/reduce_by_key.cu")
ConfigureBench(RBK_BENCH "${RBK_BENCH_SRC}")
596 changes: 596 additions & 0 deletions benchmarks/analysis/notebooks/StaticMultimap.ipynb

Large diffs are not rendered by default.

99 changes: 99 additions & 0 deletions benchmarks/analysis/notebooks/Utils.py
@@ -0,0 +1,99 @@
# Import libraries
import pandas as pd
import matplotlib.pyplot as plt
import matplotlib

# Global parameters
colors = ['b','r','g','m','y','c']
styles = ['o','s','v','^','D',">"]

def plot_single_perf(bm, df, xaxis, unique_labels):
    fig = plt.figure(1, figsize=(5, 5))
    fig.suptitle(bm)

    ax = fig.gca()
    ax.set_xlabel(xaxis)
    ax.set_ylabel('GPU Time (sec)')

    ax.set_xscale('log')
    ax.set_xticks(list(df[xaxis]))
    ax.get_xaxis().set_major_formatter(matplotlib.ticker.ScalarFormatter())

    marker_handles = []

    num_style = len(df["Distribution"].unique())

    # Iterate over labels and label indices
    for lindex, lbl in enumerate(unique_labels):
        tmpdf = df.loc[df['Label'] == lbl]

        x = tmpdf[xaxis]
        perf = tmpdf["GPU Time (sec)"]

        # Style index (one per distribution) and type index (0: INT32, otherwise INT64)
        sid = lindex % num_style
        tid = lindex // num_style

        if not tid:
            ax.plot(x, perf, color=colors[sid])
            ax.scatter(x, perf, color=colors[sid], marker=styles[sid])

            # Add legend
            marker_handles.append(ax.plot([], [], c=colors[sid], marker=styles[sid],
                                          label=lbl)[0])
        else:
            ax.plot(x, perf, color=colors[sid], linestyle="--")
            ax.scatter(x, perf, color=colors[sid], marker=styles[sid], facecolors='none')

            # Add legend
            marker_handles.append(ax.plot([], [], c=colors[sid], marker=styles[sid],
                                          mfc='none', linestyle="--", label=lbl)[0])

    leg = plt.legend(handles = marker_handles, loc="upper left", ncol=2, frameon=False)
    plt.savefig(bm + '.eps')

def plot_dual_perf(bm, df, xaxis, unique_labels):
    fig, (ax1, ax2, ax3) = plt.subplots(1, 3, figsize=(15, 5))
    fig.suptitle(bm)

    marker_handles = []

    lax = [ax1, ax2, ax3]

    for item in lax:
        item.set_xlabel(xaxis)
        item.set_ylabel("GPU Time (sec)")

    num_style = len(df["Distribution"].unique())

    # Iterate over labels and label indices
    for lindex, lbl in enumerate(unique_labels):
        tmpdf = df.loc[df['Label'] == lbl]

        x = tmpdf[xaxis]
        perf = tmpdf["GPU Time (sec)"]

        # Style index (one per distribution) and type index (0: INT32, otherwise INT64)
        sid = lindex % num_style
        tid = lindex // num_style

        # INT32
        if not tid:
            lax[sid].plot(x, perf, color=colors[sid])
            lax[sid].scatter(x, perf, color=colors[sid], marker=styles[sid])

            # Add legend
            marker_handles.append(lax[sid].plot([], [], c=colors[sid], marker=styles[sid],
                                                label=lbl)[0])
        # INT64
        else:
            lax[sid].plot(x, perf, color=colors[sid], linestyle="--")
            lax[sid].scatter(x, perf, color=colors[sid], marker=styles[sid], facecolors='none')

            # Add legend
            marker_handles.append(lax[sid].plot([], [], c=colors[sid], marker=styles[sid],
                                                mfc='none', linestyle="--", label=lbl)[0])

    leg = plt.legend(handles = marker_handles, loc="upper left", ncol=2, frameon=False)
    plt.savefig(bm + '.eps')
123 changes: 123 additions & 0 deletions benchmarks/hash_table/static_multimap/find_all_bench.cu
@@ -0,0 +1,123 @@
/*
* Copyright (c) 2021, NVIDIA CORPORATION.
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/

#include <nvbench/nvbench.cuh>

#include <thrust/device_vector.h>

#include <cstddef>
#include <iterator>
#include <random>
#include <type_traits>
#include <vector>

#include "cuco/static_multimap.cuh"

/**
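* @brief Generates input keys with a given number of repetitions per key.
*
* Keys 1, 2, ..., num_keys / multiplicity are written cyclically, so each key appears exactly
* `multiplicity` times when `multiplicity` divides the output length.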
*
*/
template <typename Key, typename OutputIt>
static void generate_multikeys(OutputIt output_begin,
OutputIt output_end,
size_t const multiplicity)
{
auto num_keys = std::distance(output_begin, output_end);

for (auto i = 0; i < num_keys; ++i) {
output_begin[i] = (i % (num_keys / multiplicity)) + 1;
}
}
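To make the key layout concrete, here is a host-only sketch that mirrors the arithmetic of `generate_multikeys` so it can run without a GPU. The names `generate_multikeys_host` and `key_histogram` are hypothetical helpers introduced for illustration; they are not part of cuco or of this benchmark.

```cpp
#include <cassert>
#include <cstddef>
#include <iterator>
#include <map>
#include <vector>

// Hypothetical host-only mirror of generate_multikeys: element i gets key
// (i % (num_keys / multiplicity)) + 1, so keys 1..num_keys/multiplicity repeat in order.
template <typename Key, typename OutputIt>
void generate_multikeys_host(OutputIt output_begin, OutputIt output_end, std::size_t multiplicity)
{
  auto const num_keys = static_cast<std::size_t>(std::distance(output_begin, output_end));
  assert(multiplicity > 0 && multiplicity <= num_keys);  // otherwise num_unique would be 0
  auto const num_unique = num_keys / multiplicity;
  for (std::size_t i = 0; i < num_keys; ++i) {
    output_begin[i] = static_cast<Key>((i % num_unique) + 1);
  }
}

// Hypothetical helper: key -> number of occurrences, used to check repetition counts.
template <typename Key>
std::map<Key, std::size_t> key_histogram(std::vector<Key> const& keys)
{
  std::map<Key, std::size_t> hist;
  for (auto const& k : keys) { ++hist[k]; }
  return hist;
}
```

With 8 outputs and a multiplicity of 2 this yields `1 2 3 4 1 2 3 4`. With 10 outputs and a multiplicity of 4, only keys 1 and 2 are produced and each appears 5 times, so the repetition count is exact only when the multiplicity divides the number of keys. The benchmark's 100'000'000 inputs equal 2^8 × 390'625, so every multiplicity from 1 to 256 divides them evenly.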

/**
* @brief A benchmark evaluating multi-value retrieval performance with a varying number of
* repetitions per key:
* - 100'000'000 keys are inserted
* - Map occupancy is fixed at 0.4
* - Number of repetitions per key: 1, ... , 128, 256
*
*/
template <typename Key, typename Value, nvbench::int32_t CGSize, nvbench::int32_t BufferSize>
std::enable_if_t<(sizeof(Key) == sizeof(Value)), void> nvbench_find_all(
nvbench::state& state,
nvbench::type_list<Key, Value, nvbench::enum_type<CGSize>, nvbench::enum_type<BufferSize>>)
{
std::size_t const num_keys = state.get_int64("NumInputs");
auto const occupancy = state.get_float64("Occupancy");
std::size_t const size = num_keys / occupancy;
std::size_t const multiplicity = state.get_int64("Multiplicity");

state.add_element_count(num_keys, "NumKeys");
state.add_global_memory_writes<Key>(num_keys * 2);

std::vector<Key> h_keys(num_keys);
std::vector<cuco::pair_type<Key, Value>> h_pairs(num_keys);

generate_multikeys<Key>(h_keys.begin(), h_keys.end(), multiplicity);
for (std::size_t i = 0; i < num_keys; ++i) {
Key key = h_keys[i];
Value val = h_keys[i];
h_pairs[i].first = key;
h_pairs[i].second = val;
}

thrust::device_vector<Key> d_keys(h_keys);
thrust::device_vector<cuco::pair_type<Key, Value>> d_pairs(h_pairs);

cuco::static_multimap<Key,
Value,
cuda::thread_scope_device,
cuco::cuda_allocator<char>,
cuco::double_hashing<CGSize,
cuco::detail::MurmurHash3_32<Key>,
cuco::detail::MurmurHash3_32<Key>>>
map{size, -1, -1};
map.insert(d_pairs.begin(), d_pairs.end());

auto const output_size = map.count_outer(d_keys.begin(), d_keys.end());
thrust::device_vector<cuco::pair_type<Key, Value>> d_results(output_size);

state.exec(nvbench::exec_tag::sync, [&](nvbench::launch& launch) {
map.retrieve_outer(d_keys.begin(), d_keys.end(), d_results.data().get(), launch.get_stream());
});
}
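The benchmark calls `count_outer` outside `state.exec` to size `d_results`, so only `retrieve_outer` is timed. The sketch below is a hypothetical host-side reference (not a cuco API) for the counting rule assumed here, as the `_outer` suffix suggests: each probe key contributes one row per stored pair with an equal key, or a single row if it has no match.

```cpp
#include <cassert>
#include <cstddef>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical reference for outer-count semantics (assumption, not cuco's implementation):
// matched probe keys contribute their match count, unmatched probe keys contribute 1.
template <typename Key, typename Value>
std::size_t reference_count_outer(std::vector<std::pair<Key, Value>> const& stored,
                                  std::vector<Key> const& probes)
{
  std::unordered_map<Key, std::size_t> matches_per_key;  // key -> number of stored pairs
  for (auto const& kv : stored) { ++matches_per_key[kv.first]; }

  std::size_t total = 0;
  for (auto const& k : probes) {
    auto const it = matches_per_key.find(k);
    total += (it == matches_per_key.end()) ? 1 : it->second;
  }
  return total;
}
```

In this benchmark every probe key was inserted and each key occurs `Multiplicity` times, so under this assumption the output holds NumInputs × Multiplicity pairs; the result buffer therefore grows linearly along the Multiplicity axis.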

template <typename Key, typename Value, nvbench::int32_t CGSize, nvbench::int32_t BufferSize>
std::enable_if_t<(sizeof(Key) != sizeof(Value)), void> nvbench_find_all(
nvbench::state& state,
nvbench::type_list<Key, Value, nvbench::enum_type<CGSize>, nvbench::enum_type<BufferSize>>)
{
state.skip("Key should be the same type as Value.");
}

using key_type = nvbench::type_list<nvbench::int32_t, nvbench::int64_t>;
using value_type = nvbench::type_list<nvbench::int32_t, nvbench::int64_t>;
using cg_size = nvbench::enum_type_list<1, 2, 4, 8, 16, 32>;
using buffer_size = nvbench::enum_type_list<1, 2, 4, 8, 16>;

NVBENCH_BENCH_TYPES(nvbench_find_all,
NVBENCH_TYPE_AXES(key_type, value_type, cg_size, nvbench::enum_type_list<2>))
.set_type_axes_names({"Key", "Value", "CGSize", "BufferSize"})
.set_timeout(100) // Custom timeout: 100 s. Default is 15 s.
.set_max_noise(3) // Custom noise: 3%. By default: 0.5%.
.add_int64_axis("NumInputs", {100'000'000}) // Total number of key/value pairs: 100'000'000
.add_float64_axis("Occupancy", {0.4})
.add_int64_power_of_two_axis("Multiplicity", nvbench::range(0, 8, 1));

NVBENCH_BENCH_TYPES(
nvbench_find_all,
NVBENCH_TYPE_AXES(key_type, value_type, nvbench::enum_type_list<8>, buffer_size))
.set_type_axes_names({"Key", "Value", "CGSize", "BufferSize"})
.set_timeout(100) // Custom timeout: 100 s. Default is 15 s.
.set_max_noise(3) // Custom noise: 3%. By default: 0.5%.
.add_int64_axis("NumInputs", {100'000'000}) // Total number of key/value pairs: 100'000'000
.add_float64_axis("Occupancy", {0.4})
.add_int64_power_of_two_axis("Multiplicity", nvbench::range(0, 8, 1));
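Each registration expands into the Cartesian product of its axes, and the size check skips the mismatched Key/Value combinations. The sketch below counts the states that actually run. `count_running_states` is a hypothetical helper, and the axis lengths are read off the code above: 6 CG sizes, 5 buffer sizes, and 9 multiplicities (2^0 through 2^8, matching the "1, ..., 128, 256" comment).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: number of benchmark states that run for one registration.
// Key x Value combinations are filtered by sizeof(Key) == sizeof(Value), then
// multiplied by the lengths of the remaining axes.
std::size_t count_running_states(std::vector<std::size_t> const& key_sizes,
                                 std::vector<std::size_t> const& value_sizes,
                                 std::size_t num_cg_sizes,
                                 std::size_t num_buffer_sizes,
                                 std::size_t num_multiplicities)
{
  std::size_t matching_pairs = 0;
  for (auto k : key_sizes) {
    for (auto v : value_sizes) {
      if (k == v) { ++matching_pairs; }  // other combinations hit state.skip(...)
    }
  }
  return matching_pairs * num_cg_sizes * num_buffer_sizes * num_multiplicities;
}
```

With `int32_t` and `int64_t` on both type axes, two of the four Key/Value combinations run. That gives 2 × 6 × 1 × 9 = 108 running states for the CG-size sweep and 2 × 1 × 5 × 9 = 90 for the buffer-size sweep.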