The rocPRIM is a header-only library providing HIP and HC parallel primitives for developing performant GPU-accelerated code on AMD ROCm platform.
- Git
- CMake (3.5.1 or later)
- AMD ROCm platform (1.8.2 or later)
- Including HCC compiler, which must be set as C++ compiler on ROCm platform.
Optional:
- GTest
- Required only for tests. Building tests is enabled by default.
- It will be automatically downloaded and built by cmake script.
- Google Benchmark
- Required only for benchmarks. Building benchmarks is off by default.
- It will be automatically downloaded and built by cmake script.
git clone https://github.com/ROCmSoftwarePlatform/rocPRIM.git
# Go to rocPRIM directory, create and go to the build directory.
cd rocPRIM; mkdir build; cd build
# Configure rocPRIM, setup options for your system.
# Build options:
# BUILD_TEST - on by default,
# BUILD_BENCHMARK - off by default.
#
# ! IMPORTANT !
# On ROCm platform set C++ compiler to HCC. You can do it by adding 'CXX=<path-to-hcc>'
# before 'cmake' or setting cmake option 'CMAKE_CXX_COMPILER' to path to the HCC compiler.
#
[CXX=hcc] cmake -DBUILD_BENCHMARK=ON ../. # or cmake-gui ../.
# Build
make -j4
# Optionally, run tests if they're enabled.
ctest --output-on-failure
# Install
[sudo] make install
rocPRIM provides two separate APIs that work on ROCm platform: HC and HIP API. After including
<rocprim/rocprim.hpp>
header the default API is HIP. In order to switch to HC API user has to
define ROCPRIM_HC_API
before the #include
statement. Alternatively, user can include
<rocprim/rocprim_hc.hpp>
or <rocprim/rocprim_hip.hpp>
, in this case ROCPRIM_HIP_API
or
ROCPRIM_HC_API
should not be defined.
#include <rocprim/rocprim.hpp> // defaults to HIP API, unless ROCPRIM_HC_API defined before
#include <rocprim/rocprim_hip.hpp> // HIP API
#include <rocprim/rocprim_hc.hpp> // HC API
Recommended way of including rocPRIM or hipCUB into a CMake project is by using their package
configuration files. hipCUB package name is hipcub
, rocPRIM package name is rocprim
.
# "/opt/rocm" - default install prefix
find_package(rocprim REQUIRED CONFIG PATHS "/opt/rocm/rocprim")
find_package(hipcub REQUIRED CONFIG PATHS "/opt/rocm/hipcub")
...
# Includes only rocPRIM headers, HC/HIP libraries have
# to be linked manually by user
target_link_libraries(<your_target> roc::rocprim)
# Includes rocPRIM headers and required HC dependencies
target_link_libraries(<your_target> roc::rocprim_hc)
# Includes rocPRIM headers and required HIP dependencies
target_link_libraries(<your_target> roc::rocprim_hip)
# On ROCm: includes hipCUB headers and roc::rocprim_hip target
# On CUDA: includes only hipCUB headers, user has to include CUB directory
target_link_libraries(<your_target> hip::hipcub)
# Go to rocPRIM build directory
cd rocPRIM; cd build
# To run all tests
ctest
# To run unit tests for hipCUB
./test/hipcub/<unit-test-name>
# To run unit tests for rocPRIM
./test/rocprim/<unit-test-name>
# Go to rocPRIM build directory
cd rocPRIM; cd build
# To run benchmark for warp functions:
# Further option can be found using --help
# [] Fields are optional
./benchmark/benchmark_hc_warp_<function_name> [--size <size>] [--trials <trials>]
./benchmark/benchmark_hip_warp_<function_name> [--size <size>] [--trials <trials>]
# To run benchmark for block functions:
# Further option can be found using --help
# [] Fields are optional
./benchmark/benchmark_hc_block_<function_name> [--size <size>] [--trials <trials>]
./benchmark/benchmark_hip_block_<function_name> [--size <size>] [--trials <trials>]
# To run benchmark for device functions:
# Further option can be found using --help
# [] Fields are optional
./benchmark/benchmark_hc_device_<function_name> [--size <size>] [--trials <trials>]
./benchmark/benchmark_hip_device_<function_name> [--size <size>] [--trials <trials>]
Most of device-wide primitives provided by rocPRIM can be tuned for different AMD device, different types or different operations using compile-time configuration structures passed to them as a template parameter. Main "knobs" are usually size of the block and number of items processed by a single thread.
rocPRIM has built-in default configurations for each of its primitives. In order to use
included configurations user should define macro ROCPRIM_TARGET_ARCH
to 803
if algorithms
should be optimized for gfx803 GCN version, or to 900
for gfx900.
# go to rocPRIM doc directory
cd rocPRIM; cd doc
# run doxygen
doxygen Doxyfile
# open html/index.html
Bugs and feature requests can be reported through the issue tracker.
Contributions of any kind are most welcome! More details are found at CONTRIBUTING and LICENSE.