This folder demonstrates cuBLASLt library API usage.
Sample wrapper that executes a double-precision GEMM with a predefined algorithm using cublasLtMatmul. It is nearly a drop-in replacement for cublasDgemm, with the addition of a workspace buffer to support split-K algorithms.
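Split-K partitions the K dimension of C = alpha * A * B + beta * C into chunks, computes the partial product of each chunk independently, and then reduces the partial results. Holding those partial results is why split-K algorithms may need extra workspace. The host-side sketch below shows the technique for column-major, non-transposed double-precision matrices; the function splitKGemm and its even chunking are hypothetical illustrations, not the scheduling cuBLASLt uses internally.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: C = alpha * A * B + beta * C for column-major matrices
// (A is m x k, B is k x n, C is m x n, no transposes), computed by splitting
// the k dimension into `splits` chunks. Each chunk writes its partial product
// to its own slice of `workspace`; the slices are then reduced into C.
void splitKGemm(int m, int n, int k, double alpha, const double* A, int lda,
                const double* B, int ldb, double beta, double* C, int ldc,
                int splits, std::vector<double>& workspace) {
    assert(m >= 0 && n >= 0 && k >= 0 && splits > 0);
    const std::size_t tile = static_cast<std::size_t>(m) * n;
    workspace.assign(tile * splits, 0.0);
    const int chunk = (k + splits - 1) / splits;  // ceil(k / splits)

    // Phase 1: independent partial products, one per k-chunk.
    for (int s = 0; s < splits; ++s) {
        const int kBegin = s * chunk;
        const int kEnd = std::min(k, kBegin + chunk);  // empty when kBegin >= k
        double* partial = workspace.data() + tile * s;
        for (int j = 0; j < n; ++j)
            for (int p = kBegin; p < kEnd; ++p)
                for (int i = 0; i < m; ++i)
                    partial[i + static_cast<std::size_t>(j) * m] +=
                        A[i + p * lda] * B[p + j * ldb];
    }

    // Phase 2: reduce the partial products, then apply alpha and beta.
    // As in BLAS, C is not read when beta == 0.
    for (int j = 0; j < n; ++j)
        for (int i = 0; i < m; ++i) {
            double sum = 0.0;
            for (int s = 0; s < splits; ++s)
                sum += workspace[tile * s + i + static_cast<std::size_t>(j) * m];
            double& c = C[i + j * ldc];
            c = alpha * sum + (beta == 0.0 ? 0.0 : beta * c);
        }
}
```

Because each chunk's partial product is independent, the chunks can run in parallel; the cost is the extra memory for the partial results and the final reduction pass.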
Sample wrapper that executes a mixed-precision GEMM with cublasLtMatmul. It is nearly a drop-in replacement for cublasGemmEx, with the addition of a workspace buffer to support split-K algorithms.
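Mixed-precision GEMM APIs such as cublasGemmEx describe the data types of A, B, and C separately from the compute type used for accumulation (for example, half-precision inputs with single-precision accumulation). The hypothetical host-side template mixedGemm below makes that separation explicit. The int8/int32 instantiation in the usage note is only an example and is not claimed to be the type combination this sample uses.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical reference GEMM whose input, compute (accumulation), and output
// types are independent template parameters. Column-major, no transposes:
// C = alpha * A * B + beta * C, with A m x k, B k x n, C m x n.
template <typename TIn, typename TCompute, typename TOut>
void mixedGemm(int m, int n, int k, TCompute alpha, const TIn* A, int lda,
               const TIn* B, int ldb, TCompute beta, TOut* C, int ldc) {
    assert(m >= 0 && n >= 0 && k >= 0);
    for (int j = 0; j < n; ++j)
        for (int i = 0; i < m; ++i) {
            // Accumulate in the compute type, never in the (narrower) input type.
            TCompute acc = 0;
            for (int p = 0; p < k; ++p)
                acc += static_cast<TCompute>(A[i + p * lda]) *
                       static_cast<TCompute>(B[p + j * ldb]);
            // As in BLAS, C is not read when beta == 0.
            const TCompute prior = beta == TCompute(0)
                                       ? TCompute(0)
                                       : beta * static_cast<TCompute>(C[i + j * ldc]);
            C[i + j * ldc] = static_cast<TOut>(alpha * acc + prior);
        }
}
```

For instance, mixedGemm<std::int8_t, std::int32_t, std::int32_t> on a dot product of four values of 100 yields 40000, a result that could not be accumulated in the 8-bit input type.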
Sample that uses cublasLtMatmul to perform a tensor-op Igemm (integer GEMM), applying memory-order transforms to all buffers.
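One of the memory orders cuBLASLt defines for this kind of integer GEMM is CUBLASLT_ORDER_COL32. The sketch below computes COL32 offsets and converts a column-major int8 matrix to COL32 on the host, assuming the COL32 definition in the cuBLASLt documentation: tiles of 32 columns stored one after another, each row of a tile contiguous, and a leading dimension (the stride between tiles) of at least 32 * rows. On the GPU, cublasLtMatrixTransform performs such conversions. The helpers col32Offset and toCol32 are hypothetical, and more involved orders such as CUBLASLT_ORDER_COL4_4R2_8C are not covered.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Offset of element (row, col) in a COL32-ordered buffer with tile stride `ld`
// (in elements), under the assumed COL32 definition described above.
std::size_t col32Offset(int row, int col, std::size_t ld) {
    return static_cast<std::size_t>(col / 32) * ld +
           static_cast<std::size_t>(row) * 32 + static_cast<std::size_t>(col % 32);
}

// Hypothetical host-side column-major -> COL32 transform using the minimal
// leading dimension 32 * rows. Unused slots in the last tile stay zero.
std::vector<std::int8_t> toCol32(const std::vector<std::int8_t>& colMajor,
                                 int rows, int cols) {
    assert(rows >= 0 && cols >= 0);
    assert(colMajor.size() == static_cast<std::size_t>(rows) * cols);
    const std::size_t ld = 32 * static_cast<std::size_t>(rows);
    const std::size_t tiles = static_cast<std::size_t>((cols + 31) / 32);
    std::vector<std::int8_t> out(tiles * ld, 0);
    for (int c = 0; c < cols; ++c)
        for (int r = 0; r < rows; ++r)
            out[col32Offset(r, c, ld)] =
                colMajor[r + static_cast<std::size_t>(c) * rows];
    return out;
}
```

With 2 rows and 33 columns, the minimal leading dimension is 64, so the COL32 buffer holds two tiles (128 elements), and column 32 starts the second tile.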
Sample that uses cublasLtMatmul to perform a tensor-op complex GEMM (Cgemm) with half-precision inputs stored in the planar complex memory layout.
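In the planar complex layout, all real parts of a matrix are stored in one plane and all imaginary parts in a second plane, instead of interleaved (re, im) pairs; cuBLASLt describes the distance between the two planes with the CUBLASLT_MATRIX_LAYOUT_PLANE_OFFSET layout attribute. The hypothetical host-side sketch below converts an interleaved matrix to planar form and multiplies two planar matrices using only real arithmetic on the planes. It uses float instead of half precision so it runs on the host without extra headers.

```cpp
#include <cassert>
#include <complex>
#include <cstddef>
#include <vector>

// Hypothetical planar complex matrix: column-major real and imaginary planes.
struct PlanarMatrix {
    int rows;
    int cols;
    std::vector<float> re;  // rows * cols real parts
    std::vector<float> im;  // rows * cols imaginary parts
};

// Splits interleaved (re, im) pairs into two separate planes.
PlanarMatrix toPlanar(const std::vector<std::complex<float>>& interleaved,
                      int rows, int cols) {
    assert(interleaved.size() == static_cast<std::size_t>(rows) * cols);
    PlanarMatrix p;
    p.rows = rows;
    p.cols = cols;
    p.re.resize(interleaved.size());
    p.im.resize(interleaved.size());
    for (std::size_t idx = 0; idx < interleaved.size(); ++idx) {
        p.re[idx] = interleaved[idx].real();
        p.im[idx] = interleaved[idx].imag();
    }
    return p;
}

// C = A * B on planar operands:
//   Re(C) = Re(A) Re(B) - Im(A) Im(B)
//   Im(C) = Re(A) Im(B) + Im(A) Re(B)
PlanarMatrix planarGemm(const PlanarMatrix& A, const PlanarMatrix& B) {
    assert(A.cols == B.rows);
    PlanarMatrix C;
    C.rows = A.rows;
    C.cols = B.cols;
    C.re.assign(static_cast<std::size_t>(C.rows) * C.cols, 0.0f);
    C.im.assign(C.re.size(), 0.0f);
    for (int j = 0; j < C.cols; ++j)
        for (int p = 0; p < A.cols; ++p)
            for (int i = 0; i < C.rows; ++i) {
                const float ar = A.re[i + p * A.rows], ai = A.im[i + p * A.rows];
                const float br = B.re[p + j * B.rows], bi = B.im[p + j * B.rows];
                C.re[i + j * C.rows] += ar * br - ai * bi;
                C.im[i + j * C.rows] += ar * bi + ai * br;
            }
    return C;
}
```

Keeping each plane contiguous lets the complex product be expressed as real matrix products on the planes, which is the motivation for the planar layout.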
Sample wrapper that executes a single-precision GEMM with cublasLtMatmul. It is nearly a drop-in replacement for cublasSgemm, with the addition of a workspace buffer to support split-K algorithms.
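A wrapper meant as a near drop-in replacement for cublasSgemm can be checked against a host reference that follows the same conventions: column-major storage, op(A) and op(B) selected by the transpose flags, and leading dimensions lda, ldb, and ldc. In the sketch below, the Op enum is a hypothetical stand-in for cublasOperation_t so the code compiles without CUDA headers, and the default tolerance in nearlyEqual is an arbitrary example value.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for cublasOperation_t (CUBLAS_OP_N / CUBLAS_OP_T).
enum class Op { N, T };

// Host reference with cublasSgemm-style semantics (column-major):
// C = alpha * op(A) * op(B) + beta * C, op(A) is m x k, op(B) is k x n.
void referenceSgemm(Op transa, Op transb, int m, int n, int k, float alpha,
                    const float* A, int lda, const float* B, int ldb,
                    float beta, float* C, int ldc) {
    assert(m >= 0 && n >= 0 && k >= 0);
    for (int j = 0; j < n; ++j)
        for (int i = 0; i < m; ++i) {
            float acc = 0.0f;
            for (int p = 0; p < k; ++p) {
                // A is stored k x m when transposed, B is stored n x k.
                const float a = transa == Op::N ? A[i + p * lda] : A[p + i * lda];
                const float b = transb == Op::N ? B[p + j * ldb] : B[j + p * ldb];
                acc += a * b;
            }
            float& c = C[i + j * ldc];
            c = alpha * acc + (beta == 0.0f ? 0.0f : beta * c);  // C unread if beta == 0
        }
}

// Element-wise comparison with a tolerance relative to the larger magnitude.
bool nearlyEqual(const std::vector<float>& x, const std::vector<float>& y,
                 float relTol = 1e-5f) {
    if (x.size() != y.size()) return false;
    for (std::size_t i = 0; i < x.size(); ++i) {
        const float scale =
            std::fmax(1.0f, std::fmax(std::fabs(x[i]), std::fabs(y[i])));
        if (std::fabs(x[i] - y[i]) > relTol * scale) return false;
    }
    return true;
}
```

A relative tolerance rather than exact equality is used because a GPU GEMM may sum products in a different order than the host loop.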
Sample wrapper that runs through multiple combinations of algorithms and algorithm configuration attributes for a single-precision GEMM, using the low-level cuBLASLt API.
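The search in this sample can be pictured as expanding every combination of configuration attributes that an algorithm supports. The sketch below performs that expansion on the host with hypothetical AlgoCaps and Candidate types. In the real API the capabilities would be queried with cublasLtMatmulAlgoCapGetAttribute, each configuration applied with cublasLtMatmulAlgoConfigSetAttribute, and validated with cublasLtMatmulAlgoCheck before timing. Treating a split-K value of 1 as "no reduction needed" (scheme 0) is an assumption of this sketch.

```cpp
#include <cassert>
#include <vector>

// Hypothetical, simplified capability description for one algorithm ID.
struct AlgoCaps {
    int algoId;
    std::vector<int> tileIds;
    std::vector<int> splitKValues;      // 1 means "no split-K"
    std::vector<int> reductionSchemes;  // only meaningful when splitK > 1
    bool swizzlingSupported;
};

// One fully specified configuration to validate and benchmark.
struct Candidate {
    int algoId, tileId, splitK, reductionScheme, swizzle;
};

// Expands every combination of tile, split-K value, reduction scheme, and
// CTA swizzle setting that the capabilities allow.
std::vector<Candidate> enumerateCandidates(const std::vector<AlgoCaps>& algos) {
    std::vector<Candidate> out;
    for (const AlgoCaps& caps : algos) {
        const int swizzleCount = caps.swizzlingSupported ? 2 : 1;
        for (int tile : caps.tileIds)
            for (int splitK : caps.splitKValues) {
                assert(splitK >= 1);
                // Without split-K there is nothing to reduce, so use scheme 0 only.
                const std::vector<int> schemes =
                    splitK == 1 ? std::vector<int>{0} : caps.reductionSchemes;
                for (int scheme : schemes)
                    for (int swizzle = 0; swizzle < swizzleCount; ++swizzle)
                        out.push_back(Candidate{caps.algoId, tile, splitK, scheme, swizzle});
            }
    }
    return out;
}
```

The number of candidates grows multiplicatively with each attribute, which is why exhaustive searches like this are usually run once per problem shape and the result cached.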
Sample wrapper that auto-tunes the single-precision GEMM algorithm: it queries the cuBLASLt heuristics for the best candidate algorithms, iterates over the results, and picks the algorithm with the best performance for the given problem.
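The auto-tuning step amounts to timing each candidate and keeping the fastest. The sketch below assumes each candidate is wrapped in a std::function (in practice, a cublasLtMatmul call using one of the algorithms returned by cublasLtMatmulAlgoGetHeuristic); pickFastest is a hypothetical helper that does one warm-up run and compares median times. Because GPU work is launched asynchronously, a real benchmark would time with CUDA events or synchronize the stream rather than rely on the host clock alone.

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>
#include <cstddef>
#include <functional>
#include <vector>

// Hypothetical helper: runs each candidate once as a warm-up, then `repeats`
// timed runs, and returns the index of the candidate with the lowest median.
std::size_t pickFastest(const std::vector<std::function<void()>>& candidates,
                        int repeats) {
    assert(!candidates.empty() && repeats > 0);
    std::size_t best = 0;
    double bestMedian = 0.0;
    for (std::size_t c = 0; c < candidates.size(); ++c) {
        candidates[c]();  // warm-up run, excluded from timing
        std::vector<double> times;
        for (int r = 0; r < repeats; ++r) {
            const auto start = std::chrono::steady_clock::now();
            candidates[c]();
            const auto stop = std::chrono::steady_clock::now();
            times.push_back(
                std::chrono::duration<double, std::micro>(stop - start).count());
        }
        // Median via partial sort: the element at size/2 ends up in place.
        std::nth_element(times.begin(), times.begin() + times.size() / 2, times.end());
        const double median = times[times.size() / 2];
        if (c == 0 || median < bestMedian) {
            best = c;
            bestMedian = median;
        }
    }
    return best;
}
```

Comparing medians rather than means makes the choice less sensitive to occasional slow runs caused by unrelated system activity.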
Supported SM architectures: SM 5.0, SM 5.2, SM 5.3, SM 6.0, SM 6.1, SM 6.2, SM 7.0, SM 7.2, SM 7.5, SM 8.0
Supported OSes: Linux, Windows
Supported CPU architecture: x86_64
- Windows 10
- Ubuntu 18.04
Prerequisites:
- A Linux/Windows system with NVIDIA driver version 450.41 or newer
- CUDA 11.0 toolkit
- CMake 3.10 or newer
- A compiler supporting C++11 or later
git clone https://github.com/NVIDIA/CUDALibrarySamples.git
cd CUDALibrarySamples/cuBLASLt
mkdir build
cd build
cmake .. -DCMAKE_BUILD_TYPE=Release
make -j
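On Windows, instead of running the final make -j step, open the Visual Studio solution that CMake generated in the build directory and build it there, selecting the Release configuration (the Visual Studio generator ignores CMAKE_BUILD_TYPE).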