OpenQxD with QUDA
These instructions are intended to be a quick start guide to getting openQxD running with GPUs using the QUDA library.
These instructions assume you are using the recommended branches of QUDA and openQxD:
- feature/openqxd in case of QUDA, see https://github.com/lattice/quda (TODO: merge into develop, see pull request)
- feature/quda/main-thesis-release in case of openQxD, see https://gitlab.com/rcstar/openQxD-devel (TODO: merge into master, see pull request)
First, clone QUDA into a subdirectory src/quda:
git clone -b feature/openqxd https://github.com/chaoos/quda.git src/quda
For compilation, several compile-time flags have to be set to enable the openQxD interface:
QUDA_INTERFACE_OPENQCD=ON # enables openQxD interface
QUDA_INTERFACE_MILC=OFF
QUDA_INTERFACE_QDP=OFF
QUDA_INTERFACE_BQCD=OFF
QUDA_INTERFACE_CPS=OFF
QUDA_INTERFACE_QDPJIT=OFF
QUDA_INTERFACE_TIFR=OFF
QUDA_DOWNLOAD_USQCD=OFF
QUDA_QIO=OFF
QUDA_QMP=OFF
QUDA_MPI=ON # enable MPI
We want to use double, single and half precision as well as all reconstruction types (both options are bitmasks):
QUDA_PRECISION=14
QUDA_RECONSTRUCT=7
As well as the Wilson- and Clover-Dirac operators:
QUDA_DIRAC_DEFAULT_OFF=ON # disables ALL Dirac operators
QUDA_DIRAC_WILSON=ON # enables Wilson-Dirac operators
QUDA_DIRAC_CLOVER=ON # enables Wilson-clover operators
For the compilers, we choose different ones for different target machines:
- CMAKE_CXX_COMPILER: either g++ version 11 or 12, or Clang version 14
- CMAKE_C_COMPILER: usually gcc version 11 or higher
- MPI_CXX_SKIP_MPICXX=ON
- CMAKE_CUDA_COMPILER: nvcc version 11, 12 or higher
- CUDAToolkit_BIN_DIR: set to the CUDA binary directory (for example /usr/local/cuda/bin)
- CUDAToolkit_INCLUDE_DIR: set to the CUDA include directory (for example /usr/local/cuda/include)
- CMAKE_CUDA_COMPILER_LAUNCHER=ccache if ccache is available
- CMAKE_CXX_COMPILER_LAUNCHER=ccache if ccache is available
Finally, the architecture and the build type:
- QUDA_GPU_ARCH: target architecture, see https://arnon.dk/matching-sm-architectures-arch-and-gencode-for-various-nvidia-cards/, https://docs.nvidia.com/cuda/cuda-compiler-driver-nvcc/index.html#gpu-feature-list and https://docs.nvidia.com/cuda/cuda-compiler-driver-nvcc/index.html#virtual-architecture-feature-list for more information
- QUDA_GPU_ARCH_SUFFIX: real or virtual, see the links above
- CMAKE_BUILD_TYPE: one of DEVEL, RELEASE, STRICT, DEBUG, HOSTDEBUG or SANITIZE (for example STRICT), see https://github.com/lattice/quda/wiki/QUDA-Build-With-CMake#reducing-qudas-build-time
For the remaining compile options, we refer to Building QUDA using CMake.
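As a sketch, the options above can be combined into a single configure call; compilers, paths and the GPU architecture (sm_80 is only an example here) must be adapted to your machine:
cmake -S src/quda -B build/quda \
  -DCMAKE_BUILD_TYPE=RELEASE \
  -DCMAKE_CXX_COMPILER=g++ \
  -DCMAKE_C_COMPILER=gcc \
  -DCMAKE_CUDA_COMPILER=nvcc \
  -DMPI_CXX_SKIP_MPICXX=ON \
  -DQUDA_MPI=ON \
  -DQUDA_INTERFACE_OPENQCD=ON \
  -DQUDA_INTERFACE_MILC=OFF -DQUDA_INTERFACE_QDP=OFF \
  -DQUDA_INTERFACE_BQCD=OFF -DQUDA_INTERFACE_CPS=OFF \
  -DQUDA_INTERFACE_QDPJIT=OFF -DQUDA_INTERFACE_TIFR=OFF \
  -DQUDA_DOWNLOAD_USQCD=OFF -DQUDA_QIO=OFF -DQUDA_QMP=OFF \
  -DQUDA_PRECISION=14 -DQUDA_RECONSTRUCT=7 \
  -DQUDA_DIRAC_DEFAULT_OFF=ON -DQUDA_DIRAC_WILSON=ON -DQUDA_DIRAC_CLOVER=ON \
  -DQUDA_GPU_ARCH=sm_80 \
  -DQUDA_GPU_ARCH_SUFFIX=real
cmake --build build/quda -j "$(nproc)"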
Next, clone openQxD into a subdirectory src/openqxd:
git clone -b feature/quda/main-thesis-release https://gitlab.com/rcstar/openQxD-devel.git src/openqxd
Set the required environment variables before compiling (see openQ*D code: a versatile tool for QCD+QED simulations)
export GCC=gcc
export CC=mpicc
export CXX=mpicxx
export MPI_HOME="/usr/lib/x86_64-linux-gnu/openmpi/" # for example
export MPI_INCLUDE="${MPI_HOME}/include"
In the Makefile of the utility you plan to build, make sure to enable QUDA offloading (see openqxd:extras/main/lowrnk/Makefile for an example) with
USE_QUDA ?= yes
or pass it on the command line while compiling
make USE_QUDA=yes
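For example, with the environment variables above exported and QUDA already built, the lowrnk utility could be compiled as follows (a sketch; the Makefile may additionally need the location of your QUDA build):
cd src/openqxd/extras/main/lowrnk
make USE_QUDA=yes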
This enables building the required modules in openQxD and linking to QUDA. Check if linking was done correctly with
$ env -i ldd <binary>
[...]
libquda.so => /path/to/libquda.so (0x00007f73af092000)
[...]
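If libquda.so shows up as "not found" instead, point the dynamic linker at the directory containing the QUDA library before running (the path below is a placeholder for your actual QUDA build or install location):
export LD_LIBRARY_PATH=/path/to/quda/build/lib:$LD_LIBRARY_PATH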
Choose the number of ranks in openqxd:include/global.h to match the number of GPUs.
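The total number of ranks is the product of the process-grid macros in that header; assuming the standard openQCD naming (NPROC0 to NPROC3), the current setting can be inspected with
grep NPROC src/openqxd/include/global.h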
Running a compiled binary behaves the same as before.
On a local machine:
mpirun -np <N> <binary> ... # on a regular Linux machine
Or on a cluster like CSCS:
srun ...
sbatch ...
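A minimal batch script might look as follows (a sketch for a SLURM system; partition, account and GPU options depend on the cluster, and the number of tasks must match the rank layout in global.h):
#!/bin/bash
#SBATCH --job-name=openqxd
#SBATCH --nodes=1
#SBATCH --ntasks=2            # must equal NPROC0*NPROC1*NPROC2*NPROC3
#SBATCH --gpus-per-task=1
srun ./check3 -i check.in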
Make sure to have nsys installed (e.g. yoshi.ethz.ch has it installed). Then run, for example,
mpirun -np 2 nsys profile -o profiler%q{OMPI_COMM_WORLD_RANK} ./check3 -i check.in
This will create two files, profiler0.nsys-rep and profiler1.nsys-rep. Download them to your local laptop and install Nsight Systems 2023. Note that you need to register at Nvidia in order to download the program. In order to obtain named regions, run
# obtain named regions
export NSYS_NVTX_PROFILER_REGISTER_ONLY=0
mpirun -np 2 nsys profile --sample=none --trace=cuda,nvtx,mpi -o profiler_nvtx%q{OMPI_COMM_WORLD_RANK} ./check3 -i check.in
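The reports can then be opened in the Nsight Systems GUI; alternatively, a quick text summary can be printed on the command line (shown here for one of the generated reports):
nsys stats profiler_nvtx0.nsys-rep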