forked from lattice/quda
-
Notifications
You must be signed in to change notification settings - Fork 0
QUDA is a library for performing calculations in lattice QCD on GPUs.
License
metivett/quda
Folders and files
Name | Name | Last commit message | Last commit date | |
---|---|---|---|---|
Repository files navigation
Release Notes for QUDA v0.5.1 xx May 2013 ----------------------------- Overview: QUDA is a library for performing calculations in lattice QCD on graphics processing units (GPUs), leveraging NVIDIA's CUDA platform. The current release includes optimized Dirac operators and solvers for the following fermion actions: * Wilson * Clover-improved Wilson * Twisted mass (including non-degenerate pairs) * Improved staggered (asqtad or HISQ) * Domain wall Implementations of CG, multi-shift CG, BiCGstab, and DD-preconditioned GCR are provided, including robust mixed-precision variants supporting combinations of double, single, and half (16-bit "block floating point") precision. The library also includes routines for HISQ link fattening and force terms for the HISQ fermion action and one-loop improved Symanzik gauge action. Use of many GPUs in parallel is supported throughout, with communication handled by QMP or MPI. Software Compatibility: The library has been tested under Linux (CentOS 5.8 and Ubuntu 12.04) using release 4.0, 4.1, 4.2, and 5.0 of the CUDA toolkit. CUDA 3.x and earlier are not supported. The library also works under Mac OS X 10.6.8 ("Snow Leopard") and 10.7.3 ("Lion") on recent 64-bit Intel-based Macs. See also "Known Issues" below. Hardware Compatibility: For a list of supported devices, see http://developer.nvidia.com/cuda-gpus Before building the library, you should determine the "compute capability" of your card, either from NVIDIA's documentation or by running the deviceQuery example in the CUDA SDK, and pass the appropriate value to QUDA's configure script. For example, the Tesla C2075 is listed on the above website as having compute capability 2.0, and so to configure the library for this card, you'd run "configure --enable-gpu-arch=sm_20 [other options]" before typing "make". As of QUDA 0.4.0, only devices of compute capability 1.1 or greater are supported. See also "Known Issues" below. Installation: Installing the library involves running "configure" followed by "make". See "./configure --help" for a list of configure options. At a minimum, you'll probably want to set the GPU architecture; see "Hardware Compatibility" above. Enabling multi-GPU support requires passing the --enable-multi-gpu flag to configure, as well as --with-mpi=<PATH> and optionally --with-qmp=<PATH>. If the latter is given, QUDA will use QMP for communications; otherwise, MPI will be called directly. By default, it is assumed that the MPI compiler wrappers are <MPI_PATH>/bin/mpicc and <MPI_PATH>/bin/mpicxx for C and C++, respectively. These choices may be overriden by setting the CC and CXX variables on the command line as follows: ./configure --enable-multi-gpu --with-mpi=<MPI_PATH> \ [--with-qmp=<QMP_PATH>] [OTHER_OPTIONS] CC=my_mpicc CXX=my_mpicxx Finally, with some MPI implementations, executables compiled against MPI will not run without "mpirun". This has the side effect of causing the configure script to believe that the compiler is failing to produce a valid executable. To skip these checks, one can trick configure into thinking that it's cross-compiling by setting the --build=none and --host=<HOST> flags. For the latter, "--host=x86_64-linux-gnu" should work on a 64-bit linux system. By default only the QDP and MILC interfaces are enabled. For interfacing support with QDPJIT, BQCD or CPS; this should be enabled at configure time with the appropriate flag, e.g., --enable-bqcd-interface. To keep compilation time to a minimum it is recommended to only enable those interfaces that are used by a given application. The QDP and MILC interfaces can be disabled with the, e.g., --disable-milc-interface flag. If Fortran interface support is desired, the F90 environment variable should be set when configure is invoked, and "make fortran" must be run explicitly, since the Fortran interface modules are not built by default. As examples, the scripts "configure.milc.titan" and "configure.chroma.titan" are provided. These configure QUDA for expected use with MILC and Chroma, respectively, on Titan (the Tesla K20X-powered Cray XK7 supercomputer at the Oak Ridge Leadership Computing Facility). Throughout the library, auto-tuning is used to select optimal launch parameters for most performance-critical kernels. This tuning process takes some time and will generally slow things down the first time a given kernel is called during a run. To avoid this one-time overhead in subsequent runs (using the same action, solver, lattice volume, etc.), the optimal parameters are cached to disk. For this to work, the QUDA_RESOURCE_PATH environment variable must be set, pointing to a writeable directory. Note that since the tuned parameters are hardware-specific, this "resource directory" should not be shared between jobs running on different systems (e.g., two clusters with different GPUs installed). Attempting to use parameters tuned for one card on a different card may lead to unexpected errors. Using the Library: Include the header file include/quda.h in your application, link against lib/libquda.a, and study tests/invert_test.cpp (for Wilson, clover, twisted-mass, or domain wall fermions) or tests/staggered_invert_test.cpp (for asqtad/HISQ fermions) for examples of the solver interface. The various solver options are enumerated in include/enum_quda.h. Known Issues: * Issues affecting devices with compute capability 1.1, 1.2, or 1.3 (i.e., those predating the "Fermi" architecture): 1. Building for these targets results in a large number of warnings that the compiler "Cannot tell what pointer points to, assuming global memory space." The warnings are annoying but harmless. 2. Computation of the Fermilab heavy-quark residual (and its use as a stopping condition in the solvers) is not supported on these devices. Selecting it will result in a run-time error. 3. It is anticipated that QUDA 0.6.0 will drop support for pre-Fermi cards completely. QUDA 0.5.x will continue to be maintained as a separate branch (with bug fixes but minimal new features) for those needing this support. * When used with drivers predating the CUDA 5.0 production release, Fermi-based GeForce cards suffer from occasional hangs when reading from double-precision textures, recoverable only with a soft reset. The recommended solution is to upgrade to the latest driver. Alternatively, such texture reads can be disabled (at the expense of some performance) by passing the --disable-fermi-double-tex flag to configure. * For compatibility with CUDA, on 32-bit platforms the library is compiled with the GCC option -malign-double. This differs from the GCC default and may affect the alignment of various structures, notably those of type QudaGaugeParam and QudaInvertParam, defined in quda.h. Therefore, any code to be linked against QUDA should also be compiled with this option. * The auto-tuner reports "0 Gflop/s" and "0 GB/s" for several of the Dslash kernels (visible if the verbosity is set to at least QUDA_SUMMARIZE), rather than the correct values. This does not affect the tuning process or actual performance. * Attempts to compile with GCC 4.7.x will fail due to bugs in the compiler. Getting Help: Please visit http://lattice.github.com/quda for contact information. Bug reports are especially welcome. Acknowledging QUDA: If you find this software useful in your work, please cite: M. A. Clark, R. Babich, K. Barros, R. Brower, and C. Rebbi, "Solving Lattice QCD systems of equations using mixed precision solvers on GPUs," Comput. Phys. Commun. 181, 1517 (2010) [arXiv:0911.3191 [hep-lat]]. When taking advantage of multi-GPU support, please also cite: R. Babich, M. A. Clark, B. Joo, G. Shi, R. C. Brower, and S. Gottlieb, "Scaling lattice QCD beyond 100 GPUs," International Conference for High Performance Computing, Networking, Storage and Analysis (SC), 2011 [arXiv:1109.2935 [hep-lat]]. Several other papers that might be of interest are listed at http://lattice.github.com/quda . Authors: Ronald Babich (NVIDIA) Kipton Barros (Los Alamos National Laboratory) Richard Brower (Boston University) Mike Clark (NVIDIA) Justin Foley (University of Utah) Joel Giedt (Rensselaer Polytechnic Institute) Steven Gottlieb (Indiana University) Balint Joo (Jefferson Laboratory) Claudio Rebbi (Boston University) Guochun Shi (NCSA) Alexei Strelchenko (Fermi National Accelerator Laboratory) Portions of this software were developed at the Innovative Systems Lab, National Center for Supercomputing Applications http://www.ncsa.uiuc.edu/AboutUs/Directorates/ISL.html Development was supported in part by the U.S. Department of Energy under grants DE-FC02-06ER41440, DE-FC02-06ER41449, and DE-AC05-06OR23177; the National Science Foundation under grants DGE-0221680, PHY-0427646, PHY-0835713, OCI-0946441, and OCI-1060067; as well as the PRACE project funded in part by the EUs 7th Framework Programme (FP7/2007-2013) under grants RI-211528 and FP7-261557. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the authors and do not necessarily reflect the views of the Department of Energy, the National Science Foundation, or the PRACE project.
About
QUDA is a library for performing calculations in lattice QCD on GPUs.
Resources
License
Stars
Watchers
Forks
Packages 0
No packages published
Languages
- C 54.0%
- C++ 38.7%
- Python 3.8%
- Objective-C 2.8%
- Other 0.7%