-
Notifications
You must be signed in to change notification settings - Fork 102
(Stochastic) Laplace Heaviside (s)LapH spectroscopy
A WIP progress branch of an Nc agnostic stochastic Laplace Heaviside method can be found here https://github.com/lattice/quda/tree/feature/Nc_ag_slaph
In this branch, interface functions to the Nc=3 CHROMA_LAPH library can be found here https://github.com/lattice/quda/blob/feature/Nc_ag_slaph/lib/interface/slaph_interface.cpp
These above interface functions will form the nucleus of all the Nc agnostic Perambulator functions, to be completed at a later date. The 3D eigensolver, necessary for all (s)LapH computations in any Nc, is found here. https://github.com/lattice/quda/blob/feature/Nc_ag_slaph/lib/eig_lanczos3d.cpp It is complete and performant on HPC architecture. We give here the methodology of its structure, and some usage instruction.
We implemented some rudimentary, but performant 3D BLAS routines using the 4D data structures of QUDA. Specifically,
- 3D Copy engines that can extract a 3D slice from a 4D vector into a 3D vector, and also extract the 3D data from a 3D vector and insert it into a 4D vector. This is non-trivial procedure (i.e., more complicated that pointer addressing) due to the checkerboard nature of the data arrays.
- 3D (c)axpby routines. These take T length arrays of data for (c)a and b coefficients and apply the (c)axpby scaling to each of the 3D sunspaces of a 4D vector.
The 3D matrix-vector Laplace operator is a variant of native 4D Laplace operator. In the 3D case, we omit all temporal neighbour calculations, and omit any temporal communications in that direction. The operator is therefore block diagonal in T. This means we can apply the operator to a 4D vector, and the operator will act on all 4D subspaces independently. However, different 3D subspaces will converge at different rates, so we may not always need to act on the nth vector of a 3D subspace at the nth step of the algorithm. As such, the 3D copy routines construct a 4D vector with 'inactive' 3D subspaces zeroed out. As the algorithm progresses to the (n+1)th step, more and more 3D subspaces become active. Eventually, the (n_kr)th eigenvector is reached and we must restart.
At restart, there are T rotation matrices to compute. This is done by eigensolving the T arrow matrices sequentially. Subsequent rotations of the 3D Krylov spaces is also done sequentially. Copy engines construct 3D Krylov spaces and QUDA's powerful multi-BLAS engines compute the rotation. The rotated spaces are copied back into the 4D array, and the restart process is repeated. For both the Lanczos step and restart procedure, a 3D subspace immediately becomes 'inactive' once it has converged.
All the required utilities such as block orthonormalisation, reordering, guess construction, etc, have 3D counterparts, which the user can study.
The 3D BLAS routines are correct, but experimental in the native 4D environment of QUDA. As a consequence, they do not "play nice" with the tuning routines. Initial tuning runs will give incorrect data, but subsequent runs are fine. We therefore insist that users run the initial configuration in an ensemble twice, and ensure correctness before taking data.