MILC batched deflation #1529

Open: wants to merge 3 commits into develop

Conversation

leonhostetler

This pull request implements batched deflation for MILC. Most of the work is confined to the QUDA/MILC interface; however, one change (the first listed below) goes beyond that scope, so please check that one in particular to make sure my solution is acceptable.

  1. Changed errorQuda() in constructDeflationSpace() to warningQuda(); see details below.
  2. Added a function qudaCleanUpDeflationSpace(); see details below.
  3. Added logic to invertQuda() to preserve the deflation space between solves. Preservation of the eigenvectors and eigenvalues is then controlled from the MILC side via the appropriate fields of the QudaEigParam struct.
  4. Added the ability to control tol_restart and cuda_prec_eigensolver from the MILC side.
  5. Mirrored the invertQuda() changes in invertQudaMsrc().

Note there is a companion MILC pull request at milc-qcd/milc_qcd#76

Note that this implementation only performs deflation for even-parity solves. This seems to be fine when the UML solver is selected in MILC: with UML, the odd-parity solution is reconstructed from the even-parity solution and then polished with a few CG iterations. That iteration count is usually small and might not benefit from deflation anyway. On the other hand, if the CG or CGZ solvers are preferred, then odd-parity deflation should also be implemented in the future.

More details:

  1. I changed errorQuda() in constructDeflationSpace() to warningQuda(). Here's why:

When loading eigenvectors from file, loadFromFile() -> computeEvals() is called, which extends the deflation space by the amount of the batch size:

if (size + batch_size > static_cast<int>(evecs.size())) resize(evecs, size + batch_size, QUDA_NULL_FIELD_CREATE);

After the first deflated solve, the deflation space is preserved, but it now holds n_conv + batch_size vectors rather than n_conv.

This is fine for the first deflated solve, but on the second deflated solve, when constructDeflationSpace() is called and QUDA attempts to reuse the preserved deflation space, the following check fails:

if ((!space->svd && param.eig_param.n_conv != (int)space->evecs.size())
    || (space->svd && 2 * param.eig_param.n_conv != (int)space->evecs.size()))
  errorQuda("Preserved deflation space size %lu does not match expected %d", space->evecs.size(),
	    param.eig_param.n_conv);

It expects

param.eig_param.n_conv == space->evecs.size()

However, the size now actually satisfies

param.eig_param.n_conv + param.eig_param.compute_evals_batch_size == space->evecs.size()

Replacing the errorQuda() with a warningQuda() allows the solve to run, and it runs fine. However, if there are cases where the warning really does need to be an error, then this is probably not the right way to fix it.

  2. Added a function qudaCleanUpDeflationSpace(), which can be called from MILC to clean up the deflation space. An alternative approach would be to set preserve_deflation_space to false on the last solve; however, that would limit how much the cost of eigenvector loading can be amortized. The parameter input file for a large MILC calculation is chunked into "readin sets": MILC reads one readin set, performs all calculations therein, and then reads the next. So during a job, MILC does not know when it is doing the last solve; at best, it knows it is doing the last solve of the current readin set. The best way to ensure the deflation space is preserved across all readin sets therefore seemed to be a cleanup function that can be called from MILC's finalize_quda() in milc_qcd/generic/milc_to_quda_utilities.c.

@leonhostetler requested a review from a team as a code owner on December 22, 2024.