Feature/host clover #480

maddyscientist · 2016-06-10T04:41:18Z

(Updated description of pull to reflect all changes made since pull was first opened.)

This pull adds support for the following

Reference clover-dslash and twisted-clover application operators. This provides us with a means to check the correction of the QUDA implementation of the wilson-clover and twisted-clover dslash operators.
All --test variants are now properly supported for dslash_test for much improved testing.
Significant clean up of loadCloverQuda, merging the code paths between clover and twisted-clover.
loadCloverQuda now supports explicitly requesting the clover field or its inverse be computed explicitly, rather than relying on null pointers to trigger this (through the QudaInvertParam::compute_clover and QudaInvertParam::compute_clover_inverse parameters, respectively).
loadCloverQuda now supports copying back the computed clover field and clover-inverse field (through the QudaInvertParam::return_clover and QudaInvertParam::return_clover_inverse parameters, respectively); this functionality is used by both dslash_test, invert_test and multigrid_invert_test to correctly apply the clover dslash operators on the host for true residual comparison.
Multi-shift solver now supported for twisted-mass and twisted-clover fermions in invert_test.
Added new command-line parameters to unit tests --clover-coeff and --compute-clover which controls whether a random clover field is used or the actual clover field is computed and inverted when running dslash_test and invert_test.
Improved performance of clover::FloatNOrder accessors through vectorized memory accesses.
Fixes an autotuning bug in the clover inversion, triggered when the input and output fields alias.
Fixes a null pointer dereference bug in dynamic clover with twisted-clover fermions.

This doesn't do the reference clover term construction, but this is a separate issue.

…lson-clover unit testing in dslash_test and invert_test. Some minor anciliiary clean up of reference blas as the same time.

…r testing (part 3 of #19).

AlexVaq · 2016-06-10T06:04:05Z

Sure, I'll.

… make clover verification non trivial

Includes DYNAMIC_CLOVER support

AlexVaq · 2016-06-12T10:52:35Z

I think this is ready, I have just a comment: for DYNAMIC_CLOVER some test fail. The asymmetric kernels perform better than the symmetric ones (I suppose the less the inverse clover is applied, the better). Although I'm not very worried (things get better with increasing precision, so the computation must be ok) I don't understand very well why it happens. Here is the list of failed tests (only tests involving the clover inverse are shown):

Test.........Single........Double........Single^+.....Double^+.........................^+ means dagger
0 Asym.....1e-1..........1e-10...........1e-1............1e-10
0 Sym......Pass..........1e-11...........1e-1............1e-10
1 Asym....Pass..........Pass............Pass...........Pass
1 Sym......Pass..........Pass............1e-2...........1e-11
3 Asym....Pass..........1e-11...........Pass...........Pass
3 Sym......1e-1..........1e-10...........1e-1............1e-10

AlexVaq · 2016-06-13T08:50:22Z

Why is the default check failing? I don't understand, I try to see something in the logs but I can't find anything (apart from a problem with the login, that also appears in successful compilations).

maddyscientist · 2016-06-15T00:49:44Z

It doesn't sound correct that dynamic clover inversion should fail, especially the single precision fails at O(1e-1) deviation. Taking a look at this now.

How well is invert_test working for this?

AlexVaq · 2016-06-15T05:58:40Z

I didn't see any problems in invert_test, but my setup might not have been evil enough.

… same: source must be preserved and restored since tuning destroys the input.

maddyscientist · 2016-06-15T07:30:01Z

I've fixed one bug that could be affecting both this and #474: when using the symmetric preconditioning, the input and output fields to the clover inverse alias, in this case we must save and restore the input clover field else the autotuning will destroy the result. This could be affecting #474 since the autotuner isn't enabled for the first solve in general since autotuning is switched on by the call to invertQuda and previous calls do not have it enabled. So it would make sense that the second call breaks the inversion, though it doesn't explain why it would seemingly converge until the reliable update.

AlexVaq · 2016-06-15T07:33:55Z

Oh-ho, that looks really twisted. Thanks for fixing it. Do you have results for dslash_test? Is the table I sent before noticeably changing?

maddyscientist · 2016-06-15T07:41:58Z

It doesn't seem to affect the large deviations, it just fixes getting Nans if tuning is enabled (and the tuning is explicitly switched on prior to the loadCloverQuda call. Still looking at this.

maddyscientist · 2016-06-15T08:01:47Z

Ok, having thought about this for a wee while, I think my fix above probably does (partially) fix #474. While it appears that the reliable update breaks the convergence, I suspect it was always junk (CG is just outputting the iterated residual): if you do not use an initial guess, then the initial residual will always be just the norm of the source vector, and actual residual isn't computed until the reliable update at which point it explodes.

AlexVaq · 2016-06-15T08:44:16Z

Yes, I see the large deviations are not fixed. I'm a bit puzzled about this...

…rmance of all algorithms that use this (clover inversion sees a 1.5x speedup). Added missing support for clover norm field save/restore in tuning.

…ntroduced in clover::FloatNOrder in last commit

…rix that results in signficant errors in cholesky decomposition leading to failure in twisted-clover test.

maddyscientist · 2016-06-15T23:15:51Z

I've tracked down the source of the failures: with norm=0.2 and diag=1.0, the resulting clover matrices can be very poorly conditioned, e.g., O(1e12) condition number. The resulting inverse matrices have wildly varying coefficients, with the result that the difference of order of operation between the reference CPU and GPU code paths lead to different answers. This is simply remedied by setting norm=0.1 to reduce the condition number of the clover matrices,

maddyscientist · 2016-06-15T23:33:44Z

@AlexVaq Why do you need both cloverPrecise and cloverInvPrecise fields? It appears that for the former you only define the direct piece, and the latter only defines the inverse piece. I realize that cloverInvPrecise includes the twisted part as well, but any reason why you just can't store this inverse in the same container as cloverPrecise? Perhaps I'm missing something, but it appears you never need the inverse of the regular clover term, only when it includes the twisted part.

Along a similar line, why do you call computeClover twice, instead of copying cloverPrecise to cloverInvPrecise?

BTW, with my latest changes to the clover::FloatNOrder accessor, the performance of the high level clover inversion should be closer to your macro version.

…compute trace log: this makes the result reproducible after tuning has completed. Trace-log computation and twist parameters are now templated to improve code generation.

AlexVaq · 2016-06-16T08:21:00Z

@maddyscientist ok, I see. I chose norm = 0.2 as a big departure from norm = 1.0 that was being used in the clover case, and thought it worked reasonably well, but it seems it was not "good enough" (I take that you read the comment I put in dslash_test.cpp about the matter).

Regarding cloverInvPrecise, it's useless with dynamic clover, but without the inversion on the fly we have to store two different clover fields. I was not aware that you could store two distinct clover fields within the same container (one would be the standard clover and the other would be the inverse of [clover^2 + mu^2] ). If it's possible, then definitely I should do it.

The two calls to computeClover is a reminiscence of very old code that, for some reason still unknown to me, was giving wrong results when copying the clover term, and was giving right results when recomputing it. Since this was all startup time and didn't add a significant overhead to the overall computation, I'd just forgotten it, but I'm glad now it works as it should.

If the accessor improved performance and the differences are not very significant, then we can get rid the fugly macro, but we should test it before we make changes difficult to revert.

maddyscientist · 2016-06-16T10:02:42Z

Regarding whether we need two fields or not for twisted clover: it looks like the mu term only appears in the inverse explicitly and there isn't any difference between the direct term with/without twist. Is that correct? That was the motivation for my observation that we only need one field.

I haven't tested that only one clover computation is needed with a copy, but I will try it.

The fact that norm=1.0 works for regular clover is because I wasn't inverting the clover term, so your test was stronger.

AlexVaq · 2016-06-16T10:52:47Z

it looks like the mu term only appears in the inverse explicitly and
there isn't any difference between the direct term with/without twist. Is
that correct?

Yes, it is. If you can store the data for C and [C^2 + mu^2] in
cloverPrecise, then it can be done. Since sometimes I don't understand all
the possibilities of the QUDA classes (the library is growing very
fast...), I took the easy way, but it might not be the most efficient
one...

The fact that norm=1.0 works for regular clover is because I wasn't
inverting the clover term, so your test was stronger.

Yes, I understood that. That's why I chose norm=0.2, but it was an ill
choice.

Happy to know everything is all right now. This library is becoming
extremely useful, I think I didn't use anything else for the last four
years.

…ermions. Fixed bug affecting dyanmic-clover caused by dereferencing a null pointer.

maddyscientist · 2016-06-17T07:16:32Z

@AlexVaq Can you take a look at my last commit? I've merged the two clover fields for twisted clover which cleans up the code nicely.

Now just have cloverPrecise, etc. and no cloverInvPrecise, etc. fields
Removes the double clover creation
I've tested this with and without dynamic inversion, and both seem to work fine
I found a bug in dynamic-clover while testing: the FullClover inverse pointer is dereferenced in dslash_twisted_clover.cu when binding the texture. However, this was a null pointer in dirac_twisted_clover.cpp; trivial to this fix this, not sure why this wasn't seen before, but it could have caused some of the stability issues reported previously by @kostrzewa.

AlexVaq · 2016-06-17T07:53:15Z

I'll try tonight, although I'm abroad right now. This looks far more elegant.

…ed new parameters QudaInvertParam::compute_clover_invert and QudaInvertParam::return_clover_invert to explicitly request clover inversion computation and copy back to the host.

…field is inverted and returned when checking for correctness

maddyscientist · 2016-06-18T06:52:19Z

@AlexVaq I've done another round of clean up, merging the code paths between clover and twisted-clover. In dslash_test, clover now matches the behaviour of twisted-clover in that the inversion is computed and returned to the host. While I've preserved the convention that passing null pointers triggers device-side computation, and passing pointers triggers a copy from the host, I've added two new parameters QudaInvertParam::compute_clover_invert and QudaInvertParam::return_clover_invert to force device-side computation and request copy back of the clover inverse.

I've still got a couple of things to do, namely extend similar support to invet_test, such that the inverse is copied back, and also copy back the clover field itself.

AlexVaq · 2016-06-18T07:14:10Z

That's really interesting! One of my non-trivial tests of the twisted-clover implementation was to compare twisted-clover and clover at mu=0, but because of the different way of dealing with the inverse in some tests, it was complicated. I don't like too much the convention of the null pointers either, but I have to say that it works very well. In any case, the new parameters are a welcome addition. I'd like to help you, but since I'm out of office, I can only try to follow your steps in my phone... Great job!

…ram::return_clover which are used to instruct loadcloverQuda whether to compute the clover field or to download it, and also once computed, whether to copy it back to the host application. Default is to no to both to repsect previous interface behaviour.

…e computation of the clover field (and inverse) and for setting the clover coefficient. Updated dslash_test for this: if --compute-clover true is given then the clover field and its inverse is computed in QUDA and is copied back to the host for verfication; else a random field (which is also used for the inverse)is constructed on the host and downloaded to the device. Default is the latter as per previous behaviour.

…line parameters: if the former is set to true then the true clover field is computed using the set clover coefficient. Enabled multi-shift solver testing for twisted-clover and fixed memory leaks in invert_test.

…ive the compiler an easier time with register allocation. Applied this to clover creation and inversion.

maddyscientist · 2016-06-21T06:38:55Z

With my last commit, this pull is ready for merging I believe. I have updated the pull description to include all the work done since when the pull was opened.

@mathiaswagner Do you want to take a look at merging this?

maddyscientist · 2016-06-24T20:19:59Z

@alexstrel Can you review this pull request and merge? Getting this branch merged soon would be good as it contains a number of important fixes. (@mathiaswagner has gone on holiday for the next two weeks.)

mathiaswagner · 2016-06-24T20:32:11Z

Merged. Might need some review to check. Especially reference implementation should be flawless.

maddyscientist added 3 commits June 4, 2016 14:54

Implemented reference clover application operator and enabled true Wi…

153d44a

…lson-clover unit testing in dslash_test and invert_test. Some minor anciliiary clean up of reference blas as the same time.

Fixed cmake bug in dslash_test and invert_test compilation

af6d23a

Fixed bugs in clover application: dslash_test now supports full clove…

b6de874

…r testing (part 3 of #19).

maddyscientist added the feature label Jun 10, 2016

maddyscientist added this to the QUDA 0.9 milestone Jun 10, 2016

maddyscientist assigned AlexVaq Jun 10, 2016

maddyscientist and others added 3 commits June 10, 2016 02:03

Fixed staggered reference dslash compiliation

6fd83bb

Set off-diagonal clover terms to be non-zero in clover invert test to…

da0baba

… make clover verification non trivial

Added host routines for twisted-clover

a174af0

Includes DYNAMIC_CLOVER support

Fixed bug in tuning of clover inversion if source destination are the…

9b1795e

… same: source must be preserved and restored since tuning destroys the input.

maddyscientist added 3 commits June 15, 2016 14:48

clover::FloatNOrder now uses vectorized load/store for improved perfo…

5f028db

…rmance of all algorithms that use this (clover inversion sees a 1.5x speedup). Added missing support for clover norm field save/restore in tuning.

Updated clover inversion preTune/postTune for direct pointer access i…

2a6a941

…ntroduced in clover::FloatNOrder in last commit

Set clover norm to 0.1, as 0.2 leads to poorly conditioned clover mat…

ae3e7ac

…rix that results in signficant errors in cholesky decomposition leading to failure in twisted-clover test.

maddyscientist added 3 commits June 15, 2016 19:55

Clover inversion now uses grid-wide reduction rather than atomics to …

017a67b

…compute trace log: this makes the result reproducible after tuning has completed. Trace-log computation and twist parameters are now templated to improve code generation.

Display last error when autotuning fails

cab5e73

Fixed warning in last commit

a671b9c

Merged separate clover and clover inverse fields for twisted-clover f…

9da9306

…ermions. Fixed bug affecting dyanmic-clover caused by dereferencing a null pointer.

maddyscientist added 3 commits June 17, 2016 23:31

Added default and copy constructors for CloverFieldParam

3a8139c

Significant cleanup of loadCloverQuda to minimize redundant code. Add…

b1d088a

…ed new parameters QudaInvertParam::compute_clover_invert and QudaInvertParam::return_clover_invert to explicitly request clover inversion computation and copy back to the host.

In dslash_test, clover now matches twisted-clover in that the clover …

0a64e6d

…field is inverted and returned when checking for correctness

maddyscientist added 6 commits June 20, 2016 20:01

Added missing --clover-coeff support to dslash_test

86034f9

Added support for new clover testing in multigrid_invert_test

d011bf2

Added chiral-block-level accessor for clover::FloatNOrder to try to g…

38fda50

…ive the compiler an easier time with register allocation. Applied this to clover creation and inversion.

maddyscientist assigned mathiaswagner and unassigned AlexVaq Jun 21, 2016

Added missing guard around clover construction

8899390

maddyscientist assigned alexstrel Jun 24, 2016

mathiaswagner merged commit f96193d into develop Jun 24, 2016

maddyscientist deleted the feature/host-clover branch June 24, 2016 21:16

maddyscientist unassigned alexstrel and mathiaswagner Jun 24, 2016

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Feature/host clover #480

Feature/host clover #480

maddyscientist commented Jun 10, 2016 •

edited

Loading

AlexVaq commented Jun 10, 2016

AlexVaq commented Jun 12, 2016 •

edited

Loading

AlexVaq commented Jun 13, 2016

maddyscientist commented Jun 15, 2016

AlexVaq commented Jun 15, 2016

maddyscientist commented Jun 15, 2016

AlexVaq commented Jun 15, 2016

maddyscientist commented Jun 15, 2016

maddyscientist commented Jun 15, 2016

AlexVaq commented Jun 15, 2016

maddyscientist commented Jun 15, 2016

maddyscientist commented Jun 15, 2016

AlexVaq commented Jun 16, 2016

maddyscientist commented Jun 16, 2016

AlexVaq commented Jun 16, 2016

maddyscientist commented Jun 17, 2016 •

edited

Loading

AlexVaq commented Jun 17, 2016 via email

maddyscientist commented Jun 18, 2016

AlexVaq commented Jun 18, 2016 via email

maddyscientist commented Jun 21, 2016

maddyscientist commented Jun 24, 2016

mathiaswagner commented Jun 24, 2016

Feature/host clover #480

Feature/host clover #480

Conversation

maddyscientist commented Jun 10, 2016 • edited Loading

AlexVaq commented Jun 10, 2016

AlexVaq commented Jun 12, 2016 • edited Loading

AlexVaq commented Jun 13, 2016

maddyscientist commented Jun 15, 2016

AlexVaq commented Jun 15, 2016

maddyscientist commented Jun 15, 2016

AlexVaq commented Jun 15, 2016

maddyscientist commented Jun 15, 2016

maddyscientist commented Jun 15, 2016

AlexVaq commented Jun 15, 2016

maddyscientist commented Jun 15, 2016

maddyscientist commented Jun 15, 2016

AlexVaq commented Jun 16, 2016

maddyscientist commented Jun 16, 2016

AlexVaq commented Jun 16, 2016

maddyscientist commented Jun 17, 2016 • edited Loading

AlexVaq commented Jun 17, 2016 via email

maddyscientist commented Jun 18, 2016

AlexVaq commented Jun 18, 2016 via email

maddyscientist commented Jun 21, 2016

maddyscientist commented Jun 24, 2016

mathiaswagner commented Jun 24, 2016

maddyscientist commented Jun 10, 2016 •

edited

Loading

AlexVaq commented Jun 12, 2016 •

edited

Loading

maddyscientist commented Jun 17, 2016 •

edited

Loading