Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Feature/host clover #480

Merged
merged 24 commits into from
Jun 24, 2016
Merged

Feature/host clover #480

merged 24 commits into from
Jun 24, 2016

Conversation

maddyscientist
Copy link
Member

@maddyscientist maddyscientist commented Jun 10, 2016

(Updated description of pull to reflect all changes made since pull was first opened.)

This pull adds support for the following

  • Reference clover-dslash and twisted-clover application operators. This provides us with a means to check the correction of the QUDA implementation of the wilson-clover and twisted-clover dslash operators.
  • All --test variants are now properly supported for dslash_test for much improved testing.
  • Significant clean up of loadCloverQuda, merging the code paths between clover and twisted-clover.
  • loadCloverQuda now supports explicitly requesting the clover field or its inverse be computed explicitly, rather than relying on null pointers to trigger this (through the QudaInvertParam::compute_clover and QudaInvertParam::compute_clover_inverse parameters, respectively).
  • loadCloverQuda now supports copying back the computed clover field and clover-inverse field (through the QudaInvertParam::return_clover and QudaInvertParam::return_clover_inverse parameters, respectively); this functionality is used by both dslash_test, invert_test and multigrid_invert_test to correctly apply the clover dslash operators on the host for true residual comparison.
  • Multi-shift solver now supported for twisted-mass and twisted-clover fermions in invert_test.
  • Added new command-line parameters to unit tests --clover-coeff and --compute-clover which controls whether a random clover field is used or the actual clover field is computed and inverted when running dslash_test and invert_test.
  • Improved performance of clover::FloatNOrder accessors through vectorized memory accesses.
  • Fixes an autotuning bug in the clover inversion, triggered when the input and output fields alias.
  • Fixes a null pointer dereference bug in dynamic clover with twisted-clover fermions.

This doesn't do the reference clover term construction, but this is a separate issue.

…lson-clover unit testing in dslash_test and invert_test. Some minor anciliiary clean up of reference blas as the same time.
@AlexVaq
Copy link
Member

AlexVaq commented Jun 10, 2016

Sure, I'll.

@AlexVaq
Copy link
Member

AlexVaq commented Jun 12, 2016

I think this is ready, I have just a comment: for DYNAMIC_CLOVER some test fail. The asymmetric kernels perform better than the symmetric ones (I suppose the less the inverse clover is applied, the better). Although I'm not very worried (things get better with increasing precision, so the computation must be ok) I don't understand very well why it happens. Here is the list of failed tests (only tests involving the clover inverse are shown):

Test.........Single........Double........Single^+.....Double^+.........................^+ means dagger
0 Asym.....1e-1..........1e-10...........1e-1............1e-10
0 Sym......Pass..........1e-11...........1e-1............1e-10
1 Asym....Pass..........Pass............Pass...........Pass
1 Sym......Pass..........Pass............1e-2...........1e-11
3 Asym....Pass..........1e-11...........Pass...........Pass
3 Sym......1e-1..........1e-10...........1e-1............1e-10

@AlexVaq
Copy link
Member

AlexVaq commented Jun 13, 2016

Why is the default check failing? I don't understand, I try to see something in the logs but I can't find anything (apart from a problem with the login, that also appears in successful compilations).

@maddyscientist
Copy link
Member Author

It doesn't sound correct that dynamic clover inversion should fail, especially the single precision fails at O(1e-1) deviation. Taking a look at this now.

How well is invert_test working for this?

@AlexVaq
Copy link
Member

AlexVaq commented Jun 15, 2016

I didn't see any problems in invert_test, but my setup might not have been evil enough.

… same: source must be preserved and restored since tuning destroys the input.
@maddyscientist
Copy link
Member Author

I've fixed one bug that could be affecting both this and #474: when using the symmetric preconditioning, the input and output fields to the clover inverse alias, in this case we must save and restore the input clover field else the autotuning will destroy the result. This could be affecting #474 since the autotuner isn't enabled for the first solve in general since autotuning is switched on by the call to invertQuda and previous calls do not have it enabled. So it would make sense that the second call breaks the inversion, though it doesn't explain why it would seemingly converge until the reliable update.

@AlexVaq
Copy link
Member

AlexVaq commented Jun 15, 2016

Oh-ho, that looks really twisted. Thanks for fixing it. Do you have results for dslash_test? Is the table I sent before noticeably changing?

@maddyscientist
Copy link
Member Author

It doesn't seem to affect the large deviations, it just fixes getting Nans if tuning is enabled (and the tuning is explicitly switched on prior to the loadCloverQuda call. Still looking at this.

@maddyscientist
Copy link
Member Author

Ok, having thought about this for a wee while, I think my fix above probably does (partially) fix #474. While it appears that the reliable update breaks the convergence, I suspect it was always junk (CG is just outputting the iterated residual): if you do not use an initial guess, then the initial residual will always be just the norm of the source vector, and actual residual isn't computed until the reliable update at which point it explodes.

@AlexVaq
Copy link
Member

AlexVaq commented Jun 15, 2016

Yes, I see the large deviations are not fixed. I'm a bit puzzled about this...

…rmance of all algorithms that use this (clover inversion sees a 1.5x speedup). Added missing support for clover norm field save/restore in tuning.
…ntroduced in clover::FloatNOrder in last commit
…rix that results in signficant errors in cholesky decomposition leading to failure in twisted-clover test.
@maddyscientist
Copy link
Member Author

I've tracked down the source of the failures: with norm=0.2 and diag=1.0, the resulting clover matrices can be very poorly conditioned, e.g., O(1e12) condition number. The resulting inverse matrices have wildly varying coefficients, with the result that the difference of order of operation between the reference CPU and GPU code paths lead to different answers. This is simply remedied by setting norm=0.1 to reduce the condition number of the clover matrices,

@maddyscientist
Copy link
Member Author

@AlexVaq Why do you need both cloverPrecise and cloverInvPrecise fields? It appears that for the former you only define the direct piece, and the latter only defines the inverse piece. I realize that cloverInvPrecise includes the twisted part as well, but any reason why you just can't store this inverse in the same container as cloverPrecise? Perhaps I'm missing something, but it appears you never need the inverse of the regular clover term, only when it includes the twisted part.

Along a similar line, why do you call computeClover twice, instead of copying cloverPrecise to cloverInvPrecise?

BTW, with my latest changes to the clover::FloatNOrder accessor, the performance of the high level clover inversion should be closer to your macro version.

…compute trace log: this makes the result reproducible after tuning has completed. Trace-log computation and twist parameters are now templated to improve code generation.
@AlexVaq
Copy link
Member

AlexVaq commented Jun 16, 2016

@maddyscientist ok, I see. I chose norm = 0.2 as a big departure from norm = 1.0 that was being used in the clover case, and thought it worked reasonably well, but it seems it was not "good enough" (I take that you read the comment I put in dslash_test.cpp about the matter).

Regarding cloverInvPrecise, it's useless with dynamic clover, but without the inversion on the fly we have to store two different clover fields. I was not aware that you could store two distinct clover fields within the same container (one would be the standard clover and the other would be the inverse of [clover^2 + mu^2] ). If it's possible, then definitely I should do it.

The two calls to computeClover is a reminiscence of very old code that, for some reason still unknown to me, was giving wrong results when copying the clover term, and was giving right results when recomputing it. Since this was all startup time and didn't add a significant overhead to the overall computation, I'd just forgotten it, but I'm glad now it works as it should.

If the accessor improved performance and the differences are not very significant, then we can get rid the fugly macro, but we should test it before we make changes difficult to revert.

@maddyscientist
Copy link
Member Author

Regarding whether we need two fields or not for twisted clover: it looks like the mu term only appears in the inverse explicitly and there isn't any difference between the direct term with/without twist. Is that correct? That was the motivation for my observation that we only need one field.

I haven't tested that only one clover computation is needed with a copy, but I will try it.

The fact that norm=1.0 works for regular clover is because I wasn't inverting the clover term, so your test was stronger.

@AlexVaq
Copy link
Member

AlexVaq commented Jun 16, 2016

it looks like the mu term only appears in the inverse explicitly and
there isn't any difference between the direct term with/without twist. Is
that correct?

Yes, it is. If you can store the data for C and [C^2 + mu^2] in
cloverPrecise, then it can be done. Since sometimes I don't understand all
the possibilities of the QUDA classes (the library is growing very
fast...), I took the easy way, but it might not be the most efficient
one...

The fact that norm=1.0 works for regular clover is because I wasn't
inverting the clover term, so your test was stronger.

Yes, I understood that. That's why I chose norm=0.2, but it was an ill
choice.

Happy to know everything is all right now. This library is becoming
extremely useful, I think I didn't use anything else for the last four
years.

…ermions. Fixed bug affecting dyanmic-clover caused by dereferencing a null pointer.
@maddyscientist
Copy link
Member Author

maddyscientist commented Jun 17, 2016

@AlexVaq Can you take a look at my last commit? I've merged the two clover fields for twisted clover which cleans up the code nicely.

  • Now just have cloverPrecise, etc. and no cloverInvPrecise, etc. fields
  • Removes the double clover creation
  • I've tested this with and without dynamic inversion, and both seem to work fine
  • I found a bug in dynamic-clover while testing: the FullClover inverse pointer is dereferenced in dslash_twisted_clover.cu when binding the texture. However, this was a null pointer in dirac_twisted_clover.cpp; trivial to this fix this, not sure why this wasn't seen before, but it could have caused some of the stability issues reported previously by @kostrzewa.

@AlexVaq
Copy link
Member

AlexVaq commented Jun 17, 2016 via email

…ed new parameters QudaInvertParam::compute_clover_invert and QudaInvertParam::return_clover_invert to explicitly request clover inversion computation and copy back to the host.
…field is inverted and returned when checking for correctness
@maddyscientist
Copy link
Member Author

@AlexVaq I've done another round of clean up, merging the code paths between clover and twisted-clover. In dslash_test, clover now matches the behaviour of twisted-clover in that the inversion is computed and returned to the host. While I've preserved the convention that passing null pointers triggers device-side computation, and passing pointers triggers a copy from the host, I've added two new parameters QudaInvertParam::compute_clover_invert and QudaInvertParam::return_clover_invert to force device-side computation and request copy back of the clover inverse.

I've still got a couple of things to do, namely extend similar support to invet_test, such that the inverse is copied back, and also copy back the clover field itself.

@AlexVaq
Copy link
Member

AlexVaq commented Jun 18, 2016 via email

…ram::return_clover which are used to instruct loadcloverQuda whether to compute the clover field or to download it, and also once computed, whether to copy it back to the host application. Default is to no to both to repsect previous interface behaviour.
…e computation of the clover field (and inverse) and for setting the clover coefficient. Updated dslash_test for this: if --compute-clover true is given then the clover field and its inverse is computed in QUDA and is copied back to the host for verfication; else a random field (which is also used for the inverse)is constructed on the host and downloaded to the device. Default is the latter as per previous behaviour.
…line parameters: if the former is set to true then the true clover field is computed using the set clover coefficient. Enabled multi-shift solver testing for twisted-clover and fixed memory leaks in invert_test.
…ive the compiler an easier time with register allocation. Applied this to clover creation and inversion.
@maddyscientist
Copy link
Member Author

With my last commit, this pull is ready for merging I believe. I have updated the pull description to include all the work done since when the pull was opened.

@mathiaswagner Do you want to take a look at merging this?

@maddyscientist
Copy link
Member Author

@alexstrel Can you review this pull request and merge? Getting this branch merged soon would be good as it contains a number of important fixes. (@mathiaswagner has gone on holiday for the next two weeks.)

@mathiaswagner mathiaswagner merged commit f96193d into develop Jun 24, 2016
@mathiaswagner
Copy link
Member

Merged. Might need some review to check. Especially reference implementation should be flawless.

@maddyscientist maddyscientist deleted the feature/host-clover branch June 24, 2016 21:16
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants