-
Notifications
You must be signed in to change notification settings - Fork 101
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Feature/host clover #480
Feature/host clover #480
Conversation
…lson-clover unit testing in dslash_test and invert_test. Some minor anciliiary clean up of reference blas as the same time.
…r testing (part 3 of #19).
Sure, I'll. |
… make clover verification non trivial
Includes DYNAMIC_CLOVER support
I think this is ready, I have just a comment: for DYNAMIC_CLOVER some test fail. The asymmetric kernels perform better than the symmetric ones (I suppose the less the inverse clover is applied, the better). Although I'm not very worried (things get better with increasing precision, so the computation must be ok) I don't understand very well why it happens. Here is the list of failed tests (only tests involving the clover inverse are shown): Test.........Single........Double........Single^+.....Double^+.........................^+ means dagger |
Why is the default check failing? I don't understand, I try to see something in the logs but I can't find anything (apart from a problem with the login, that also appears in successful compilations). |
It doesn't sound correct that dynamic clover inversion should fail, especially the single precision fails at O(1e-1) deviation. Taking a look at this now. How well is invert_test working for this? |
I didn't see any problems in invert_test, but my setup might not have been evil enough. |
… same: source must be preserved and restored since tuning destroys the input.
I've fixed one bug that could be affecting both this and #474: when using the symmetric preconditioning, the input and output fields to the clover inverse alias, in this case we must save and restore the input clover field else the autotuning will destroy the result. This could be affecting #474 since the autotuner isn't enabled for the first solve in general since autotuning is switched on by the call to |
Oh-ho, that looks really twisted. Thanks for fixing it. Do you have results for dslash_test? Is the table I sent before noticeably changing? |
It doesn't seem to affect the large deviations, it just fixes getting Nans if tuning is enabled (and the tuning is explicitly switched on prior to the |
Ok, having thought about this for a wee while, I think my fix above probably does (partially) fix #474. While it appears that the reliable update breaks the convergence, I suspect it was always junk (CG is just outputting the iterated residual): if you do not use an initial guess, then the initial residual will always be just the norm of the source vector, and actual residual isn't computed until the reliable update at which point it explodes. |
Yes, I see the large deviations are not fixed. I'm a bit puzzled about this... |
…rmance of all algorithms that use this (clover inversion sees a 1.5x speedup). Added missing support for clover norm field save/restore in tuning.
…ntroduced in clover::FloatNOrder in last commit
…rix that results in signficant errors in cholesky decomposition leading to failure in twisted-clover test.
I've tracked down the source of the failures: with |
@AlexVaq Why do you need both Along a similar line, why do you call BTW, with my latest changes to the |
…compute trace log: this makes the result reproducible after tuning has completed. Trace-log computation and twist parameters are now templated to improve code generation.
@maddyscientist ok, I see. I chose norm = 0.2 as a big departure from norm = 1.0 that was being used in the clover case, and thought it worked reasonably well, but it seems it was not "good enough" (I take that you read the comment I put in dslash_test.cpp about the matter). Regarding cloverInvPrecise, it's useless with dynamic clover, but without the inversion on the fly we have to store two different clover fields. I was not aware that you could store two distinct clover fields within the same container (one would be the standard clover and the other would be the inverse of [clover^2 + mu^2] ). If it's possible, then definitely I should do it. The two calls to computeClover is a reminiscence of very old code that, for some reason still unknown to me, was giving wrong results when copying the clover term, and was giving right results when recomputing it. Since this was all startup time and didn't add a significant overhead to the overall computation, I'd just forgotten it, but I'm glad now it works as it should. If the accessor improved performance and the differences are not very significant, then we can get rid the fugly macro, but we should test it before we make changes difficult to revert. |
Regarding whether we need two fields or not for twisted clover: it looks like the I haven't tested that only one clover computation is needed with a copy, but I will try it. The fact that |
Happy to know everything is all right now. This library is becoming |
…ermions. Fixed bug affecting dyanmic-clover caused by dereferencing a null pointer.
@AlexVaq Can you take a look at my last commit? I've merged the two clover fields for twisted clover which cleans up the code nicely.
|
I'll try tonight, although I'm abroad right now. This looks far more elegant.
|
…ed new parameters QudaInvertParam::compute_clover_invert and QudaInvertParam::return_clover_invert to explicitly request clover inversion computation and copy back to the host.
…field is inverted and returned when checking for correctness
@AlexVaq I've done another round of clean up, merging the code paths between clover and twisted-clover. In dslash_test, clover now matches the behaviour of twisted-clover in that the inversion is computed and returned to the host. While I've preserved the convention that passing null pointers triggers device-side computation, and passing pointers triggers a copy from the host, I've added two new parameters I've still got a couple of things to do, namely extend similar support to invet_test, such that the inverse is copied back, and also copy back the clover field itself. |
That's really interesting! One of my non-trivial tests of the twisted-clover implementation was to compare twisted-clover and clover at mu=0, but because of the different way of dealing with the inverse in some tests, it was complicated.
I don't like too much the convention of the null pointers either, but I have to say that it works very well. In any case, the new parameters are a welcome addition.
I'd like to help you, but since I'm out of office, I can only try to follow your steps in my phone... Great job!
|
…ram::return_clover which are used to instruct loadcloverQuda whether to compute the clover field or to download it, and also once computed, whether to copy it back to the host application. Default is to no to both to repsect previous interface behaviour.
…e computation of the clover field (and inverse) and for setting the clover coefficient. Updated dslash_test for this: if --compute-clover true is given then the clover field and its inverse is computed in QUDA and is copied back to the host for verfication; else a random field (which is also used for the inverse)is constructed on the host and downloaded to the device. Default is the latter as per previous behaviour.
…line parameters: if the former is set to true then the true clover field is computed using the set clover coefficient. Enabled multi-shift solver testing for twisted-clover and fixed memory leaks in invert_test.
…ive the compiler an easier time with register allocation. Applied this to clover creation and inversion.
With my last commit, this pull is ready for merging I believe. I have updated the pull description to include all the work done since when the pull was opened. @mathiaswagner Do you want to take a look at merging this? |
@alexstrel Can you review this pull request and merge? Getting this branch merged soon would be good as it contains a number of important fixes. (@mathiaswagner has gone on holiday for the next two weeks.) |
Merged. Might need some review to check. Especially reference implementation should be flawless. |
(Updated description of pull to reflect all changes made since pull was first opened.)
This pull adds support for the following
--test
variants are now properly supported for dslash_test for much improved testing.loadCloverQuda
, merging the code paths between clover and twisted-clover.loadCloverQuda
now supports explicitly requesting the clover field or its inverse be computed explicitly, rather than relying on null pointers to trigger this (through theQudaInvertParam::compute_clover
andQudaInvertParam::compute_clover_inverse
parameters, respectively).loadCloverQuda
now supports copying back the computed clover field and clover-inverse field (through theQudaInvertParam::return_clover
andQudaInvertParam::return_clover_inverse
parameters, respectively); this functionality is used by both dslash_test, invert_test and multigrid_invert_test to correctly apply the clover dslash operators on the host for true residual comparison.--clover-coeff
and--compute-clover
which controls whether a random clover field is used or the actual clover field is computed and inverted when running dslash_test and invert_test.clover::FloatNOrder
accessors through vectorized memory accesses.This doesn't do the reference clover term construction, but this is a separate issue.