-
-
Notifications
You must be signed in to change notification settings - Fork 42
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
TrustRegion bugs #210
TrustRegion bugs #210
Conversation
Codecov Report
@@ Coverage Diff @@
## master #210 +/- ##
==========================================
- Coverage 94.94% 93.98% -0.96%
==========================================
Files 8 8
Lines 732 782 +50
==========================================
+ Hits 695 735 +40
- Misses 37 47 +10
📣 We’re building smart automated test selection to slash your CI/CD build times. Learn more |
src/trustRegion.jl
Outdated
cache.stats.njacs += 1 | ||
end | ||
|
||
linres = dolinsolve(alg.precs, linsolve, A = cache.H, b = _vec(cache.g), | ||
linres = dolinsolve(alg.precs, linsolve, A = J, b = _vec(fu), # cache.H, b = _vec(cache.g), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this is incorrect because here u_tmp
is actually as calculated u
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That is a subtle point. You are right that on paper you do a Gauss-Newton step, i.e., NLsolve
, the same is done actually. The only difference is that they default to computing the Moore-Penrose inverse of dolinsolve
?). I meant to ask how we should go about this actually. I feel like this should be handled by LinearSolve
via some option which is why I didn't put it in here yet.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@ChrisRackauckas Any thoughts on how to go about the case of singular Jacobians? Is this already handled in LinearSolve
, for example triggered by an LAPACK singularity error or so? If not, I feel it really should go there as opposed to here. That said, putting it here would be a quick solution for now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LinearSolve generally attempts to be non-blocking, so it will return a solution without erroring, though the return code may indicate that the solution did not solve correctly. It should just be a retcode check then, though specifically with singularity issues we probably need to add a few retcode improvements. In the ODE solver that is generally not checked because the adaptivity handles it appropriately, though this is a general improvement we should do anyways.
…to avoid growing ill-conditioning
…nt with the rest of the package)
rebased and now formally correct. Last things to do:
|
…oids a bug due to mutation of Jacobian in dolinsolve.
I think I have caught all bugs now. When using the
Just for reference, here is the before
|
src/trustRegion.jl
Outdated
linu = _vec(u_tmp), | ||
# do not use A = cache.H, b = _vec(cache.g) since it is equivalent | ||
# to A = cache.J, b = _vec(fu) as long as the Jacobian is non-singular | ||
linres = dolinsolve(alg.precs, linsolve, A = J, b = _vec(fu), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this seems to be mutating J
(resp. A
) by default. I assume that is intended but requires care when the trust region step is being rejected. For now I simply avoid recomputing the Gauss-Newton step in that case since that is just unnecessary work (although we used to do it for some reason). That said, this won't work for Jacobian reuse trust region methods which may be on the horizon. Just leaving this note here since this is quite subtle and can lead to hard-to-detect problems.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@frankschae @avik-pal make note.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I recently saw this as well while debugging a Gauss Newton bug, do we know why this is happening? Given that alias_A = true
|
||
The same updating scheme as in NLsolve's (https://github.com/JuliaNLSolvers/NLsolve.jl) trust region dogleg implementation. | ||
""" | ||
NLsolve |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why not just call it dogleg?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
They are all dogleg methods, i.e., they all use the dogleg step to compute the update. The difference between all options is just how the trust region is being adapted.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see, it's a bit of a weird name though. Is there no reference to where it came from? If not then we should credit Spenser Lyon who first implemented what was there.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I didn't see anything in the NLsolve docs or code. I just opened an issue. Maybe someone knows more over there. If not, we can also just call it Lyon
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's update the name in a follow up. I want to get this into the benchmarks though.
src/trustRegion.jl
Outdated
Trust region updating scheme as in Nocedal and Wrigt [see Alg 11.5, page 291]. | ||
""" | ||
NW |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Trust region updating scheme as in Nocedal and Wrigt [see Alg 11.5, page 291]. | |
""" | |
NW | |
Trust region updating scheme as in Nocedal and Wrigt [see Alg 11.5, page 291]. | |
""" | |
NocedalWrigt |
Is the full 23 function thing a test already? If not, can we make it a regression test please? |
I don't think they are. It would hit a lot (if not all) of the code though which would be nice. I am wondering though: How do we distinguish failure from success since we can't expect all methods to converge to a small residual on all problems? |
What about we use the list you have above and mark our solvers that currently fail as test_broken (I know it is not exactly correct since some of them are expected to not converge and aren't exactly broken)? That way, we won't have a regression in the future of the currently working versions |
I had a look at #142. Indeed there were some mistakes in how the Gauss-Newton step and Cauchy point were calculated. This is still WIP, however, since TrustRegion is still not up to par with nlsolve's implementation after these changes.
I suspect a large issue for the poor performance is the default TR adaptation and step acceptance/rejection routine; for instance the current default is rather conservative with respect to which steps are being accepted. Lowering the
step_threshold
default to 1e-4 from 0.1 (and after correcting the GN/Cauchy step computations) the performance on the test set is already quite a bit better.Before
After
Will investigate further.