Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

TrustRegion bugs #210

Merged
merged 23 commits into from
Sep 29, 2023
Merged

TrustRegion bugs #210

merged 23 commits into from
Sep 29, 2023

Conversation

FHoltorf
Copy link
Contributor

@FHoltorf FHoltorf commented Sep 14, 2023

I had a look at #142. Indeed there were some mistakes in how the Gauss-Newton step and Cauchy point were calculated. This is still WIP, however, since TrustRegion is still not up to par with nlsolve's implementation after these changes.

I suspect a large issue for the poor performance is the default TR adaptation and step acceptance/rejection routine; for instance the current default is rather conservative with respect to which steps are being accepted. Lowering the step_threshold default to 1e-4 from 0.1 (and after correcting the GN/Cauchy step computations) the performance on the test set is already quite a bit better.

Before

nr  Problem                                            n     NewtonRaphson       TrustRegion         LevenbergMarquardt  nlsolve
1   Generalized Rosenbrock function                    10    3.9833e-311         4.9193              4.9193              0.0
2   Powell singular function                           4     7.0435e-16          7.941               0.021633            7.0217e-16
3   Powell badly scaled function                       2     0.0                 26897.0             2.1983e-14          0.0
4   Wood function                                      4     4.1832e-14          19081.0             8550.5              8.2157e-15
5   Helical valley function                            3     6.2775e-30          50.0                50.0                4.3667e-26
6   Watson function                                    2     18860.0             3004.4              129.15              114.89
7   Chebyquad function                                 2     5.5511e-17          0.075044            0.34127             9.4369e-16
8   Brown almost linear function                       10    0.0                 0.01533             16.53               4.4409e-16
9   Discrete boundary value function                   10    5.4548e-16          0.0047026           6.5869e-15          2.7756e-16
10  Discrete integral equation function                10    2.7807e-15          1.9166e-15          9.3334e-15          5.5511e-17
11  Trigonometric function                             10    1.4753e-15          2.2808e-15          0.084117            6.8695e-16
12  Variably dimensioned function                      10    0.0                 2.8636e105          1.376e-12           0.0
13  Broyden tridiagonal function                       10    1.2803e-15          2.7487e-15          1.1633e-13          6.6613e-16
14  Broyden banded function                            10    1.4002e-15          2.1729e-15          1.6865e-13          5.5511e-16
15  Hammarling 2 by 2 matrix square root problem       4     2.2204e-16          0.32355             0.91487             2.2204e-16
16  Hammarling 3 by 3 matrix square root problem       9     2.2204e-16          0.30203             1.0506              2.2204e-16
17  Dennis and Schnabel 2 by 2 example                 2     0.0                 4.4408e-16          17.262              0.0
18  Sample problem 18                                  2     0.0                 6.2862e-9           2.1951              8.291e-17
19  Sample problem 19                                  2     9.5092e-16          3.4505e-7           76.367              7.7402e-16
20  Scalar problem f(x) = x(x - 5)^2                   1     0.0                 1.8919e-16          2.5564e-18          0.0
21  Freudenstein-Roth function                         2     0.0                 20.012              20.012              4.949
22  Boggs function                                     2     0.0                 1.6031              2.0                 3.9431e-17
23  Chandrasekhar function                             10    6.6613e-16          9.9301e-16          1.0906e-13          2.2204e-16

After

nr  Problem                                            n     NewtonRaphson       TrustRegion         LevenbergMarquardt  nlsolve             
1   Generalized Rosenbrock function                    10    4.0179e-314         0.0050578           4.9193              0.0                 
2   Powell singular function                           4     7.0435e-16          2.1234e-6           0.021633            7.0217e-16          
3   Powell badly scaled function                       2     0.0                 0.0                 9.9944e-15          0.0                 
4   Wood function                                      4     4.1138e-14          4287.4              8550.5              8.2157e-15          
5   Helical valley function                            3     6.2775000000000006e-301.5158e-18          50.0                4.3667e-26          
6   Watson function                                    2     18979.0             28248.0             129.15              114.89              
7   Chebyquad function                                 2     5.5511e-17          1.6653e-16          0.34127             9.4369e-16          
8   Brown almost linear function                       10    0.0                 1.1102e-16          16.53               4.4409e-16          
9   Discrete boundary value function                   10    5.4548e-16          6.9388e-17          7.2775e-15          2.7756e-16          
10  Discrete integral equation function                10    2.7807e-15          3.4694e-17          4.8191e-15          5.5511e-17          
11  Trigonometric function                             10    1.4753e-15          6.2384e-16          0.084117            8.3961e-16          
12  Variably dimensioned function                      10    0.0                 0.0                 1.3104e-12          0.0                 
13  Broyden tridiagonal function                       10    1.2803e-15          1.1376e-15          1.1957e-13          6.6613e-16          
14  Broyden banded function                            10    1.4002e-15          1.3824e-15          1.665e-13           5.5511e-16          
15  Hammarling 2 by 2 matrix square root problem       4     2.2204e-16          0.017726            0.91487             2.2204e-16          
16  Hammarling 3 by 3 matrix square root problem       9     2.2204e-16          0.0093482           1.0506              2.2204e-16          
17  Dennis and Schnabel 2 by 2 example                 2     0.0                 0.0                 17.262              0.0                 
18  Sample problem 18                                  2     0.0                 1.413e-15           2.1951              7.658e-18           
19  Sample problem 19                                  2     9.5092e-16          0.00013496          76.367              7.7402e-16          
20  Scalar problem f(x) = x(x - 5)^2                   1     0.0                 1.9702e-24          2.5564e-18          0.0                 
21  Freudenstein-Roth function                         2     0.0                 20.012              20.012              4.949               
22  Boggs function                                     2     0.0                 7.3505e-17          2.0                 3.9431e-17          
23  Chandrasekhar function                             10    6.6613e-16          7.6918e-16          1.2718e-13          2.2204e-16 

Will investigate further.

@codecov
Copy link

codecov bot commented Sep 14, 2023

Codecov Report

Merging #210 (f5c66c4) into master (d1e0762) will decrease coverage by 0.96%.
The diff coverage is 92.70%.

@@            Coverage Diff             @@
##           master     #210      +/-   ##
==========================================
- Coverage   94.94%   93.98%   -0.96%     
==========================================
  Files           8        8              
  Lines         732      782      +50     
==========================================
+ Hits          695      735      +40     
- Misses         37       47      +10     
Files Coverage Δ
src/trustRegion.jl 95.91% <92.70%> (-2.97%) ⬇️

📣 We’re building smart automated test selection to slash your CI/CD build times. Learn more

@FHoltorf FHoltorf marked this pull request as draft September 14, 2023 17:21
cache.stats.njacs += 1
end

linres = dolinsolve(alg.precs, linsolve, A = cache.H, b = _vec(cache.g),
linres = dolinsolve(alg.precs, linsolve, A = J, b = _vec(fu), # cache.H, b = _vec(cache.g),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is incorrect because here $J$ is the jacobian of the actual function and not the merit function $0.5 * ||f(x)||_{2}^2$, but the initial estimate of the step size u_tmp is actually as calculated $-H^{-1}g$, where H and g are the hessian and gradient of the merit func at u ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That is a subtle point. You are right that on paper you do a Gauss-Newton step, i.e., $p_{GN} = -(J^\top J)^{-1}J^\top f$ (since we use $H = J^\top J$ as proxy for the Hessian of the merit function and $g= J^\top f$). But clearly, if $J$ is non-singular this is equivalent to just the Newton step, i.e., $p_{GN} = p_N = -J^{-1} f$. At the same time, computing $J^\top J$ squares the condition number of $J$ which makes using the Gauss-Newton step unfavorable. In NLsolve, the same is done actually. The only difference is that they default to computing the Moore-Penrose inverse of $J$ via the SVD when it is singular. We don't have that exception yet (unless it is burried in dolinsolve?). I meant to ask how we should go about this actually. I feel like this should be handled by LinearSolve via some option which is why I didn't put it in here yet.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ChrisRackauckas Any thoughts on how to go about the case of singular Jacobians? Is this already handled in LinearSolve, for example triggered by an LAPACK singularity error or so? If not, I feel it really should go there as opposed to here. That said, putting it here would be a quick solution for now.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LinearSolve generally attempts to be non-blocking, so it will return a solution without erroring, though the return code may indicate that the solution did not solve correctly. It should just be a retcode check then, though specifically with singularity issues we probably need to add a few retcode improvements. In the ODE solver that is generally not checked because the adaptivity handles it appropriately, though this is a general improvement we should do anyways.

@FHoltorf
Copy link
Contributor Author

FHoltorf commented Sep 26, 2023

rebased and now formally correct.

Last things to do:

  • Fix tests
  • Tune hyperparameters

@FHoltorf
Copy link
Contributor Author

I think I have caught all bugs now.

When using the NLsolve trust region scheme, the performance is comparable to NLsolve.jl (see below). There are slight differences which are due to different convergence criteria, I believe. The other trust region schemes perform similarly now (usually a little worse on this test set but not to a meaningful enough point where I would suspect more bugs).

nr  Problem                                            n     NewtonRaphson       TrustRegion         LevenbergMarquardt  nlsolve             
1   Generalized Rosenbrock function                    10    NaN                 0.0                 4.9193              0.0                 
2   Powell singular function                           4     7.0435e-16          7.0435e-16          0.021633            7.0217e-16          
3   Powell badly scaled function                       2     0.0                 0.0                 9.9944e-15          0.0                 
4   Wood function                                      4     4.1138e-14          1.2921e-14          8550.5              1.1546e-14          
5   Helical valley function                            3     6.2775000000000006e-301.0408e-16          50.0                1.0408e-16          
6   Watson function                                    2     NaN                 74.676              NaN                 114.89              
7   Chebyquad function                                 2     5.5511e-17          5.5511e-17          0.34127             5.5511e-17          
8   Brown almost linear function                       10    0.0                 8.8817e-16          16.53               8.8818e-16          
9   Discrete boundary value function                   10    5.4548e-16          5.4548e-16          7.2775e-15          2.7756e-16          
10  Discrete integral equation function                10    2.7807e-15          2.7807e-15          4.8191e-15          5.5511e-17          
11  Trigonometric function                             10    1.4753e-15          2.6733e-15          0.084117            8.4655e-16          
12  Variably dimensioned function                      10    0.0                 0.0                 1.3104e-12          0.0                 
13  Broyden tridiagonal function                       10    1.2803e-15          1.2803e-15          1.1957e-13          8.8818e-16          
14  Broyden banded function                            10    1.4002e-15          1.4002e-15          1.665e-13           8.8818e-16          
15  Hammarling 2 by 2 matrix square root problem       4     2.2204e-16          6.8564e-8           0.91487             6.853e-8            
16  Hammarling 3 by 3 matrix square root problem       9     2.2204e-16          3.7631e-29          1.0506              3.7632000000000004e-29
17  Dennis and Schnabel 2 by 2 example                 2     0.0                 0.0                 17.262              0.0                 
18  Sample problem 18                                  2     0.0                 0.0                 2.1951              0.0                 
19  Sample problem 19                                  2     9.5092e-16          9.5092e-16          76.367              6.7241e-16          
20  Scalar problem f(x) = x(x - 5)^2                   1     0.0                 0.0                 2.5564e-18          0.0                 
21  Freudenstein-Roth function                         2     0.0                 6.9988              20.012              4.949               
22  Boggs function                                     2     0.0                 1.077e-16           2.0                 1.077e-16           
23  Chandrasekhar function                             10    6.6613e-16          6.6613e-16          1.2718e-13          4.4409e-16    

Just for reference, here is the before

nr  Problem                                            n     NewtonRaphson       TrustRegion         LevenbergMarquardt  nlsolve             
1   Generalized Rosenbrock function                    10    NaN                 4.9193              4.9193              0.0                 
2   Powell singular function                           4     7.0435e-16          7.941               0.021633            7.0217e-16          
3   Powell badly scaled function                       2     0.0                 4111.8              9.9944e-15          0.0                 
4   Wood function                                      4     4.1138e-14          2985.5              8550.5              1.1546e-14          
5   Helical valley function                            3     6.2775000000000006e-3050.0                50.0                1.0408e-16          
6   Watson function                                    2     NaN                 6491.3              129.15              114.89              
7   Chebyquad function                                 2     5.5511e-17          0.075044            0.34127             5.5511e-17          
8   Brown almost linear function                       10    0.0                 0.0                 16.53               8.8818e-16          
9   Discrete boundary value function                   10    5.4548e-16          0.0046978           7.2775e-15          2.7756e-16          
10  Discrete integral equation function                10    2.7807e-15          8.1217e-17          4.8191e-15          5.5511e-17          
11  Trigonometric function                             10    1.4753e-15          3.1524e-15          0.084117            8.4655e-16          
12  Variably dimensioned function                      10    0.0                 0.0                 1.3104e-12          0.0                 
13  Broyden tridiagonal function                       10    1.2803e-15          1.143e-15           1.1957e-13          8.8818e-16          
14  Broyden banded function                            10    1.4002e-15          1.4002e-15          1.665e-13           8.8818e-16          
15  Hammarling 2 by 2 matrix square root problem       4     2.2204e-16          0.32355             0.91487             6.853e-8            
16  Hammarling 3 by 3 matrix square root problem       9     2.2204e-16          0.30203             1.0506              3.7632000000000004e-29
17  Dennis and Schnabel 2 by 2 example                 2     0.0                 4.4408e-16          17.262              0.0                 
18  Sample problem 18                                  2     0.0                 5.5995e-18          2.1951              0.0                 
19  Sample problem 19                                  2     9.5092e-16          1.0542e-6           76.367              6.7241e-16          
20  Scalar problem f(x) = x(x - 5)^2                   1     0.0                 1.9702e-24          2.5564e-18          0.0                 
21  Freudenstein-Roth function                         2     0.0                 20.012              20.012              4.949               
22  Boggs function                                     2     0.0                 1.6031              2.0                 1.077e-16           
23  Chandrasekhar function                             10    6.6613e-16          4.965e-16           1.2718e-13          4.4409e-16

linu = _vec(u_tmp),
# do not use A = cache.H, b = _vec(cache.g) since it is equivalent
# to A = cache.J, b = _vec(fu) as long as the Jacobian is non-singular
linres = dolinsolve(alg.precs, linsolve, A = J, b = _vec(fu),
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this seems to be mutating J (resp. A) by default. I assume that is intended but requires care when the trust region step is being rejected. For now I simply avoid recomputing the Gauss-Newton step in that case since that is just unnecessary work (although we used to do it for some reason). That said, this won't work for Jacobian reuse trust region methods which may be on the horizon. Just leaving this note here since this is quite subtle and can lead to hard-to-detect problems.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@frankschae @avik-pal make note.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I recently saw this as well while debugging a Gauss Newton bug, do we know why this is happening? Given that alias_A = true

@FHoltorf FHoltorf marked this pull request as ready for review September 27, 2023 04:28

The same updating scheme as in NLsolve's (https://github.com/JuliaNLSolvers/NLsolve.jl) trust region dogleg implementation.
"""
NLsolve
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not just call it dogleg?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

They are all dogleg methods, i.e., they all use the dogleg step to compute the update. The difference between all options is just how the trust region is being adapted.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see, it's a bit of a weird name though. Is there no reference to where it came from? If not then we should credit Spenser Lyon who first implemented what was there.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn't see anything in the NLsolve docs or code. I just opened an issue. Maybe someone knows more over there. If not, we can also just call it Lyon.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's update the name in a follow up. I want to get this into the benchmarks though.

Comment on lines 38 to 40
Trust region updating scheme as in Nocedal and Wrigt [see Alg 11.5, page 291].
"""
NW
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
Trust region updating scheme as in Nocedal and Wrigt [see Alg 11.5, page 291].
"""
NW
Trust region updating scheme as in Nocedal and Wrigt [see Alg 11.5, page 291].
"""
NocedalWrigt

@ChrisRackauckas
Copy link
Member

Is the full 23 function thing a test already? If not, can we make it a regression test please?

@FHoltorf
Copy link
Contributor Author

I don't think they are. It would hit a lot (if not all) of the code though which would be nice. I am wondering though: How do we distinguish failure from success since we can't expect all methods to converge to a small residual on all problems?

@avik-pal
Copy link
Member

What about we use the list you have above and mark our solvers that currently fail as test_broken (I know it is not exactly correct since some of them are expected to not converge and aren't exactly broken)? That way, we won't have a regression in the future of the currently working versions

@ChrisRackauckas ChrisRackauckas merged commit 3a750aa into SciML:master Sep 29, 2023
7 of 9 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants