Add implementation of PANOC+ #57
Conversation
Codecov Report
@@            Coverage Diff            @@
##           master      #57      +/-  ##
=========================================
+ Coverage   89.03%   89.76%   +0.73%
=========================================
  Files          20       21       +1
  Lines         857      938      +81
=========================================
+ Hits          763      842      +79
- Misses         94       96       +2
Continue to review full report at Codecov.
Thanks a lot for the work here! This looks pretty good; I have left some comments in the code.
It’s interesting that the code is overall shorter than the original implementation. I guess this is because there is no special treatment for quadratic f (which makes sense given the focus on cases where the gradient is not globally Lipschitz).
Some tests failed, but they are unrelated to this change (we should get rid of some randomized tests, or make assertions more permissive), so you can disregard them.
README.md
Outdated
@@ -31,6 +31,7 @@ Vũ-Condat primal-dual algorithm[^chambolle_2011][^vu_2013][^condat_2013] | [`Vu
Davis-Yin splitting[^davis_2017] | [`DavisYin`](src/algorithms/davis_yin.jl)
Asymmetric forward-backward-adjoint splitting[^latafat_2017] | [`AFBA`](src/algorithms/primal_dual.jl)
PANOC (L-BFGS)[^stella_2017] | [`PANOC`](src/algorithms/panoc.jl)
PANOC+ (L-BFGS)[^demarchi_2021] | [`NOLIP`](src/algorithms/nolip.jl)
Minor: should the name be NOLIP here? Although I agree it would not match the paper… In any case, I’m preparing more extensive documentation to replace this README, with a clearer pairing between literature and implementation, so I can also take care of it there.
I'd rather change NOLIP to, say, PANOCplus, as it is a variant of PANOC. But I don't know your naming conventions or Julia constraints. What do you suggest?
Yes, I think that would be clearer: let’s go for PANOCplus (and …Iteration and …State).
src/algorithms/nolip.jl
Outdated
if (iter.gamma === nothing || iter.adaptive == true)
    mul!(state.Az, iter.A, state.z)
    f_Az = gradient!(state.grad_f_Az, iter.f, state.Az)
    tol = 10 * eps(R) * (1 + abs(f_Az))
    if f_Az > f_Az_upp + tol && state.gamma >= iter.minimum_gamma
        state.gamma *= 0.5
        if state.gamma < iter.minimum_gamma
            @warn "stepsize `gamma` became too small ($(state.gamma))"
        end
        can_update_direction = true
        reset_direction_state!(iter, state)
        continue
    end
end
Maybe this could be done using backtrack_stepsize, just like at the start of the iterations (lines 98-105)? One way to do that would be to let it compute a new local gamma value, and then compare it to state.gamma to see if it backtracked.
I agree, it would be much better. However, backtrack_stepsize can potentially take several iterations before returning, whereas here we backtrack only once before restarting.
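To make the distinction concrete, here is a minimal sketch of the two strategies (illustrative only; `bound_holds` is a hypothetical predicate standing in for the quadratic upper-bound check, not a function of the package):

```julia
# Illustrative only: `bound_holds(gamma)` is a hypothetical predicate standing in
# for the check `f_Az <= f_Az_upp + tol` used in the snippet above.

# What a full backtracking routine does: keep halving until the bound holds.
function backtrack_until_ok(gamma, bound_holds)
    while !bound_holds(gamma)
        gamma /= 2
    end
    return gamma
end

# What the snippet above does: halve once and restart the surrounding iteration,
# which re-checks the bound (on a freshly computed candidate) in its next pass.
backtrack_once(gamma, bound_holds) = bound_holds(gamma) ? gamma : gamma / 2
```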
src/algorithms/nolip.jl
Outdated
    gamma=gamma, y=y, z=z, g_z=g_z, res=x-z, H=initialize(iter.directions, x),
)
if (iter.gamma === nothing || iter.adaptive == true)
    state.gamma, state.g_z, f_Az, f_Az_upp = backtrack_stepsize!(
So this initial backtracking is not there in the pseudocode in the paper, do I understand it right?
The pseudocode gives special treatment to the case k=0. Considering the first iteration, then, the initial backtracking on gamma is indeed there.
True! Didn’t realize it. Partially related to this: do you think the pseudocode in the manuscript could be rearranged so as to avoid “start with step xyz” and just start with step 1? That just struck me as potentially confusing when I first saw it.
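For reference, a minimal sketch of that k = 0 special treatment, under assumptions (the `bound_holds_at_x0` predicate and the trial value are placeholders, not the package's code):

```julia
# Illustrative only: if neither gamma nor a Lipschitz constant Lf is supplied,
# pick a trial stepsize and backtrack it before entering the main loop, mirroring
# the k = 0 case in the paper's pseudocode. `bound_holds_at_x0` is hypothetical.
function initial_gamma(Lf, gamma_trial, bound_holds_at_x0)
    Lf === nothing || return 1 / Lf   # known Lipschitz constant: no backtracking needed
    gamma = gamma_trial
    while !bound_holds_at_x0(gamma)
        gamma /= 2                    # initial backtracking on gamma (the k = 0 case)
    end
    return gamma
end
```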
src/algorithms/nolip.jl
Outdated
while true
    if can_update_direction
Minor: this is more of a should_ than a can_?
I think this gives more emphasis on the fact that d cannot be updated sometimes. Conversely, as argued in the preprint, the search direction is updated whenever it can_ be, but this is not necessary.
Looks good to me! Thanks @aldma!
If I had one wish, it would be for an equivalence test between NOLIP and PANOC, in what I believe (correct me if I’m wrong) is the only case they should be expected to work exactly the same: when gamma or Lf is given, and consequently adaptive=false. There are a couple of such tests in https://github.com/JuliaFirstOrder/ProximalAlgorithms.jl/blob/master/test/problems/test_equivalence.jl
Testing for the same iterates should be sufficient, as that indirectly tests that the directions are computed in the same way.
That would be very useful to have in case anything changes in the implementation of either algorithm; it could help catch any mistake early.
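A rough sketch of what such an equivalence test might look like, assuming the iterator-style API (`…Iteration` types with keyword constructors) hinted at in this thread; the keyword names, the `state.z` field, and the problem setup below are assumptions rather than verbatim package code:

```julia
using LinearAlgebra
using ProximalOperators: LeastSquares, NormL1
using ProximalAlgorithms

# Small smooth + nonsmooth test problem (assumed setup).
M = [1.0 2.0; 3.0 4.0; 5.0 6.0]
b = [1.0, 2.0, 3.0]
f = LeastSquares(M, b)            # smooth term
g = NormL1(1.0)                   # nonsmooth term
x0 = zeros(2)
gamma = 0.95 / opnorm(M)^2        # fixed stepsize, so no adaptive backtracking

# Keyword names below (f, g, x0, gamma) are assumed from this thread.
it_panoc = ProximalAlgorithms.PANOCIteration(f=f, g=g, x0=x0, gamma=gamma)
it_plus  = ProximalAlgorithms.PANOCplusIteration(f=f, g=g, x0=x0, gamma=gamma)

# With a fixed gamma, the two algorithms should produce identical iterates.
for (s1, s2) in Iterators.take(zip(it_panoc, it_plus), 50)
    @assert isapprox(s1.z, s2.z)  # same iterates ⇒ same directions, indirectly
end
```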
I've added an equivalence test between NOLIP and PANOC and then changed the name from NOLIP to PANOCplus. Thanks for your hints!
🚀
Hello!
First of all, thanks for all the work you put into this repo.
@AndThem and I would like to contribute a variant of PANOC we recently investigated (arXiv:2112.13000).
The algorithm consists of two entangled backtracking steps: before a tentative update is accepted along some search direction, there is a check on the forward-backward stepsize gamma, which should satisfy a Lipschitz-type bound. In contrast with the original PANOC, this refined linesearch procedure makes it easy to reject poor steps and to handle smooth terms f that have only locally Lipschitz continuous gradients. In the preprint we denoted this variant by PANOC+. However, to avoid issues with symbols and names, the algorithm has been implemented as a solver called NOLIP. This PR is to add NOLIP to the repo.
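For readers skimming the thread, here is a minimal, illustrative sketch of the nested backtracking just described. It is not the package's implementation; `f_and_grad`, `prox_g`, and `accept` are hypothetical arguments standing in for the smooth term's value/gradient oracle, the proximal mapping of g, and the linesearch acceptance test:

```julia
using LinearAlgebra: dot, norm

# Illustrative sketch only (not the package's code): one tentative update along a
# search direction d. The stepsize gamma is halved whenever the quadratic
# (Lipschitz-type) upper bound fails at the candidate point, and the candidate is
# shrunk along d (via tau) whenever the acceptance test rejects it.
function panocplus_like_step(x, d, gamma, f_and_grad, prox_g, accept)
    tau = 1.0
    while true
        x_try = x .+ tau .* d                        # tentative update along d
        fx, gradx = f_and_grad(x_try)
        z = prox_g(x_try .- gamma .* gradx, gamma)   # forward-backward step at the candidate
        fz, _ = f_and_grad(z)
        bound = fx + dot(gradx, z .- x_try) + norm(z .- x_try)^2 / (2 * gamma)
        if fz > bound
            gamma /= 2                               # gamma fails the bound: shrink it
        elseif accept(x_try, z, gamma)
            return x_try, z, gamma                   # candidate accepted
        else
            tau /= 2                                 # poor step: backtrack along d
        end
    end
end
```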