Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add implementation of S-FISTA #48

Merged
merged 16 commits into from
Sep 24, 2021
Merged

Conversation

wwkong
Copy link
Contributor

@wwkong wwkong commented Sep 19, 2021

Hello!

I saw in your description that you were open to contributions, and I would like to add one of my proximal gradient algorithms (AIPP) to the list. The algorithm itself relies on a specialized implementation of the classic FISTA method by Beck & Teboulle, so this PR is to have this FISTA variant added to the repo.

Copy link
Member

@lostella lostella left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey @wwkong thanks for contributing this! I have some general comments, before I look into the algorithm.

One question on the algorithm: I’m assuming this reduces to standard FISTA when mu=0 (which appears to be the default). How do the iterations differ otherwise? Is it just about the sequence of extrapolation parameters? There are better choices when the strong convexity parameter is known, see eg https://www.seas.ucla.edu/~vandenbe/236C/lectures/fgrad.pdf I’m not sure whether this is related (haven’t read your paper).

The reason I’m asking is, I’m thinking of significantly improving the fast gradient method which is already implemented in the package, so that one can inject whatever extrapolation sequence they want. So I’d like to make sure there’s not too much duplication :-)

Project.toml Outdated Show resolved Hide resolved
Project.toml Outdated Show resolved Hide resolved
@wwkong
Copy link
Contributor Author

wwkong commented Sep 20, 2021

One question on the algorithm: I’m assuming this reduces to standard FISTA when mu=0 (which appears to be the default).

Indeed. A good reference to see why this is true is in one of my more recent preprints [1, Section 3].

How do the iterations differ otherwise? Is it just about the sequence of extrapolation parameters?

For the case of mu > 0, this algorithm is one of the (many!) generalizations of FISTA for the strongly convex case (and not just because of how the extrapolation is done). Note that the scheme itself is based on Nesterov's 3-sequence fast gradient method [2, Eq. (2.2.7)], which itself is based on his fairly complicated theory of estimating functions [2, Section 2.2.1]. Some references that might help in understanding his theory are [1,3,4].

There are better choices when the strong con exits parameter is known, see eg https://www.seas.ucla.edu/~vandenbe/236C/lectures/fgrad.pdf I’m not sure whether this is related (haven’t read your paper).

I am familiar with the strongly convex method mentioned in Vandenberghe's notes above. If I recall correctly, that method is a special instance of one of Nesterov's fast gradient methods [2, Eq. (2.2.22)] which only works if mu > 0, i.e., it fails to converge if mu=0. My method, on the other hand, converges no matter what mu the user may choose (due to being based on Nesterov's more general framework).

The reason I’m asking is, I’m thinking of significantly improving the fast gradient method which is already implemented in the package, so that one can inject whatever extrapolation sequence they want. So I’d like to make sure there’s not too much duplication :-)

Gotcha. Gotcha. For some more added info, I implemented the more general framework of Nesterov because my nonconvex proximal gradient method does not work without several of the auxiliary iterate sequences. There are, of course, many ways to simplify this framework (set mu=0, eliminate a variable, etc.) into one of the more popular (and shorter) versions of FISTA like you currently have in your code.

[1] Kong, W., Melo, J. G., & Monteiro, R. D. (2021). FISTA and Extensions--Review and New Insights. arXiv preprint arXiv:2107.01267.

[2] Nesterov, Y. (2018). Lectures on convex optimization (Vol. 137). Berlin, Germany: Springer International Publishing.

[3] Florea, M. I., & Vorobyov, S. A. (2018). An accelerated composite gradient method for large-scale composite objective problems. IEEE Transactions on Signal Processing, 67(2), 444-459.

[4] Monteiro, R. D., Ortiz, C., & Svaiter, B. F. (2016). An adaptive accelerated first-order method for convex optimization. Computational Optimization and Applications, 64(1), 31-73.

@codecov
Copy link

codecov bot commented Sep 20, 2021

Codecov Report

Merging #48 (69b2fca) into master (2f5d17d) will decrease coverage by 0.05%.
The diff coverage is 86.84%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master      #48      +/-   ##
==========================================
- Coverage   87.96%   87.91%   -0.06%     
==========================================
  Files          18       19       +1     
  Lines         748      786      +38     
==========================================
+ Hits          658      691      +33     
- Misses         90       95       +5     
Impacted Files Coverage Δ
src/ProximalAlgorithms.jl 100.00% <ø> (ø)
src/algorithms/fista.jl 86.84% <86.84%> (ø)

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 2f5d17d...69b2fca. Read the comment docs.

@lostella
Copy link
Member

@wwkong thanks for the detailed explanation and references!

So the algorithm named S-FISTA in the relevant reference, right? Maybe it makes sense to use SFISTA in naming types and functions then? So that it is clear which generalization of FISTA this is (it can be made clear by looking into the docstring and references there, but the name should give I direct clue).

By the way, I merged some fixes to the benchmark script that should make it work in this PR too

@wwkong
Copy link
Contributor Author

wwkong commented Sep 20, 2021

@wwkong thanks for the detailed explanation and references!

So the algorithm named S-FISTA in the relevant reference, right? Maybe it makes sense to use SFISTA in naming types and functions then? So that it is clear which generalization of FISTA this is (it can be made clear by looking into the docstring and references there, but the name should give I direct clue).

That's a good point. I'll rename the functions in fista.jl using SFISTA as a prefix then, but keep the filename fista.jl in case other FISTA variants could be added in there.

By the way, I merged some fixes to the benchmark script that should make it work in this PR too

Cool. I'll do a rebase in the next push.

EDIT: The latest push should have the renames and rebase.

src/algorithms/fista.jl Outdated Show resolved Hide resolved
@lostella lostella changed the title Add an implementation of a strongly convex FISTA method + various dependencies Add implementation of S-FISTA Sep 20, 2021
Copy link
Member

@lostella lostella left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey @wwkong I have some final comments/questions. Once these few last things are clear, this can be merged!

src/algorithms/fista.jl Outdated Show resolved Hide resolved
src/algorithms/fista.jl Outdated Show resolved Hide resolved
src/algorithms/fista.jl Outdated Show resolved Hide resolved
src/algorithms/fista.jl Outdated Show resolved Hide resolved
test/problems/test_lasso_small.jl Show resolved Hide resolved
test/problems/test_lasso_small_strongly_convex.jl Outdated Show resolved Hide resolved
test/problems/test_lasso_small_strongly_convex.jl Outdated Show resolved Hide resolved
test/problems/test_lasso_small_strongly_convex.jl Outdated Show resolved Hide resolved
test/problems/test_lasso_small_strongly_convex.jl Outdated Show resolved Hide resolved
test/problems/test_lasso_small_strongly_convex.jl Outdated Show resolved Hide resolved
test/problems/test_lasso_small_strongly_convex.jl Outdated Show resolved Hide resolved
src/algorithms/fista.jl Outdated Show resolved Hide resolved
Copy link
Member

@lostella lostella left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good! Thanks @wwkong!

@lostella lostella merged commit cf6f7e9 into JuliaFirstOrder:master Sep 24, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants