Add implementation of S-FISTA #48
Conversation
Hey @wwkong thanks for contributing this! I have some general comments before I look into the algorithm.
One question on the algorithm: I’m assuming this reduces to standard FISTA when mu=0 (which appears to be the default). How do the iterations differ otherwise? Is it just about the sequence of extrapolation parameters? There are better choices when the strong convexity parameter is known, see e.g. https://www.seas.ucla.edu/~vandenbe/236C/lectures/fgrad.pdf. I’m not sure whether this is related (haven’t read your paper).
The reason I’m asking is, I’m thinking of significantly improving the fast gradient method which is already implemented in the package, so that one can inject whatever extrapolation sequence they want. So I’d like to make sure there’s not too much duplication :-)
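To make the "injectable extrapolation sequence" idea concrete, here is a minimal Python sketch. All names (`fista`, `fista_momentum`, `strongly_convex_momentum`) are hypothetical, invented for illustration; this is not the package's actual interface.

```python
import numpy as np

def fista(grad_f, prox_g, x0, L, extrapolation, num_iters=500):
    """Proximal gradient method with an injectable extrapolation sequence.

    `extrapolation` is a callable k -> beta_k giving the momentum
    coefficient used to combine successive iterates.
    """
    x_prev = np.asarray(x0, dtype=float).copy()
    y = x_prev.copy()
    for k in range(num_iters):
        x = prox_g(y - grad_f(y) / L, 1.0 / L)    # forward-backward step
        y = x + extrapolation(k) * (x - x_prev)   # extrapolation step
        x_prev = x
    return x_prev

def fista_momentum():
    """Standard FISTA momentum via the t_k recursion (valid for mu = 0)."""
    state = {"t": 1.0}
    def beta(k):
        t = state["t"]
        t_next = (1.0 + np.sqrt(1.0 + 4.0 * t * t)) / 2.0
        state["t"] = t_next
        return (t - 1.0) / t_next
    return beta

def strongly_convex_momentum(L, mu):
    """Constant momentum for mu-strongly-convex problems (requires mu > 0)."""
    q = np.sqrt(mu / L)
    return lambda k: (1.0 - q) / (1.0 + q)
```

With this shape, `fista(grad, prox, x0, L, fista_momentum())` recovers standard FISTA, while swapping in `strongly_convex_momentum(L, mu)` gives the constant-momentum variant for the strongly convex case, with no duplication of the main loop.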
Indeed. A good reference to see why this is true is in one of my more recent preprints [1, Section 3].
For the case of mu > 0, this algorithm is one of the (many!) generalizations of FISTA for the strongly convex case (and not just because of how the extrapolation is done). Note that the scheme itself is based on Nesterov's 3-sequence fast gradient method [2, Eq. (2.2.7)], which itself is based on his fairly complicated theory of estimating functions [2, Section 2.2.1]. Some references that might help in understanding his theory are [1,3,4].
I am familiar with the strongly convex method mentioned in Vandenberghe's notes above. If I recall correctly, that method is a special instance of one of Nesterov's fast gradient methods [2, Eq. (2.2.22)] which only works if mu > 0, i.e., it fails to converge if mu=0. My method, on the other hand, converges no matter what mu the user may choose (due to being based on Nesterov's more general framework).
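To make the degeneracy concrete, the constant momentum coefficient of that strongly convex scheme (writing it from the standard presentation, with L the smoothness constant and mu the strong convexity constant) is

```latex
\beta \;=\; \frac{\sqrt{L}-\sqrt{\mu}}{\sqrt{L}+\sqrt{\mu}}
      \;=\; \frac{1-\sqrt{\mu/L}}{1+\sqrt{\mu/L}} ,
```

so at mu = 0 it gives beta = 1, and the linear contraction factor 1 - sqrt(mu/L) in its rate bound becomes 1, i.e. the bound carries no information.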
Gotcha. For some more added info, I implemented the more general framework of Nesterov because my nonconvex proximal gradient method does not work without several of the auxiliary iterate sequences. There are, of course, many ways to simplify this framework (set mu=0, eliminate a variable, etc.) into one of the more popular (and shorter) versions of FISTA like you currently have in your code.

[1] Kong, W., Melo, J. G., & Monteiro, R. D. (2021). FISTA and Extensions--Review and New Insights. arXiv preprint arXiv:2107.01267.
[2] Nesterov, Y. (2018). Lectures on convex optimization (Vol. 137). Berlin, Germany: Springer International Publishing.
[3] Florea, M. I., & Vorobyov, S. A. (2018). An accelerated composite gradient method for large-scale composite objective problems. IEEE Transactions on Signal Processing, 67(2), 444-459.
[4] Monteiro, R. D., Ortiz, C., & Svaiter, B. F. (2016). An adaptive accelerated first-order method for convex optimization. Computational Optimization and Applications, 64(1), 31-73.
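For readers following along, here is a rough Python transcription of the three-sequence scheme in the spirit of [2, Eq. (2.2.7)], written from the standard presentation in the book rather than from this PR's code; the function name and the `gamma` initialization are my own choices for illustration.

```python
import numpy as np

def nesterov_three_sequence(grad_f, x0, L, mu=0.0, num_iters=200):
    """Sketch of Nesterov's three-sequence fast gradient scheme.

    Maintains iterates x_k, extrapolation points y_k, and auxiliary
    points v_k, coupled through a scalar sequence gamma_k. Valid for
    any mu >= 0; mu = 0 recovers the non-strongly-convex setting.
    """
    x = np.asarray(x0, dtype=float).copy()
    v = x.copy()
    gamma = L  # a common initialization choice (assumption)
    for _ in range(num_iters):
        # alpha solves L * alpha^2 = (1 - alpha) * gamma + alpha * mu
        alpha = (mu - gamma + np.sqrt((gamma - mu) ** 2 + 4.0 * L * gamma)) / (2.0 * L)
        gamma_next = (1.0 - alpha) * gamma + alpha * mu
        y = (alpha * gamma * v + gamma_next * x) / (gamma + alpha * mu)
        g = grad_f(y)
        x = y - g / L                                     # gradient step
        v = ((1.0 - alpha) * gamma * v + alpha * mu * y - alpha * g) / gamma_next
        gamma = gamma_next
    return x
```

Setting mu=0 and eliminating `v` (it becomes an affine combination of successive x's) is one way to collapse this into the familiar two-sequence FISTA loop.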
Codecov Report
```diff
@@           Coverage Diff            @@
##           master      #48      +/- ##
========================================
- Coverage   87.96%   87.91%   -0.06%
========================================
  Files          18       19       +1
  Lines         748      786      +38
========================================
+ Hits          658      691      +33
- Misses         90       95       +5
```
Continue to review full report at Codecov.
@wwkong thanks for the detailed explanation and references! So the algorithm is named S-FISTA in the relevant reference, right? Maybe it makes sense to use that name here. By the way, I merged some fixes to the benchmark script that should make it work in this PR too.
That's a good point. I'll rename the functions accordingly.
Cool. I'll do a rebase in the next push. EDIT: The latest push should have the renames and rebase.
Hey @wwkong I have some final comments/questions. Once these few last things are clear, this can be merged!
Looks good! Thanks @wwkong!
Hello!
I saw in your description that you were open to contributions, and I would like to add one of my proximal gradient algorithms (AIPP) to the list. The algorithm itself relies on a specialized implementation of the classic FISTA method by Beck & Teboulle, so this PR is to have this FISTA variant added to the repo.