Improve Linesearch and Quasi Newton allocations (#335)
* Resolving #333 point 6: `linesearch_backtrack!` can now be used, and is used in the previous places.
* Fix #333 points 1, 2, 5, and one further memory allocation.
* Documentation formatting.
* In theory resolves #333 points 4 and 7; in practice there seems to be a side effect.
* First back to copying `eta` – then the tests from before work again.
* Start changelog entry.
* Adapt documentation.
* Bump dependencies.

---------

Co-authored-by: Mateusz Baran <[email protected]>
kellertuer and mateuszbaran authored Dec 28, 2023
1 parent 0bca9b5 commit aef51e5
Showing 9 changed files with 432 additions and 259 deletions.
5 changes: 4 additions & 1 deletion Changelog.md
@@ -9,10 +9,13 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0

### Added

* Introduce `sub_kwargs` and `sub_stopping_criterion` for `trust_regions` as noticed in [#336](https://github.com/JuliaManifolds/Manopt.jl/discussions/336)

### Changed

* `WolfePowellLineSearch`, `ArmijoLineSearch` step sizes now allocate less
* `linesearch_backtrack!` is now available
* Quasi-Newton updates can now work in-place of a direction vector as well, see the sketch after this list.
* Faster `safe_indices` in L-BFGS.
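
A minimal usage sketch of the solver these changes affect (assuming Manifolds.jl's `Sphere` and a toy cost; the user-facing call is unchanged, only the internals allocate less):

```julia
using Manopt, Manifolds

M = Sphere(2)
f(M, p) = p[1]
# Riemannian gradient of f: project the Euclidean gradient onto T_p M
grad_f(M, p) = project(M, p, [1.0, 0.0, 0.0])
# the default line search and quasi-Newton direction update now allocate less
q = quasi_newton(M, f, grad_f, [0.0, 0.0, 1.0])
```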

## [0.4.44] December 12, 2023
2 changes: 1 addition & 1 deletion Project.toml
@@ -45,7 +45,7 @@ JuMP = "1.15"
LinearAlgebra = "1.6"
LRUCache = "1.4"
ManifoldDiff = "0.3.8"
Manifolds = "0.9"
Manifolds = "0.9.11"
ManifoldsBase = "0.15"
ManoptExamples = "0.1.4"
Markdown = "1.6"
125 changes: 94 additions & 31 deletions src/plans/quasi_newton_plan.jl
@@ -287,33 +287,63 @@ InverseBroyden(φ::Float64) = InverseBroyden(φ, :constant)
@doc raw"""
QuasiNewtonMatrixDirectionUpdate <: AbstractQuasiNewtonDirectionUpdate
These [`AbstractQuasiNewtonDirectionUpdate`](@ref)s represent any quasi-Newton update rule, where the operator is stored as a matrix. A distinction is made between the update of the approximation of the Hessian, ``H_k \mapsto H_{k+1}``, and the update of the approximation of the Hessian inverse, ``B_k \mapsto B_{k+1}``. For the first case, the coordinates of the search direction ``η_k`` with respect to a basis ``\{b_i\}^{n}_{i=1}`` are determined by solving a linear system of equations, i.e.
The `QuasiNewtonMatrixDirectionUpdate` represents a quasi-Newton update rule,
where the operator is stored as a matrix. A distinction is made between the update of the
approximation of the Hessian, ``H_k \mapsto H_{k+1}``, and the update of the approximation
of the Hessian inverse, ``B_k \mapsto B_{k+1}``.
For the first case, the coordinates of the search direction ``η_k`` with respect to
a basis ``\{b_i\}^{n}_{i=1}`` are determined by solving a linear system of equations
```math
\text{Solve} \quad \hat{η_k} = - H_k \widehat{\operatorname{grad}f(x_k)}
\text{Solve} \quad \hat{η_k} = - H_k \widehat{\operatorname{grad}f(x_k)},
```
where ``H_k`` is the matrix representing the operator with respect to the basis ``\{b_i\}^{n}_{i=1}`` and ``\widehat{\operatorname{grad}f(x_k)}`` represents the coordinates of the gradient of the objective function ``f`` in ``x_k`` with respect to the basis ``\{b_i\}^{n}_{i=1}``.
If a method is chosen where Hessian inverse is approximated, the coordinates of the search direction ``η_k`` with respect to a basis ``\{b_i\}^{n}_{i=1}`` are obtained simply by matrix-vector multiplication, i.e.
where ``H_k`` is the matrix representing the operator with respect to the basis ``\{b_i\}^{n}_{i=1}``
and ``\widehat{\operatorname{grad}f(x_k)}`` represents the coordinates of the gradient of
the objective function ``f`` in ``x_k`` with respect to the basis ``\{b_i\}^{n}_{i=1}``.
If a method is chosen where Hessian inverse is approximated, the coordinates of the search
direction ``η_k`` with respect to a basis ``\{b_i\}^{n}_{i=1}`` are obtained simply by
matrix-vector multiplication
```math
\hat{η_k} = - B_k \widehat{\operatorname{grad}f(x_k)}
\hat{η_k} = - B_k \widehat{\operatorname{grad}f(x_k)},
```
where ``B_k`` is the matrix representing the operator with respect to the basis ``\{b_i\}^{n}_{i=1}`` and ``\widehat{\operatorname{grad}f(x_k)}`` as above. In the end, the search direction ``η_k`` is generated from the coordinates ``\hat{eta_k}`` and the vectors of the basis ``\{b_i\}^{n}_{i=1}`` in both variants.
The [`AbstractQuasiNewtonUpdateRule`](@ref) indicates which quasi-Newton update rule is used. In all of them, the Euclidean update formula is used to generate the matrix ``H_{k+1}`` and ``B_{k+1}``, and the basis ``\{b_i\}^{n}_{i=1}`` is transported into the upcoming tangent space ``T_{x_{k+1}} \mathcal{M}``, preferably with an isometric vector transport, or generated there.
where ``B_k`` is the matrix representing the operator with respect to the basis ``\{b_i\}^{n}_{i=1}``
and ``\widehat{\operatorname{grad}f(x_k)}`` as above. In the end, the search direction ``η_k``
is generated from the coordinates ``\hat{η_k}`` and the vectors of the basis ``\{b_i\}^{n}_{i=1}``
in both variants.
The [`AbstractQuasiNewtonUpdateRule`](@ref) indicates which quasi-Newton update rule is used.
In all of them, the Euclidean update formula is used to generate the matrix ``H_{k+1}``
and ``B_{k+1}``, and the basis ``\{b_i\}^{n}_{i=1}`` is transported into the upcoming tangent
space ``T_{x_{k+1}} \mathcal{M}``, preferably with an isometric vector transport, or generated there.
# Provided functors
* `(mp::AbstractManoptProblem, st::QuasiNewtonState) -> η` to compute the update direction
* `(η, mp::AbstractManoptProblem, st::QuasiNewtonState) -> η` to compute the update direction in-place of `η`
# Fields
* `update` – a [`AbstractQuasiNewtonUpdateRule`](@ref).
* `basis` the basis.
* `matrix` (`Matrix{Float64}(I, manifold_dimension(M), manifold_dimension(M))`)
* `basis` an `AbstractBasis` to use in the tangent spaces
* `matrix` (`Matrix{Float64}(I, manifold_dimension(M), manifold_dimension(M))`)
the matrix which represents the approximating operator.
* `scale` – (`true) indicates whether the initial matrix (= identity matrix) should be scaled before the first update.
* `vector_transport_method` – (`vector_transport_method`)an `AbstractVectorTransportMethod`
* `scale` (`true`) indicates whether the initial matrix (= identity matrix) should be scaled before the first update.
* `update` an [`AbstractQuasiNewtonUpdateRule`](@ref).
* `vector_transport_method` (`default_vector_transport_method(M)`) an `AbstractVectorTransportMethod`
# Constructor
QuasiNewtonMatrixDirectionUpdate(M::AbstractManifold, update, basis, matrix;
scale=true, vector_transport_method=default_vector_transport_method(M))
QuasiNewtonMatrixDirectionUpdate(
M::AbstractManifold,
update,
basis::B=DefaultOrthonormalBasis(),
m=Matrix{Float64}(I, manifold_dimension(M), manifold_dimension(M));
kwargs...
)
## Keyword Arguments
* `scale`, `vector_transport_method` for the two fields above
Generate the update rule with defaults from a manifold and the names corresponding to the fields above.
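
For illustration, a minimal construction sketch (assuming Manifolds.jl's `Sphere`; everything else uses the defaults documented above):

```julia
using Manopt, Manifolds

M = Sphere(2)
# BFGS update rule with the default orthonormal basis and identity initial matrix
d = QuasiNewtonMatrixDirectionUpdate(M, BFGS(), DefaultOrthonormalBasis())
```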
@@ -362,43 +392,63 @@ function QuasiNewtonMatrixDirectionUpdate(
basis, m, scale, update, vector_transport_method
)
end
function (d::QuasiNewtonMatrixDirectionUpdate)(mp, st)
r = zero_vector(get_manifold(mp), get_iterate(st))
return d(r, mp, st)
end
function (d::QuasiNewtonMatrixDirectionUpdate{T})(
mp, st
r, mp, st
) where {T<:Union{InverseBFGS,InverseDFP,InverseSR1,InverseBroyden}}
M = get_manifold(mp)
p = get_iterate(st)
X = get_gradient(st)
return get_vector(M, p, -d.matrix * get_coordinates(M, p, X, d.basis), d.basis)
get_vector!(M, r, p, -d.matrix * get_coordinates(M, p, X, d.basis), d.basis)
return r
end
function (d::QuasiNewtonMatrixDirectionUpdate{T})(
mp, st
r, mp, st
) where {T<:Union{BFGS,DFP,SR1,Broyden}}
M = get_manifold(mp)
p = get_iterate(st)
X = get_gradient(st)
return get_vector(M, p, -d.matrix \ get_coordinates(M, p, X, d.basis), d.basis)
get_vector!(M, r, p, -d.matrix \ get_coordinates(M, p, X, d.basis), d.basis)
return r
end
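
A hedged illustration of the solve-versus-multiply distinction in the two methods above, assuming `Euclidean(2)` so that coordinates in the default basis coincide with the vectors themselves; the point, gradient, and matrix are stand-ins:

```julia
using Manifolds, LinearAlgebra

M = Euclidean(2)
p, X = [0.0, 0.0], [1.0, 2.0]      # point and a stand-in gradient
H = [2.0 0.0; 0.0 4.0]             # stand-in matrix of the stored operator
b = DefaultOrthonormalBasis()
c = get_coordinates(M, p, X, b)    # coordinates of the gradient w.r.t. b
η = similar(X)
get_vector!(M, η, p, -(H \ c), b)  # Hessian approximation: solve the linear system
η2 = similar(X)
get_vector!(M, η2, p, -(H * c), b) # inverse approximation: matrix-vector multiply
```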
@doc raw"""
QuasiNewtonLimitedMemoryDirectionUpdate <: AbstractQuasiNewtonDirectionUpdate
This [`AbstractQuasiNewtonDirectionUpdate`](@ref) represents the limited-memory Riemannian BFGS update, where the approximating operator is represented by ``m`` stored pairs of tangent vectors ``\{ \widetilde{s}_i, \widetilde{y}_i\}_{i=k-m}^{k-1}`` in the ``k``-th iteration.
For the calculation of the search direction ``η_k``, the generalisation of the two-loop recursion is used (see [Huang, Gallican, Absil, SIAM J. Optim., 2015](@cite HuangGallivanAbsil:2015)), since it only requires inner products and linear combinations of tangent vectors in ``T_{x_k} \mathcal{M}``. For that the stored pairs of tangent vectors ``\{ \widetilde{s}_i, \widetilde{y}_i\}_{i=k-m}^{k-1}``, the gradient ``\operatorname{grad}f(x_k)`` of the objective function ``f`` in ``x_k`` and the positive definite self-adjoint operator
This [`AbstractQuasiNewtonDirectionUpdate`](@ref) represents the limited-memory Riemannian BFGS update,
where the approximating operator is represented by ``m`` stored pairs of tangent
vectors ``\{ \widetilde{s}_i, \widetilde{y}_i\}_{i=k-m}^{k-1}`` in the ``k``-th iteration.
For the calculation of the search direction ``η_k``, the generalisation of the two-loop recursion
is used (see [Huang, Gallivan, Absil, SIAM J. Optim., 2015](@cite HuangGallivanAbsil:2015)),
since it only requires inner products and linear combinations of tangent vectors in ``T_{x_k} \mathcal{M}``.
For that the stored pairs of tangent vectors ``\{ \widetilde{s}_i, \widetilde{y}_i\}_{i=k-m}^{k-1}``,
the gradient ``\operatorname{grad}f(x_k)`` of the objective function ``f`` in ``x_k``
and the positive definite self-adjoint operator
```math
\mathcal{B}^{(0)}_k[⋅] = \frac{g_{x_k}(s_{k-1}, y_{k-1})}{g_{x_k}(y_{k-1}, y_{k-1})} \; \mathrm{id}_{T_{x_k} \mathcal{M}}[⋅]
```
are used. The two-loop recursion can be understood as that the [`InverseBFGS`](@ref) update is executed ``m`` times in a row on ``\mathcal{B}^{(0)}_k[⋅]`` using the tangent vectors ``\{ \widetilde{s}_i, \widetilde{y}_i\}_{i=k-m}^{k-1}``, and in the same time the resulting operator ``\mathcal{B}^{LRBFGS}_k [⋅]`` is directly applied on ``\operatorname{grad}f(x_k)``.
are used. The two-loop recursion can be understood as executing the [`InverseBFGS`](@ref) update
``m`` times in a row on ``\mathcal{B}^{(0)}_k[⋅]`` using the tangent vectors ``\{ \widetilde{s}_i, \widetilde{y}_i\}_{i=k-m}^{k-1}``, while at the same time applying the resulting operator ``\mathcal{B}^{LRBFGS}_k [⋅]`` directly to ``\operatorname{grad}f(x_k)``.
When updating there are two cases: if there is still free memory, i.e. ``k < m``, the previously stored vector pairs ``\{ \widetilde{s}_i, \widetilde{y}_i\}_{i=k-m}^{k-1}`` have to be transported into the upcoming tangent space ``T_{x_{k+1}} \mathcal{M}``; if there is no free memory, the oldest pair ``\{ \widetilde{s}_{k−m}, \widetilde{y}_{k−m}\}`` has to be discarded and then all the remaining vector pairs ``\{ \widetilde{s}_i, \widetilde{y}_i\}_{i=k-m+1}^{k-1}`` are transported into the tangent space ``T_{x_{k+1}} \mathcal{M}``. After that we calculate and store ``s_k = \widetilde{s}_k = T^{S}_{x_k, α_k η_k}(α_k η_k)`` and ``y_k = \widetilde{y}_k``. This process ensures that new information about the objective function is always included and the old, probably no longer relevant, information is discarded.
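
As a reference for the recursion described above, a hedged Euclidean sketch (`two_loop_recursion` is a hypothetical standalone helper, not Manopt's internal implementation; `s[i]`, `y[i]` are the stored pairs, oldest first, and at least one pair is assumed to be stored):

```julia
using LinearAlgebra

function two_loop_recursion(grad, s, y)
    m = length(s)
    ρ = [1 / dot(y[i], s[i]) for i in 1:m]
    ξ = zeros(m)
    r = copy(grad)
    for i in m:-1:1                  # backward pass: newest to oldest pair
        ξ[i] = ρ[i] * dot(s[i], r)
        r .-= ξ[i] .* y[i]
    end
    # apply the initial operator B⁰ = (⟨s_{k-1},y_{k-1}⟩ / ⟨y_{k-1},y_{k-1}⟩) id
    r .*= dot(s[m], y[m]) / dot(y[m], y[m])
    for i in 1:m                     # forward pass: oldest to newest pair
        β = ρ[i] * dot(y[i], r)
        r .+= (ξ[i] - β) .* s[i]
    end
    return -r                        # the search direction η
end
```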
# Provided functors
* `(mp::AbstractManoptProblem, st::QuasiNewtonState) -> η` to compute the update direction
* `(η, mp::AbstractManoptProblem, st::QuasiNewtonState) -> η` to compute the update direction in-place of `η`
# Fields
* `memory_s` – the set of the stored (and transported) search directions times step size ``\{ \widetilde{s}_i\}_{i=k-m}^{k-1}``.
* `memory_y` – set of the stored gradient differences ``\{ \widetilde{y}_i\}_{i=k-m}^{k-1}``.
* `ξ` – a variable used in the two-loop recursion.
* `ρ` – a variable used in the two-loop recursion.
* `scale` –
* `vector_transport_method` – a `AbstractVectorTransportMethod`
* `message` – a string containing a potential warning that might have appeared
* `memory_s` the set of the stored (and transported) search directions times step size ``\{ \widetilde{s}_i\}_{i=k-m}^{k-1}``.
* `memory_y` set of the stored gradient differences ``\{ \widetilde{y}_i\}_{i=k-m}^{k-1}``.
* `ξ` a variable used in the two-loop recursion.
* `ρ` a variable used in the two-loop recursion.
* `scale` initial scaling of the Hessian
* `vector_transport_method` an `AbstractVectorTransportMethod`
* `message` a string containing a potential warning that might have appeared
# Constructor
QuasiNewtonLimitedMemoryDirectionUpdate(
@@ -409,7 +459,7 @@ When updating there are two cases: if there is still free memory, i.e. ``k < m``
initial_vector=zero_vector(M,x),
scale=1.0
project=true
)
)
# See also
@@ -468,12 +518,19 @@ function status_summary(d::QuasiNewtonLimitedMemoryDirectionUpdate{T}) where {T}
return s
end
function (d::QuasiNewtonLimitedMemoryDirectionUpdate{InverseBFGS})(mp, st)
r = zero_vector(get_manifold(mp), get_iterate(st))
return d(r, mp, st)
end
function (d::QuasiNewtonLimitedMemoryDirectionUpdate{InverseBFGS})(r, mp, st)
isempty(d.message) || (d.message = "") # reset message
M = get_manifold(mp)
p = get_iterate(st)
r = copy(M, p, get_gradient(st))
copyto!(M, r, p, get_gradient(st))
m = length(d.memory_s)
m == 0 && return -r
if m == 0
r .*= -1
return r
end
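# backward pass of the two-loop recursion: newest to oldest stored pair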
for i in m:-1:1
# what if we divide by zero here? Setting to zero ignores this in the next step
# precompute in case inner is expensive
@@ -540,6 +597,11 @@ If [`InverseBFGS`](@ref) or [`InverseBFGS`](@ref) is chosen as update, then the
method follows the method of [Huang, Absil, Gallivan, SIAM J. Optim., 2018](@cite HuangAbsilGallivan:2018), taking into account that
the corresponding step size is chosen.
# Provided functors
* `(mp::AbstractManoptProblem, st::QuasiNewtonState) -> η` to compute the update direction
* `(η, mp::AbstractManoptProblem, st::QuasiNewtonState) -> η` to compute the update direction in-place of `η`
# Fields
* `update` – an [`AbstractQuasiNewtonDirectionUpdate`](@ref)
@@ -572,6 +634,7 @@ function QuasiNewtonCautiousDirectionUpdate(
return QuasiNewtonCautiousDirectionUpdate{U}(update, θ)
end
(d::QuasiNewtonCautiousDirectionUpdate)(mp, st) = d.update(mp, st)
(d::QuasiNewtonCautiousDirectionUpdate)(r, mp, st) = d.update(r, mp, st)
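
# A hedged usage sketch of wrapping an update cautiously. The positional order of the
# limited-memory constructor and the `θ` keyword are assumptions based on the docs above;
# memory size 10 and the bound θ(x) = 0.1x are illustrative.
#
#     using Manopt, Manifolds
#
#     M = Sphere(2)
#     p = [0.0, 0.0, 1.0]
#     lbfgs = QuasiNewtonLimitedMemoryDirectionUpdate(M, p, InverseBFGS(), 10)
#     cautious = QuasiNewtonCautiousDirectionUpdate(lbfgs; θ=x -> 0.1 * x)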

# access the inner vector transport method
function get_update_vector_transport(u::AbstractQuasiNewtonDirectionUpdate)
