We should implement some of the popular stochastic gradient optimisation techniques, such as SGD, SGD+momentum, Adagrad, Adadelta and Adam. These methods find a local optimum (the global optimum for convex problems) of a differentiable objective; see the nice surveys in this arXiv preprint and this blog post. A minimal sketch of the basic update is given below.
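For illustration, here is a minimal sketch of SGD with momentum, assuming the objective exposes a (possibly stochastic) gradient callable and that iterates are plain NumPy arrays; the function name and parameters are hypothetical, not an existing API in this repository. The other methods (Adagrad, Adadelta, Adam) follow the same loop structure but additionally keep running estimates of gradient moments to rescale the step.

```python
import numpy as np

def sgd_momentum(grad, x0, lr=0.01, momentum=0.9, n_iter=100):
    """Minimal SGD + momentum loop (hypothetical sketch).

    grad : callable returning a (possibly stochastic) gradient at x
    x0   : initial point as a numpy array
    """
    x = np.asarray(x0, dtype=float).copy()
    v = np.zeros_like(x)           # velocity (exponentially weighted gradient sum)
    for _ in range(n_iter):
        g = grad(x)                # stochastic gradient estimate
        v = momentum * v - lr * g  # accumulate momentum
        x = x + v                  # take the step
    return x

# Example: minimise f(x) = ||x||^2 / 2, whose gradient is x.
x_opt = sgd_momentum(lambda x: x, x0=np.array([5.0, -3.0]))
```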
Furthermore, this arXiv preprint suggests a gradient descent variant in which the classical squared two-norm metric in the descent step is replaced by a generalised Bregman distance induced by a more general proper, convex and lower semi-continuous functional; a small illustrative instance follows.
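To make the idea concrete, below is a sketch of one standard instance of replacing the Euclidean proximity term with a Bregman distance: mirror descent on the probability simplex with the entropy functional (exponentiated gradient). This is not the preprint's exact algorithm, only an assumed illustrative example; all names here are hypothetical.

```python
import numpy as np

def exponentiated_gradient(grad, x0, step=0.1, n_iter=100):
    """Mirror descent on the probability simplex using the entropy Bregman
    distance instead of the squared two-norm (illustrative sketch only)."""
    x = np.asarray(x0, dtype=float).copy()
    for _ in range(n_iter):
        g = grad(x)
        x = x * np.exp(-step * g)  # gradient step taken in the mirror (dual) space
        x = x / x.sum()            # Bregman projection back onto the simplex
    return x

# Example: minimise the linear cost <c, x> over the simplex.
c = np.array([0.3, 0.1, 0.5])
x_opt = exponentiated_gradient(lambda x: c, x0=np.ones(3) / 3)
```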