Gradients #43

Open · 15 of 18 tasks
mfalt opened this issue Oct 20, 2017 · 9 comments

@mfalt (Collaborator) commented Oct 20, 2017

We seem to be missing most of the implementations of gradient/gradient!. Was this intentional, or should I start adding them?
Should we define gradients even for functions that are not differentiable everywhere, by returning one of the subgradients? For example, NormL1 could simply return the zero vector at 0.

It would be nice to have short commands like
∇(f,x) = gradient(f,x) and
∇!(y,f,x) = gradient!(y,f,x)
Any objections to me adding that, @lostella? One could also use ∂(f,x) to signify that the result is an element of the subdifferential, but since we would only return a point, and not the whole set, that could be confusing as well.
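
Concretely, the subgradient convention could look something like the sketch below (illustrative only, not actual package code: the field name lambda, real-valued arguments, and the convention that gradient! also returns the function value are all assumptions):

using ProximalOperators
import ProximalOperators: gradient!   # assumed extension point for in-place gradients

# One subgradient of f(x) = lambda*||x||_1, written into y.
# sign.(x) is zero wherever x[i] == 0, which selects the subgradient 0
# from the subdifferential [-lambda, lambda] at those coordinates.
function gradient!(y::AbstractArray, f::NormL1, x::AbstractArray{<:Real})
    y .= f.lambda .* sign.(x)
    return f.lambda * sum(abs, x)   # assumed convention: also return f(x)
end

# The proposed shortcuts are then one-liners:
∇(f, x) = gradient(f, x)
∇!(y, f, x) = gradient!(y, f, x)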

Completed gradients (in PRs) for the non-indicator functions that are either convex (i.e., subgradients exist) or differentiable:

  • ElasticNet
  • NormL1
  • NormL2
  • NormL21
  • NormLinf
  • NuclearNorm
  • SqrNormL2 (existed previously)
  • HingeLoss (Convex!?)
  • HuberLoss (existed previously)
  • LeastSquares (existed previously)
  • Linear (fixed)
  • LogBarrier
  • Maximum (Using sum largest, convex?, differentiable?)
  • Quadratic (existed previously)
  • QuadraticIterative (existed previously)
  • SumPositive
  • DistL2
  • SqrDistL2
@lostella (Member) commented

Hey @mfalt: all the gradient! operations are there, just lazily implemented :-p

Jokes aside, you can surely implement the missing ones if you want, and I also completely agree with returning one subgradient for nondifferentiable functions (why not?). We should update the docs coherently in that case.

As for the shortcut: sure. I'm not a fan of using weird unicode characters in code, but that's a non-breaking additional feature.

@mfalt (Collaborator, Author) commented Oct 20, 2017

Great, I'll start implementing ASAP. I wanted to do a presentation for some colleagues and realized that forward-backward was not as straightforward (no pun intended) to implement as I had hoped, because of the missing gradients.
By the way, have you seen my Gitter message, @lostella?
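
A minimal sketch of such a forward-backward loop, assuming in-place signatures gradient!(∇fx, f, x) and prox!(z, g, y, γ); the function name, fixed step size γ, and iteration cap are illustrative only:

using ProximalOperators   # assumed to provide gradient! and prox!

# Proximal gradient iteration: x ← prox_{γg}(x - γ∇f(x))
function forward_backward(f, g, x0; γ = 1.0, maxit = 1000)
    x = copy(x0)
    ∇fx = similar(x)
    z = similar(x)
    for _ in 1:maxit
        gradient!(∇fx, f, x)             # forward step: gradient of the smooth term
        prox!(z, g, x .- γ .* ∇fx, γ)    # backward step: prox of the nonsmooth term
        x, z = z, x                      # swap buffers and continue
    end
    return x
end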

@lostella (Member) commented

No, I haven't opened it in a while; I'll go look at it ASAP.

@panpat commented Oct 20, 2017

Guys, instead of reinventing the wheel, what about letting JuliaDiff compute the gradients using AD?
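
For example, something along these lines (ForwardDiff is one of the JuliaDiff packages; whether the typed call methods of each function accept dual numbers is exactly what would need checking):

using ProximalOperators, ForwardDiff

# ProximalOperators functions are callable, so in principle they can be
# handed straight to an AD package.
f = NormL2()                     # f(x) = ||x||_2
x = randn(5)
g = ForwardDiff.gradient(f, x)   # forward-mode AD through the callable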

@lostella (Member) commented

True, I'm wondering what that would yield in the case of nonsmooth functions. It is worth checking for sure.

@panpat commented Oct 22, 2017

Another thing that is worth checking out:

https://github.com/malmaud/TensorFlow.jl

@lostella (Member) commented

There are some problems that I found using ReverseDiff.jl, in particular using ReverseDiff.gradient! to compute gradients. The problems already show up with very simple functions: take the LeastSquaresDirect function, whose evaluation is implemented as

function (f::LeastSquaresDirect{R, RC, M, V, F})(x::AbstractVector{RC}) where {R, RC, M, V, F}
  A_mul_B!(f.res, f.A, x)   # res = A*x, computed in place (Julia 0.6 syntax)
  f.res .-= f.b             # res = A*x - b
  return (f.lambda/2)*vecnorm(f.res, 2)^2
end
  1. ReverseDiff.jl does not like the argument typing x::AbstractVector{RC}. This is minor; however, after fixing it:
  2. The gradient is not computed correctly: I believe this is because of how the residual is computed in place with A_mul_B!.
  3. If one replaces this implementation with the simplest f_alt(x) = lambda/2*vecnorm(A*x-b)^2, the gradient is of course computed correctly, but with quite some overhead wrt our current implementation of gradient!. The following happens when A is 200-by-50:
julia> @time gradient!(y1, f, x);
  0.000058 seconds (5 allocations: 176 bytes)
julia> @time ReverseDiff.gradient!(y2, f_alt, x);
  0.001477 seconds (4.88 k allocations: 319.266 KiB)
julia> norm(y1 - y2)
0.0

This is probably due to the fact that the "tape", which ReverseDiff.jl records during the forward pass, could be cached after the first evaluation (see the sketch at the end of this comment). However, even if this were fixed:
  4. Complex arguments are not supported.

The above problems are related to the limitations of ReverseDiff.jl listed here.

I don't know whether other packages in the JuliaDiff suite could be more helpful; there are a few others (ReverseDiffSource.jl, for example?). I'll look into them and see.
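
For reference, ReverseDiff.jl does let one prerecord and compile the tape once and reuse it, which would remove the per-call recording overhead seen in point 3 (sizes mirror the example above; written in current Julia syntax with norm in place of vecnorm; this does not address the complex-argument limitation):

using ReverseDiff, LinearAlgebra

A, b, lambda = randn(200, 50), randn(200), 1.0
f_alt(x) = lambda/2 * norm(A*x - b)^2

x = randn(50)
y = similar(x)

# Record the operations of f_alt once and compile the resulting tape.
tape = ReverseDiff.compile(ReverseDiff.GradientTape(f_alt, similar(x)))
ReverseDiff.gradient!(y, tape, x)   # later calls reuse the compiled tape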

@lostella (Member) commented Aug 15, 2021

With a significantly improved Julia AD landscape nowadays, I'm wondering whether it makes sense to keep all these implementations around. For many functions (and proximal operators), the existing AD systems should be able to compute gradients (resp. Jacobians) just fine. If there are exceptions, we could hook into ChainRulesCore to inject the correct (or more convenient) differentiation rules into AD systems that support it.
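
For instance, a rule could look roughly like the sketch below; SqrNormL2 and its lambda field are used purely to illustrate the pattern, this is not the package's actual rule:

using ProximalOperators, ChainRulesCore

# Reverse rule for f(x) = (lambda/2)*||x||^2: the gradient is lambda*x.
function ChainRulesCore.rrule(f::SqrNormL2, x::AbstractArray{<:Real})
    y = f(x)                                                     # primal value
    sqrnorml2_pullback(ȳ) = (NoTangent(), ȳ .* (f.lambda .* x))  # cotangent w.r.t. x
    return y, sqrnorml2_pullback
end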

@nantonel (Collaborator) commented

Yes, I agree. I think gradient is out of scope for ProximalOperators; ChainRulesCore seems like the right candidate to me! However, removing these will need some changes in ProximalAlgorithms, I think...
