Gradients #43
Comments
Hey @mfalt: all jokes apart, you can surely implement the missing ones if you want, and I also completely agree with returning one subgradient for nondifferentiable functions (why not?). We should coherently update the docs in this case. As for the shortcut: sure, I'm not a fan of using weird unicode characters in code, but that's a non-breaking additional feature.
Great, I'll start implementing ASAP. I wanted to do a presentation for some colleagues, and realized that forward-backward was not as straightforward (no pun intended) to implement as I had hoped, because of the missing gradients.
No, I haven't opened it in a while, I'll go look at it ASAP.
Guys, instead of reinventing the wheel, what about letting JuliaDiff compute the gradients using AD?
True, I'm wondering what that would yield in the case of nonsmooth functions. It is worth checking for sure.
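For reference, a minimal check of what off-the-shelf AD yields could look like the sketch below, assuming ForwardDiff.jl is installed; the function names are illustrative, and the behavior at nondifferentiable points (e.g. the L1 norm at 0) is exactly what would need to be inspected.

```julia
# Minimal sketch (assumes ForwardDiff.jl is available): compare AD gradients
# for a smooth and a nonsmooth function.
using ForwardDiff

f_smooth(x) = 0.5 * sum(abs2, x)   # squared L2 norm, differentiable everywhere
f_l1(x) = sum(abs, x)              # L1 norm, nondifferentiable at 0

x = [1.0, -2.0, 0.0]

ForwardDiff.gradient(f_smooth, x)  # equals x, the exact gradient
ForwardDiff.gradient(f_l1, x)      # entry at the zero component depends on
                                   # the derivative rule ForwardDiff uses for abs at 0
```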
Another thing that is worth checking out: |
There are some problems which I found using ReverseDiff.jl, in particular using
This is probably due to the fact that the "tape", which is recorded during the forward pass by ReverseDiff.jl, could be cached after the first evaluation. However, even if this were fixed, the above problems are related to the limitations of ReverseDiff.jl listed here. I don't know if other packages in the JuliaDiff suite could be more helpful; there are a few others (ReverseDiffSource.jl, for example?), I'll look into them and see.
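For what it's worth, here is a sketch of the tape caching mentioned above, assuming ReverseDiff.jl's tape API (GradientTape / compile / gradient!); the function is just a placeholder.

```julia
# Sketch: record and compile a ReverseDiff tape once, so the forward pass is
# not re-traced on every gradient evaluation (assumes ReverseDiff.jl).
# Caveat: a compiled tape is only valid if the function's control flow does
# not depend on the input values.
using ReverseDiff

f(x) = 0.5 * sum(abs2, x)                 # placeholder smooth function

x0 = rand(10)
tape = ReverseDiff.GradientTape(f, x0)    # record the tape once
ctape = ReverseDiff.compile(tape)         # compile it for reuse

g = similar(x0)
ReverseDiff.gradient!(g, ctape, rand(10)) # reuse the cached tape on new input
```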
With a significantly improved Julia AD landscape nowadays, I'm wondering whether it makes sense to keep all these implementations around. For many functions (and proximal operators) the existing AD systems should be able to compute gradients (resp. Jacobians) fine. If there are exceptions, we could hook into ChainRulesCore to inject the correct (or more convenient) differentiation rules into AD systems that support it.
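As a sketch of that hook, one could define a custom reverse rule via ChainRulesCore for the nonsmooth cases, returning a chosen subgradient; the function name below is hypothetical (not the package's own type), and the exact tangent types may differ across ChainRulesCore versions.

```julia
# Hypothetical example: use the subgradient sign.(x) (zero vector at x = 0) as
# the reverse rule for an L1-norm-like function, so AD systems that consume
# ChainRulesCore rules pick it up instead of differentiating abs at 0.
using ChainRulesCore

norml1(x) = sum(abs, x)   # illustrative function, stand-in for NormL1

function ChainRulesCore.rrule(::typeof(norml1), x::AbstractVector)
    y = norml1(x)
    norml1_pullback(ȳ) = (NoTangent(), ȳ .* sign.(x))
    return y, norml1_pullback
end
```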
Yes, I agree, I think
We seem to be missing most of the implementations of gradient/gradient!. Was this intentional, or should I start adding them?
Should we define gradients even for functions that are not differentiable everywhere, by returning one of the subgradients? For example, NormL1 could simply return the 0 vector at 0.
It would be nice to have a short command like ∇(f,x) = gradient(f,x) and ∇!(y,f,x) = gradient!(y,f,x); any objections to me adding that, @lostella? One could also use ∂(f,x) to signify that it is part of the subdifferential, but since we would only return a point, and not a set, that could be confusing as well.
Completed gradients (in PRs) out of non-indicator functions that are either: convex (i.e. subgradients exist) or differentiable.