-
Notifications
You must be signed in to change notification settings - Fork 219
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
AutoDiff #331
Comments
My bad, it seems to be indeed using |
That's right, we are using ForwardDiff. |
In the wiki it is written: What about |
Yes we've looked into this library. I think @xukai92 also did some benchmarking on its performance. The result does suggest backward AD is more efficient than forward AD on big models. Unfortunately this library is currently not compatible with Distributions.jl, and lacks support for many common numerical functions. |
Thanks for your answer. And what about |
ReverseDiff.jl also has a very hard upper bound on ForwardDiff v0.5 which will "break" (or at least pull back) many libraries like Optim.jl, DifferentialEquations.jl, NLsolve.jl, ... i.e. I wouldn't even suggest having it installed anymore since it will cause a ton of problems. AutoGrad has some issues with broadcast-composed functions since it uses dispatch overloading. So I think it's still stuck on v0.5. The true answer will be Cassette.jl when it comes out. I suggest just sticking to ForwardDiff.jl until it does because the current solutions are somewhat fragile. |
In our case, we want to compute the gradient of the joint probability which is a function from the cartesian product of the model random variables' support to R, therefore R^n -> R with n=sum{n_i} (n_i being the dimensionality of the ith rv). Reverse differentiation should be more efficient than forward differentiation from a theoretical point of view, isn't it ? @ChrisRackauckas Do you know whether Jarrett will rewrite ReverseDiff.jl using Cassette.jl, or Cassette.jl itself will allow reverse differentiation ? Would you also have an idea of when Cassette.jl is planned to be in beta ? |
Cassette.jl is just a graph construction algorithm. ReverseDiff.jl will be rewritten to use Cassette.jl. I don't know of any timelines, other than "soon enough". |
Hi @ChrisRackauckas, We want to make some plans for backward autodiff integration. There are also some updates from both the Cassette and AutoGrad packages side. Most notably,
What are your current thoughts/suggestions on this issue? Are there any hard reasons why we can’t merge the AutoGrad package and ReverseDiff package since they overlap so much? Thanks! |
@yebai I was looking into adapting AutoGrad, with some discussions with their developer here: denizyuret/AutoGrad.jl#42 |
In the end they say:
However I haven't managed to make my proposal of subtyping Rec work yet.. |
Cassette is a contextual dispatch system which can make autodifferentiation. Capstan.jl is where that implementation will live.
Autograd requires registered primatives. ReverseDiff using the dispatch mechanism to push derivative information back. They are conceptually different. Given that ReverseDiff.jl won't be going away since the Cassette work is now going to stay in a separate repo, and that ReverseDiff.jl now no longer has any hard upper bounds, I would suggest trying to get ReverseDiff.jl working. It'll go legacy at some point but should continue to work. |
Thanks for the answer! I'm a bit confused about your points on the difference between AutoGrad.jl and ReverseDiff.jl. Do you mean ReverseDiff.jl doesn't need any primitives at all? Another two question on ReverseDiff.jl
|
In theory it supports any
It should be, since it would get the derivatives forward there via ForwardDiff.
It has a set of generic primatives via dual numbers of ForwardDiff.jl, and uses that to compose functions together. AutoGrad is much more strict on the functional form it allows and requires registering a lot of higher level functions. |
Very helpful - thank all for the answers! @ChrisRackauckas what’s the current plan for Capstan, e.g. is there any estimated (even informally) release time? I’ve heard that Julia 1.0 is coming out soon from one Julia creator. Could we help to speed up the development of Capstan? |
It already works, but it's slow as mentioned here JuliaLabs/Cassette.jl#5 due to JuliaLang/julia#5402 . That's planned to be fixed in a Julia v1.x, so if you don't want to wait then just setup ReverseDiff.jl for now. |
Just saw this thread; my comment here might be relevant. Just FYI, I don't really plan on doing much ReverseDiff maintenance in the future, so that I can focus on Cassette/Capstan stuff. Once Capstan is released (or even just in beta), I'd be happy to help Turing.jl put it through its paces. |
There is still a difficulty using ReverseDiff.jl with Distributions.jl - it's not possible to construct a distribution with E.g. I cannot AD through f(x) = logpdf.(Normal(x, 1), 1) This is simply because Distritbuion.jl only support constructing distributions with |
@xukai92 The following seems work for me. I guess the issue is that using ReverseDiff: GradientTape, GradientConfig, gradient, gradient!, compile
using Distributions
f(x) = logpdf.(Normal(x[1], 1), 1) # x is an array
gradient(f, [1.0]) |
Emmm. I see what you the trick. Then it seems that in our Turing.jl program one should write something like this @model test() = begin
x ~ Normal(0, 1)
1 ~ Normal(x[1], 1)
end Need to test it on another computer of mine later - will update when I do that. |
@ChrisRackauckas @jrevels Are there any APIs for passing scalar to target function when using |
No, see JuliaDiff/ReverseDiff.jl#40. You can, of course, always (un)wrap the scalar input e.g. x = rand()
y = rand(4)
f(x::Real, y::AbstractArray) = # some implementation returning a scalar
ReverseDiff.gradient((x, y) -> f(x[1], y), (fill(x), y)) |
@yebai The test below works on my local Turing.jl by simply changing all the types in VarInfo which were using ReverseDiff: GradientTape, GradientConfig, gradient, gradient!, compile
using Turing
@model gdemo(x) = begin
s ~ InverseGamma(2,3)
m ~ Normal(0,sqrt(s[1]))
x[1] ~ Normal(m[1], sqrt(s[1]))
x[2] ~ Normal(m[1], sqrt(s[1]))
return s, m
end
mf = gdemo([1.5, 2.0])
vi = mf()
f_turing(x) = begin
print(typeof(x))
vi.vals[1] = x
mf(vi).logp
end
gradient(f_turing, ([1.0])) |
@jrevels I managed to use ReverseDiff.jl in Turing.jl now (branch=https://github.com/yebai/Turing.jl/tree/reverse-diff) When I integrate it with Turing.jl, I experienced a huge performance change by using a local variable to accumulate log-probability (d641f42) instead of accumulating log-probability using a field of a composite-type variable. To be more specific, when I increase the dimensionality, the time used to get AD increases 100x slower than before. The change I made is something like type LP
lp::Union{Float64,ReverseDiff.TrackedReal}
end
acc!(x::Union{Float64,ReverseDiff.TrackedReal}, d::Distribution, lp::LP) = lp.lp += logpdf(d, x)
lp = LP(0.0)
f3(x) = begin
lp.lp = 0.0
acc!(x[1], Normal(0, 1), lp)
for i = 2:length(x)
acc!(x[i], Normal(x[i-1], 1), lp)
end
lp.lp
end to lp = LP(0.0)
f1(x) = begin
_lp::Union{Float64,ReverseDiff.TrackedReal} = 0.0
_lp += logpdf(Normal(0, 1), x[1])
for i = 2:length(x)
_lp += logpdf(Normal(x[i-1], 1), x[i])
end
lp.lp = _lp
lp.lp
end I didn't understand why that happened and also didn't managed to get a simplest case to show the difference. I wonder how ReverseDiff.jl is sensitive to the following situations?
|
type LP
lp::Union{Float64,ReverseDiff.TrackedReal}
end that's not type stable to access. Use a type parameter type LP{T}
lp::T
end |
Thanks Chirs. Is that true if it's a local variable Julia might be able to infer the type? |
If the code is type-stable, yes. Those type markings you're putting everywhere, except the one I mentioned, are unnecessary for type inference. |
I see. A follow-up question is that how much performance of ReverseDiff.jl's is influenced by un-stable type? Using the example above, I only get ~3x speed-up. But what I experienced in Turing is much larger. Is it the fact that if part of the code is type unstable, all other function calls may be influenced? |
Yes, inference problems "leak". If you don't know the type of |
Hi all !
I was wondering, why did you choose to use chunk-wise forward AD, and not use http://www.juliadiff.org ?
Best,
Emile
The text was updated successfully, but these errors were encountered: