Add local tracers #56
See #57

Reopening since #59 didn't add local tracers.
Operators with local behavior:
Once we have primal values, we will also need to add new operators, mostly those that work with
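Presumably these are operators that need the primal to produce a meaningful result at trace time; comparisons are one hypothetical example (a sketch only, assuming a `LocalTracer` with a `primal` field like the one sketched at the end of this thread):

```julia
# Sketch only: with a primal value available, a comparison can return an
# actual Bool at trace time, which e.g. the ifelse method below needs as
# its condition.
Base.:<(x::LocalTracer, y::LocalTracer) = x.primal < y.primal
```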
In a potential NNLib extension, we could add global and local classification of activation functions. Here are the ones with local sparsity:
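`relu` is one activation with local sparsity; here is a hedged sketch of what its local classification could look like (the `nonzero_der1_arg1` naming follows the convention used later in this thread and is an assumption, not NNlib's actual API):

```julia
relu(x) = max(zero(x), x)

# Sketch only: relu'(x) = 0 for x < 0, so a local tracer could drop the
# dependency there; we stay conservative at the kink x == 0.
nonzero_der1_arg1(::typeof(relu), x) = x >= 0
```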
Oh and of course we need to add

```julia
function Base.ifelse(b::Bool, xt::GlobalTracer, yt::GlobalTracer)
    return GlobalTracer(union(xt.inputs, yt.inputs))
end

function Base.ifelse(b::Bool, xt::LocalTracer, yt::LocalTracer)
    # With a primal condition available, only the branch actually taken matters.
    return b ? xt : yt
end
```
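For intuition, the two methods would behave as follows on concrete tracers (constructors here are assumptions matching the sketch at the end of this thread):

```julia
xt = GlobalTracer(Set([1]))
yt = GlobalTracer(Set([2]))
ifelse(true, xt, yt)   # global: inputs {1, 2}, both branches are kept

xl = LocalTracer(1.0, Set([1]))
yl = LocalTracer(2.0, Set([2]))
ifelse(true, xl, yl)   # local: inputs {1}, only the branch actually taken
```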
> Do we need to reintroduce local sparsity when we:

My answer would be yes, but only if the zero in question is a constant (not a variable which we are also tracing). Then the question becomes: do we ever encounter non-traced numbers during tracing? It seems unlikely, since we're converting everything to tracers.
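For concreteness, the constant-zero case looks like this (an illustrative example, not from the original thread):

```julia
# The literal constant 0.0 is a plain Float64 during tracing and is never
# converted to a tracer; only x[1] and x[2] are tracers here.
f(x) = 0.0 * x[1] + x[2]
```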
> The current structure of our code would make it trivial to support the zero-product property and the additive identity. And we could generally return empty tracers if the primal computation returns zero.

What do you mean by this?
I'm worried we need to be extra careful when the zeros are not fixed but variables. For instance, we don't want to return an empty tracer merely because a multiplication by zero occurs somewhere in the primal computation.
An argument could be made that the "global" Jacobian of

In other words, the local tracer could compute instead of

This is already what the theory means, no need for the
What about

```julia
f(x) = x[1] * x[2]
jacobian_sparsity(f, [0.0, 1.0])
```

In the old world, we trace both inputs, and the output gets a dependency on both. In the new world, we would only trace the second input? The first one gets an empty tracer from the start?
Basically, our global tracers

Right, what I'm suggesting is to give an estimate of the sparsity of
Right, but if we give every zero value an empty tracer, that boils down to saying "when the primal value is zero, every coefficient of the gradient / Hessian that involves the corresponding variable evaluates to zero at the current point". Which I am almost certain is wrong.

I would steer clear of "practical speedups" that are not first validated by theory.
```julia
f(x) = exp(x[1]) * x[2]
jacobian_sparsity(f, [0.0, 1.0])
```

If we give an empty tracer to the first index, we get something wrong regardless of what we do next.
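This is easy to confirm numerically; the gradient at that point is dense even though `x[1]` is zero:

```julia
using ForwardDiff

f(x) = exp(x[1]) * x[2]
# Gradient is (exp(x[1]) * x[2], exp(x[1])) == [1.0, 1.0] at [0.0, 1.0]:
# no zero entries, so dropping the first index from the pattern is incorrect.
ForwardDiff.gradient(f, [0.0, 1.0])
```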
Essentially, even when we have zeros in our computational graph, we can't afford to forget where they come from. Zeros are only special at the end of the graph.
I think you're misreading my argument. You said:

My point is that in this case, as written explicitly above,

In the case of
Yeah, that was more of an answer to your comment on Slack, which we now both agree is wrong. I think your general idea however is right, and it only requires being more precise with the derivatives of the operators. For instance, derivatives of

And what we really don't want to screw up is operators with more than one valid derivative, aka points of nondifferentiability. For instance,
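One operator in this class is `abs`; here is a hedged sketch of its local classification (using the same assumed naming convention as the functions below):

```julia
# abs has subgradients in [-1, 1] at x == 0, so even though the primal is
# zero there, the derivative must be classified as possibly nonzero; away
# from zero it is ±1, hence always nonzero.
nonzero_der1_arg1(::typeof(abs), x) = true
```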
I suggest we just look at `f(x) = x[1] * x[2]`. The corresponding Jacobian is `(x[2], x[1])`. Theory tells us the first entry is nonzero iff `x[2]` is nonzero, and the second iff `x[1]` is nonzero.
You mean gradient. And yes, this theory would be implemented by the following functions:

```julia
nonzero_der1_arg1(::typeof(*), a, b) = !iszero(b)
nonzero_der1_arg2(::typeof(*), a, b) = !iszero(a)
```
I think we should add tests for these finer internal functions with ForwardDiff, instead of testing only the global classification of operators. They are very easy to screw up without realizing.
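A minimal sketch of such a test (the function definitions are repeated so the snippet is self-contained; the names are this thread's convention, not necessarily the final API):

```julia
using ForwardDiff, Test

nonzero_der1_arg1(::typeof(*), a, b) = !iszero(b)
nonzero_der1_arg2(::typeof(*), a, b) = !iszero(a)

# Compare the classification against actual partial derivatives,
# including points where one of the arguments is zero.
for (a, b) in ((0.0, 1.0), (2.0, 0.0), (3.0, 4.0))
    @test nonzero_der1_arg1(*, a, b) == !iszero(ForwardDiff.derivative(x -> x * b, a))
    @test nonzero_der1_arg2(*, a, b) == !iszero(ForwardDiff.derivative(y -> a * y, b))
end
```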
Opened #66 to track this.

As a friendly reminder to myself tomorrow: we should also test on @Vaibhavdixit02's matrix exponentials.
Currently, all tracers return conservative global sparsity patterns.
Local sparsity can be computed by propagating a primal value as well.
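A minimal sketch of what this could look like (all types, fields, and the multiplication rule here are assumptions based on the discussion above, not the package's actual implementation):

```julia
struct GlobalTracer
    inputs::Set{Int}  # indices of inputs the value may depend on
end

struct LocalTracer
    primal::Float64   # primal value propagated alongside the pattern
    inputs::Set{Int}
end

# Global rule: the product may depend on every input of both arguments.
Base.:*(x::GlobalTracer, y::GlobalTracer) = GlobalTracer(union(x.inputs, y.inputs))

# Local rule: since d(ab)/da = b, the dependency on a's inputs survives
# only if the primal of b is nonzero (and vice versa).
function Base.:*(x::LocalTracer, y::LocalTracer)
    inputs = union(
        iszero(y.primal) ? Set{Int}() : x.inputs,
        iszero(x.primal) ? Set{Int}() : y.inputs,
    )
    return LocalTracer(x.primal * y.primal, inputs)
end
```

For `f(x) = x[1] * x[2]` at `[0.0, 1.0]`, this keeps only the dependency on `x[1]`, matching the gradient `(x[2], x[1]) = (1.0, 0.0)` at that point.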