Use DifferentiationInterface for the jacobian #258
base: master
Conversation
Hi @ErikQQY, thanks for pinging me!
Could you maybe give me a standalone benchmark MWE, even if it requires checking out the package from this branch?
If you already know the Jacobian structure, you want to use `ADTypes.KnownJacobianSparsityDetector`.
That is unavoidable, but luckily you should be able to do it outside of the hot loop since it is encapsulated in the preparation step. The mantra of DI is "prepare once, reuse plenty of times". I'll check out your code but I suspect you may not be taking advantage of preparation.
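For reference, a minimal sketch of that preparation pattern (the loss function, sizes, and parameter here are made up for illustration, not the package's actual code):

```julia
using DifferentiationInterface, ADTypes
import ForwardDiff
const DI = DifferentiationInterface

# Hypothetical in-place residual, standing in for the BVP loss function
loss!(resid, u, p) = (resid .= u .^ 2 .+ p; nothing)

backend = AutoForwardDiff()
u, p = rand(10), 1.0
resid, J = similar(u), zeros(10, 10)

# Preparation (sparsity detection, coloring, buffer allocation) happens once...
prep = DI.prepare_jacobian(loss!, resid, backend, u, Constant(p))

# ...and is reused on every iteration of the hot loop
for _ in 1:100
    DI.jacobian!(loss!, resid, J, prep, backend, u, Constant(p))
end
```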
Review in progress, but I think you may have forgotten to specify a …
Also, I'm not sure why you perform differentiation with …
Benchmark Results
Benchmark Plots: A plot of the benchmark results has been uploaded as an artifact to the workflow run for this PR.
I only reviewed the first few files because I think there are a few misunderstandings to clarify before going further
```julia
colored_matrix = __generate_sparse_jacobian_prototype(
    cache, cache.problem_type, y, y, cache.M,
    N, get_dense_ad(jac_alg.nonbc_diffmode))
```
what do you use this for?
There is a banded matrix with a known bandsize, and a potentially sparse set of rows on top. The current algorithm gets the color vector from the analytically known bandsize and only runs sparsity detection on the extra rows. Then it does batched forward mode for the banded matrix and the chosen mode (forward, reverse, etc.) for the top rows. I think the difficulty here is that the representation of the coloring in DI makes it hard to do this by hand. The question is then how to get this mixed mode automatically or semi-automatically.
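To make that structure concrete, here is a toy sketch (made-up sizes and bandwidths, not the solver's actual dimensions) of the stacked pattern being described, built with plain SparseArrays:

```julia
using SparseArrays

n, n_bc, l, u = 12, 2, 2, 2                   # hypothetical size, bc rows, bandwidths
bc_pattern = sprand(Bool, n_bc, n, 0.2)       # sparse rows coming from the boundary conditions
band_pattern = spzeros(Bool, n - n_bc, n)
for i in axes(band_pattern, 1), j in max(1, i - l):min(n, i + u)
    band_pattern[i, j] = true                 # dense band with analytically known bandwidths
end

# Stacked pattern: the kind of matrix one could hand to a KnownJacobianSparsityDetector
# instead of re-detecting it at runtime.
pattern = [bc_pattern; band_pattern]
```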
So if I try to rephrase, you have a banded block with known bandwidths plus a potentially sparse block of boundary-condition rows on top, and you're interested in the Jacobian of the stacked function. Perhaps the easiest way would be to have …
Note that this would also be a good opportunity to try out SparseConnectivityTracer.jl.
Let me know if there are any bottlenecks in SCT's performance. We can most likely add additional methods for speed-ups.
Yes
Now we are using …
I don't understand: is the banded part dense or sparse?
Sorry, the banded part is dense and the bc part is sparse.
What kind of storage do you use for this stacked Jacobian? Do you store both parts separately? Or do you put them both inside a …
Now we just store both parts in one matrix (it only matters in the multi-point case, since in the two-point case the Jacobian is just a banded matrix which doesn't need this kind of storage). For example, the first several rows of our Jacobian are the sparse Jacobian of the bc part (sparse because the boundary conditions are only evaluated at a few points), and the other rows of the Jacobian have a banded structure. We store this special structure as an almost-banded matrix using FastAlmostBandedMatrices.jl.
Okay, we'll see how fast decompression is, but there might be some tweaks necessary there.
How are we doing around here @ErikQQY? Need any more guidance?
@gdalle Thanks a lot! Now I first need to ensure the correct syntax for the sparse Jacobian and that the refactored solvers don't break CI. I still have some subpackages to do, but it should be ready very soon. After that, some reviews on the improvements would be nice😄.
I wonder why this PR ends up adding more lines than it removes? Did we fail with our grand design of DI making life easier for package developers?
This PR is still kind of a work in progress. I haven't removed the sparse Jacobian stuff from SparseDiffTools.jl since the shooting methods exploit it in a special way (but I will delete it once correctly configured!), and this is also mixed with some refactors, so the git diff may not reflect the real status.
Okay thanks! But do keep it in mind while you make the changes: if DI is not making your life easier, there is probably something wrong that we can fix 😊
The failure for Ascher seems to be caused by the cache object not adapting to the element type of the input. During sparsity detection, SparseConnectivityTracer.jl passes a custom number type (not unlike ForwardDiff.jl), and there is apparently a forced conversion to …
Yeah, I am not quite satisfied with the current implementation of the collocation part of the Ascher methods, and I am doing the overhaul these days. But I believe the other methods like MIRK, FIRK, MIRKN, and Shooting are surely done now.
Any feedback on things we could do in DI to make your life easier?
It has been quite delightful to refactor from SparseDiffTools.jl to DI, and the refactor does make the code simpler, but I guess DI is still a young package compared to others and I stumbled over several bugs (speaking of which, a new issue is coming😅). But on the bright side, encountering problems and reporting them is also a way to help DI grow better🤠.
Oh there are definitely a lot of bugs and edge cases to hunt down! The present application to BoundaryValueDiffEq is the most demanding one so far, which means your feedback is very valuable.
@gdalle @adrhill OK, the MIRK, FIRK, Shooting, and MIRKN methods are done now; the other methods need further optimization (they need a redesign and don't involve DI, so I'll leave them out for now). Could you please give some advice on the current refactoring from SparseDiffTools.jl to DI? You can focus only on the changes in these files: mirk.jl, firk.jl, mirkn.jl, single_shooting.jl, multiple_shooting.jl and their corresponding sparse_jacobians.jl, where the matrix coloring structure is predetermined. Thanks in advance!!!
I have some big picture feedback that might be more relevant to the design of ADTypes.jl: it looks like you are internally overwriting the sparsity detector the user specified whenever the Jacobian structure is known in advance. On the other hand, I understand that your choice here is the right one and that users most likely always want to use known patterns when available, regardless of what they specify.
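For concreteness, here is a rough sketch of the two configurations at play (the pattern and dense backend are placeholders, not the package's actual objects): the tracing detector a user might specify versus the known-pattern detector the solver substitutes internally:

```julia
using ADTypes, SparseArrays, SparseConnectivityTracer, SparseMatrixColorings
import ForwardDiff

dense_backend = AutoForwardDiff()
pattern = sprand(Bool, 10, 10, 0.3)    # placeholder for the analytically known structure

# What a user might request: detect the pattern from scratch with SCT
user_backend = AutoSparse(dense_backend;
    sparsity_detector = TracerSparsityDetector(),
    coloring_algorithm = GreedyColoringAlgorithm())

# What the solver swaps in when the structure is known in advance
known_backend = AutoSparse(dense_backend;
    sparsity_detector = ADTypes.KnownJacobianSparsityDetector(pattern),
    coloring_algorithm = GreedyColoringAlgorithm())
```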
```julia
DI.jacobian!(
    loss_bc, resid_bc, @view(J[1:L, :]), bc_diffcache, bc_diffmode, x, Constant(p))
DI.jacobian!(loss_collocation, resid_collocation, @view(J[(L + 1):end, :]),
    nonbc_diffcache, nonbc_diffmode, x, Constant(p))
```
Note that if you ever work with `SparseMatrixCSC` formats (I think it is not the case here), views of sparse matrices are very inefficient.
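A tiny illustration of that remark (toy sizes, not the package's matrices): a view of a `SparseMatrixCSC` is a generic `SubArray`, so most operations on it no longer exploit the sparse storage, whereas plain indexing copies but keeps the sparse type.

```julia
using SparseArrays

S = sprand(2_000, 2_000, 0.01)

V = @view S[1:100, :]   # SubArray over a sparse matrix: falls back to generic array code
B = S[1:100, :]         # plain indexing copies, but the result is still a SparseMatrixCSC

typeof(V), typeof(B)
```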
```julia
AutoSparse(get_dense_ad(jac_alg.nonbc_diffmode);
    sparsity_detector = ADTypes.KnownJacobianSparsityDetector(colored_result.A),
    coloring_algorithm = ConstantColoringAlgorithm{ifelse(
        ADTypes.mode(jac_alg.nonbc_diffmode) isa ADTypes.ReverseMode,
```
Note that there are other modes than `ForwardMode` and `ReverseMode`: ADTypes.jl also defines `SymbolicMode` and `ForwardOrReverseMode`. I don't know whether that is relevant here.
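A small sketch of what covering all four modes could look like (the `:row`/`:column` partition choice here is only illustrative, not the package's logic):

```julia
using ADTypes

# Pick a coloring partition from the backend's mode, covering all four ADTypes modes
function partition_for(backend::ADTypes.AbstractADType)
    m = ADTypes.mode(backend)
    if m isa ADTypes.ReverseMode
        return :row          # reverse mode builds the Jacobian row by row
    elseif m isa ADTypes.ForwardMode
        return :column       # forward mode builds it column by column
    else                     # SymbolicMode or ForwardOrReverseMode
        return :column       # illustrative default for the remaining modes
    end
end

partition_for(ADTypes.AutoForwardDiff())   # :column
```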
```julia
# for underdetermined systems we don't have banded qr implemented. use sparse
J₁ < J₂ && return ColoredMatrix(sparse(J), matrix_colors(J'), matrix_colors(J))
return ColoredMatrix(J, matrix_colors(J'), matrix_colors(J))
J₁ < J₂ && return coloring(sparse(J), problem, algo)
```
The function `coloring` converts the matrix to `SparseMatrixCSC` anyway, so the `sparse` here is superfluous unless you need to recover a `SparseMatrixCSC` when you do `sparsity_pattern(coloring_result)` (aka `coloring_result.A`).
Actually, we do need to recover a `SparseMatrixCSC` for the `KnownJacobianSparsityDetector`, so I think the `sparse` here is fine.
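For reference, a minimal round trip with SparseMatrixColorings showing the conversion being discussed (toy pattern, default problem and algorithm settings assumed):

```julia
using SparseArrays, SparseMatrixColorings

J = sprand(Bool, 20, 20, 0.1)        # toy Jacobian pattern

problem = ColoringProblem()          # nonsymmetric structure, column partition by default
algo = GreedyColoringAlgorithm()
result = coloring(J, problem, algo)

pattern = sparsity_pattern(result)   # SparseMatrixCSC stored inside the result
colors = column_colors(result)       # one color per column
```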
```julia
jac_prototype = init_jacobian(jac_cache)
diffmode = if alg.jac_alg.diffmode isa AutoSparse
    AutoSparse(get_dense_ad(alg.jac_alg.diffmode);
        # Local because iszero require primal values
```
This means that the sparsity pattern will not necessarily generalize across input points. To increase robustness, you probably want to evaluate it at `x = rand(n)` rather than, say, `x = zeros(n)`.
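A toy illustration of that point with the local detector (the function and sizes are made up): at `x = zeros(n)` the local pattern can miss entries whose derivative happens to vanish at zero, which `x = rand(n)` avoids almost surely.

```julia
using ADTypes, SparseConnectivityTracer

f(x) = [x[1] * x[2], x[2] + x[3], x[3]]

# Local detection depends on the primal values of the probe point
p0 = ADTypes.jacobian_sparsity(f, zeros(3), TracerLocalSparsityDetector())
p1 = ADTypes.jacobian_sparsity(f, rand(3), TracerLocalSparsityDetector())

# At zeros(3) the first row loses both entries, because d(x1*x2)/dx1 = x2 = 0 and
# d(x1*x2)/dx2 = x1 = 0 at that point; at a random point both entries are kept.
```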
```julia
ensemblealg, prob, jac_cache, ::DiffCacheNeeded,
        diffmode, alg, nshoots, u; kwargs...)
if diffmode isa AutoSparse
    xduals = reshape(jac_cache.pushforward_prep.xdual_tmp[1:length(u)], size(u))
```
This is private API of DifferentiationInterface; it can change at any time without notice, so you shouldn't touch it. The details of preparation objects are not public-facing.
Why do you need to manipulate dual numbers directly?
In Shooting methods, the loss function is the evaluation of an ODE integrator at the boundary conditions. For example, after initializing an ODE problem we have an ODE integrator, and we use Newton methods to find the point where the residual satisfies our tolerance; in this process we have to re-initialize the integrator (the re-initialization points are the iteration points) until the Newton method achieves convergence. OK, this procedure is basically the shooting method, but we want to use AD to obtain the Jacobian, so the way to get the Jacobian of this kind of loss function is to build another integrator whose `u` is a Dual number, and during re-initialization we need Dual-number iteration points. Here is an example illustrating how shooting methods work:
```julia
using OrdinaryDiffEq, SciMLBase, ForwardDiff, DifferentiationInterface, ADTypes
const DI = DifferentiationInterface

backend = AutoForwardDiff()

function lorenz!(du, u, p, t)
    du[1] = 10.0(u[2] - u[1])
    du[2] = u[1] * (28.0 - u[3]) - u[2]
    du[3] = u[1] * u[2] - (8 / 3) * u[3]
end

u1 = [1.0; 0.0; 0.0]
tspan = (0.0, 100.0)
prob = ODEProblem(lorenz!, u1, tspan)
ode_cache = SciMLBase.__init(prob, Tsit5())

# Primal loss: re-initialize the integrator at u and return the final state
function loss(du, u, p)
    SciMLBase.reinit!(ode_cache, u)
    sol = solve!(ode_cache)
    du .= sol[end]
end

u1 = [2.0; 2.0; 3.0]
jac_cache = DI.prepare_jacobian(nothing, similar(u1), backend, u1) # ForwardDiff.JacobianConfig(nothing, u1, Chunk(2))

# Grab the Dual-typed buffer from the preparation to build a Dual-compatible integrator
xduals = jac_cache.config.duals[2]
fill!(xduals, 0)
ode_cache_jac = SciMLBase.__init(
    remake(prob, u0 = xduals, tspan = eltype(xduals).(prob.tspan)), Tsit5())

# Dual loss: same as `loss`, but re-initializes the Dual-typed integrator
function loss_jac(du, u)
    SciMLBase.reinit!(ode_cache_jac, u)
    sol = solve!(ode_cache_jac)
    du .= sol[end]
end

jac(J, y, p) = DI.jacobian!(loss_jac, y, J, jac_cache, backend, y)

A = zeros(Float64, 3, 3)
b = [1.0, 2.0, 3.0]
jac(A, b, nothing)
```
I think I understand the issue. What would be the right abstraction to add to DI here? Something like `DI.overloaded_inputs` for operator-overloading backends?
Sorry, I don't fully understand how adding operator overloading in DI relates here, could you please elaborate on this? The Dual numbers are required for the initialization of the loss function so that we can build the Jacobian no matter whether we are using `AutoForwardDiff` or `AutoSparse(AutoForwardDiff())`; that's why we use `jac_cache.config.duals` and `jac_cache.pushforward_prep.xdual_tmp` to obtain the Dual type and enable AD. So I wonder if it's possible to have an easy way to get the Dual number in Jacobian preparation?
I understand, and I'm looking for a way to put this in DI's API so that it becomes more than a ForwardDiff-specific functionality. For most AD backends based on operator overloading, the functions are called on a different input type (like Dual), so I thought returning that input type could make sense in general
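A tiny demonstration of that point with ForwardDiff (toy function, not the solver's loss): the backend calls the user function on an array of `Dual`s rather than the original `Float64` input.

```julia
using ForwardDiff

# The function is evaluated on Duals during differentiation, not on Float64
f(x) = (@show eltype(x); x .^ 2)

ForwardDiff.jacobian(f, [1.0, 2.0, 3.0])
# eltype(x) is a ForwardDiff.Dual type here, which is exactly why the shooting
# cache needs a Dual-typed integrator to evaluate the loss.
```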
Yeah, returning the input type could be so much more convenient for building hacky AD-compatible functions (at least in our ForwardDiff use case :P). That should be easy to implement for operator-overloading AD backends; I can add that to DI, and I am thinking we can just provide a simple function like the `DI.overloaded_inputs` you previously proposed to do this.
```julia
if diffmode isa AutoSparse
    xduals = reshape(jac_cache.pushforward_prep.xdual_tmp[1:length(u)], size(u))
elseif diffmode isa AutoForwardDiff
    xduals = reshape(jac_cache.config.duals[2][1:length(u)], size(u))
```
Same here except that it is private API of ForwardDiff (less likely to break though)
What do you mean by that? Packages are already free to subtype …
Maybe users should be provided the choice between: …
Of course, every package could define its own version of …
Fix: #218
MIRK, FIRK, MIRKN, and Ascher methods are using DifferentiationInterface.jl to support sparse Jacobian exploitation; shooting methods are a work in progress.
Changing SparseDiffTools.jl to DifferentiationInterface.jl doesn't cost much pain, but the performance drops compared with the current master branch. I suspect this may be caused by some inappropriate matrix coloring and sparsity detection usage of DifferentiationInterface.jl and SparseMatrixColorings.jl in this implementation. For example, the `jac_prototype` only needs the basic structure of the Jacobian, but I didn't find similar functionality in DI.jl, and the construction of `ColoringProblem` and the coloring `*Algorithm` in SparseMatrixColorings.jl introduces allocations, so cc Guillaume for some advice.