Improve Parametric Batch Solve Performance #1546
Yeah, I suspect caching in the residual and …
Jip, the flame graph is cool. The way I read it is that the hot-loop calling of … Not sure how that is going to influence the AD story which you ran into last time.
Jip, that is the biggest problem. The type cannot be hardcoded and is actually not consistent within one solve. I can try a sort of operational-memory "hot-swap" cache that won't be type stable, but we can annotate the type. It's the garbage collector that runs and slows it down.
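For later readers, a minimal sketch of the "hot-swap" idea described above (names are hypothetical): the cache slot itself is untyped, but a type annotation at the use site keeps the hot loop type stable.

```julia
# The slot is deliberately not type stable so the buffer type can be swapped
# between solves; the annotation below recovers stability inside the hot loop.
mutable struct HotSwapCache
    buf::Any
end

function residual!(c::HotSwapCache, x::Vector{Float64})
    buf = c.buf::Vector{Float64}    # type assertion restores inference here
    buf .= 2 .* x                   # in-place, no fresh allocation per call
    return sum(buf)
end

c = HotSwapCache(zeros(3))
residual!(c, [1.0, 2.0, 3.0])       # 12.0, reusing the cached buffer
```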
Testing different options that work with AD in …

```julia
# Full cache/lazy version (11 allocations)
M = cf.cache.manifold
ϵ0 = cf.cache.ϵ0
q̂ = cf.cache.lazy[q, :q_hat]
exp!(M, q̂, ϵ0, X)
Manifolds.compose!(M, q̂, p, q̂)
X = cf.cache.lazy[q, :X]
log!(M, X, q, q̂)
Xc = IIF.getCoordCache!(cf.cache.lazy, M, number_eltype(q), :Xc)
vee!(M, Xc, q, X)
# no cache version (28 allocations)
M = getManifold(Pose2)
q̂ = Manifolds.compose(M, p, exp(M, identity_element(M, p), X))
Xc = vee(M, q, log(M, q, q̂))
# no cache version shared (19 allocations)
M = getManifold(Pose2)
q̂ = allocate(q)
exp!(M, q̂, identity_element(M, p), X)
Manifolds.compose!(M, q̂, p, q̂)
Xc = vee(M, q, log(M, q, q̂))
# as much as possible cache version (17 allocations)
q̂ = allocate(q)
M = cf.cache.manifold
ϵ0 = cf.cache.ϵ0
exp!(M, q̂, ϵ0, X)
Manifolds.compose!(M, q̂, p, q̂)
Xc = vee(M, q, log(M, q, q̂))
# as much as possible and weird reuse cache version (10 allocations)
q̂ = allocate(q)
M = cf.cache.manifold
ϵ0 = cf.cache.ϵ0
exp!(M, q̂, ϵ0, X)
Manifolds.compose!(M, q̂, p, q̂)
Xc = vee(M, q, log!(M, q̂, q, q̂))
```

Tested with N=10:

```julia
fg = initfg(GraphsDFG;solverParams=SolverParams(algorithms=[:default, :parametric]))
fg.solverParams.graphinit = false
addVariable!(fg, :x0, Pose2)
addFactor!(fg, [:x0], PriorPose2(MvNormal([0.,0,0], [0.1,0.1,0.01])))
dist = MvNormal([1.,0,0], [0.1,0.1,0.01])
for i=1:N
    addVariable!(fg, Symbol("x$i"), Pose2)
    addFactor!(fg, [Symbol("x$(i-1)"),Symbol("x$i")], Pose2Pose2(dist))
end
IIF.solveGraphParametric(fg)
# solveGraph!(fg)
```
I still want to figure out on this whether we cannot just move the …
Sorry, I don't understand? I found this package specifically for this issue: https://github.com/SciML/PreallocationTools.jl
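For context, a minimal sketch of how PreallocationTools.jl sidesteps the AD/cache clash (toy residual, hypothetical names): a `DiffCache` carries both a plain buffer and a Dual-typed buffer, and `get_tmp` picks the right one based on the element type of the input.

```julia
using PreallocationTools, ForwardDiff

cache = DiffCache(zeros(3))

function residual(x, cache)
    tmp = get_tmp(cache, x)   # Float64 buffer normally, Dual buffer under AD
    tmp .= x .^ 2             # in-place work without fresh allocations
    return sum(tmp)
end

residual(ones(3), cache)                                 # plain evaluation
ForwardDiff.gradient(x -> residual(x, cache), ones(3))   # AD reuses the Dual buffer
```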
Ah, perhaps I can ask it this way round: where is the first line of code in a parametric solve where AD is called? I'll go look from there in IIF to see why the CalcFactor type is not stable or assigned yet. The question I'm trying to figure out is whether the current intended type assignment to the cache object "happens after" or "happens before" the first AD request in a parametric solve. And maybe the answer here is just: let's move the preambleCache call "earlier" somehow, so that by the time AD happens all the types are set. Or I'm misunderstanding the issue.
Currently from … From the link above, this is what is needed: …
Right ... so I'd expect …
Oops, I'm not much help here yet. I need to learn more on this. I'd suggest you keep going on this track and I'll work on other data upload work. I'll come back here after AMP41 and IIF1010, and since 1010 is also about AD it would be a good time for me to loop back.
I'm completely refactoring to use a product manifold of power manifolds. I think that way the same data structure can support both Manopt.jl and Optim.jl.
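As a reader aid, a rough sketch of that layout (sizes are invented; `NestedPowerRepresentation` is assumed to be the appropriate representation for `ArrayPartition` points):

```julia
using Manifolds

# e.g. 5 SE(2) pose variables and 3 planar landmark variables in one state
M = ProductManifold(
    PowerManifold(SpecialEuclidean(2), NestedPowerRepresentation(), 5),
    PowerManifold(TranslationGroup(2), NestedPowerRepresentation(), 3),
)
manifold_dimension(M)   # 5*3 + 3*2 = 21 degrees of freedom in one object
```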
Yes. Also note, for power manifolds, that Manifolds.jl supports https://github.com/JuliaArrays/HybridArrays.jl . BTW, there are some issues with AD through exp and log on (among others) SO(n); I have it largely solved for ForwardDiff.jl and ChainRules.jl, but I think this is something worth discussing.
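A tiny illustration of the HybridArrays.jl idea (following its README): the per-point dimension is static while the number of points stays dynamic, which suits power-manifold data whose point count changes between solves.

```julia
using HybridArrays, StaticArrays

# first dimension static (coordinate size), second dynamic (number of points)
A = HybridArray{Tuple{3, StaticArrays.Dynamic()}}(rand(3, 100))
A[:, 7]      # slicing along the static axis yields a statically sized vector
size(A)      # (3, 100); only the 3 is encoded in the type
```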
xref #1372
A rough benchmark of where we are:

```julia
fg = initfg()
fg = generateGraph_Beehive!(1000; dfg=fg, graphinit=false, locality=0.5);
r=IIF.solveGraphParametric(fg; autodiff=:finite, computeCovariance=false);
# Seconds run: 164 (vs limit 100)
# Iterations: 1
# f(x) calls: 8
# ∇f(x) calls: 8
r=IIF.solveGraphParametric(fg; ...)  # Nelder-Mead
# Seconds run: 109 (vs limit 100)
# Iterations: 1226
# f(x) calls: 41265
```
Hi @Affie, that's awesome, thank you!
Update: new Parametric RLM performance on Beehive1000:

```julia
fg = initfg()
fg = generateGraph_Beehive!(1000; dfg=fg, graphinit=false, locality=0.5);
IIF.autoinitParametricManopt!(fg, [ls(fg, r"x"); ls(fg, r"l")])
#Progress: 100%|███████████████████████| Time: 0:00:15
v,r = IIF.solve_RLM(fg; damping_term_min=1e-18, expect_zero_residual=true);
# takes less than 1 second from initialized fg with noise added, but cannot solve with uninitialized.
# converges in 5 iterations
```

For the Pose3 graph in the first post (Nr variables: 6, Nr factors: 7):

```julia
@time v,r = IIF.solve_RLM(fg; is_sparse=true, debug, stopping_criterion, damping_term_min=1e-18, expect_zero_residual=true);
# 5.567798 seconds (6.62 M allocations: 453.195 MiB, 6.78% gc time, 99.79% compilation time)
# 0.009137 seconds (24.01 k allocations: 2.460 MiB)
# 4 iterations
```
In my experiments, time spent in RLM (not counting compilation) was dominated by the Cholesky decomposition for the linear subproblem. Is it the same for you? It's possible that in certain situations the linear subproblem could be solved better than in the existing implementation, or there could be some other optimizations.
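As a point of reference (not the actual RLM internals), the damped linear subproblem being discussed looks roughly like this; sizes, density, and damping value are made up, and the sparse Cholesky factorization is where graph structure pays off:

```julia
using LinearAlgebra, SparseArrays

J = sprand(600, 300, 0.02)    # stand-in sparse Jacobian
r = randn(600)                # stand-in residual
λ = 1e-3                      # damping term

A = Symmetric(J' * J + λ * I)
δ = cholesky(A) \ (-(J' * r)) # sparse factorization exploits the sparsity pattern
```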
Currently, a lot of time is spent on the factor cost function itself; there are still a lot of allocations that shouldn't be there. I haven't been able to figure it out yet. I'll post a flame graph.
Sure, calculating the cost function and its Jacobian can take a lot of time too.
Yes, it looks like we are missing some specialized implementations in Manifolds.jl. I can take a look and see what could be improved if you give me code to reproduce the benchmarks. There are many snippets in this thread and I'm not sure which one to try.
Also, it's cool that it's already an improvement. I didn't even implement RLM with robotics in mind, and this RLM doesn't use many of the tricks developed for the Euclidean case 😄.
Thanks for offering. The best results currently need branches in RoME and IIF; give me a few days to get them merged and I can put an example together that will work on the master branches.
I quickly made this; it is the minimum needed to evaluate a single cost function (factor).

```julia
using Manifolds
using StaticArrays
using DistributedFactorGraphs
using IncrementalInference
using RoME
using Distributions, Statistics  # for MvNormal and mean (likely re-exported by IIF/RoME already)
function runFactorLots(factor::T, N=100000) where T<:AbstractRelative
    cf = CalcFactor(factor, 0, nothing, false, nothing, (), 0, getManifold(factor))
    M = getManifold(T)
    # point from where the measurement is taken ∈ M
    p = getPointIdentity(M)
    # point to where the measurement is taken ∈ M
    q = convert(typeof(p), exp(M, p, hat(M, p, mean(factor.Z))))
    # q = exp(M, p, hat(M, p, mean(factor.Z)))
    res = zeros(6)
    for _ in 1:N
        # measurement vector in TpM (hat returns a mutable array)
        m = convert(typeof(p), hat(M, p, rand(factor.Z)))
        # m = hat(M, p, rand(factor.Z))
        # run the factor
        res = cf(m, p, q)
    end
    return res
end
## Factor types to test:
# Factor for SE2 to SE2 measurement
f = Pose2Pose2(MvNormal(SA[1.,0,0], SA[0.1,0.1,0.01]))
# Factor for SE3 to SE3 measurement
f = Pose3Pose3(MvNormal(SA[1.,0,0,0,0,0], SA[0.1,0.1,0.1,0.01,0.01,0.01]))
##
@time runFactorLots(f)
@profview runFactorLots(f)
@profview_allocs runFactorLots(f)
```
Here are some basic special-cases that make it much faster:

```julia
function Manifolds.exp(M::SpecialEuclidean, p::ArrayPartition, X::ArrayPartition)
    return ArrayPartition(exp(M.manifold.manifolds[1].manifold, p.x[1], X.x[1]), exp(M.manifold.manifolds[2].manifold, p.x[2], X.x[2]))
end
function Manifolds.log(M::SpecialEuclidean, p::ArrayPartition, q::ArrayPartition)
    return ArrayPartition(log(M.manifold.manifolds[1].manifold, p.x[1], q.x[1]), log(M.manifold.manifolds[2].manifold, p.x[2], q.x[2]))
end
function Manifolds.vee(M::SpecialEuclidean, p::ArrayPartition, X::ArrayPartition)
    return vcat(vee(M.manifold.manifolds[1].manifold, p.x[1], X.x[1]), vee(M.manifold.manifolds[2].manifold, p.x[2], X.x[2]))
end
function Manifolds.compose(M::SpecialEuclidean, p::ArrayPartition, q::ArrayPartition)
    return ArrayPartition(p.x[2]*q.x[1] + p.x[1], p.x[2]*q.x[2])
end
```

@kellertuer would you be fine with adding them to Manifolds.jl?
Sure, we can add that to Manifolds.jl; looks good to me performance-wise.
@mateuszbaran, thanks very much, it already makes a big difference. Just note: log might be wrong (I think it misses the semidirect part and only does the product manifold).
You're welcome 🙂. I've made a small mistake in …
I would also still consider the above a great retraction for the Lie one (a first-order approximation, that is), so maybe your convergence fails due to other parameters then.
Thanks, that was it. It converges now.

Ok, so it's the same confusion I had with the exp. So it's equivalent to the log of the product group.
Yes. Our SE(n) terminology isn't entirely consistent with the literature on Lie groups in robotics, but we have the same basic operations available in some form, and we keep adding more.
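For later readers, a small sketch of the distinction being settled here, using current Manifolds.jl names (assumed to still be exported): the componentwise/product log versus the Lie-group log.

```julia
using Manifolds

M = SpecialEuclidean(2)
p = identity_element(M)
q = exp_lie(M, hat(M, p, [1.0, 0.0, 0.1]))   # move away from the identity

X_group = log_lie(M, q)   # Lie-group log: respects the semidirect structure
X_prod  = log(M, p, q)    # product-metric log: componentwise, as in the
                          # specialized Manifolds.log snippet above
```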
It looks like the next bottleneck is in a power-manifold version I implemented myself. I create a product manifold of NPowerManifolds to use a mixture of different point types (for example TranslationGroup(3) and SE(3)). The reason I didn't use PowerManifold from Manifolds.jl is that the power is stored in the type parameter, and that caused compiling new functions every time I solved with a different number of points. For example, new functions for:

```julia
julia> typeof(PowerManifold(Sphere(2), 11))
PowerManifold{ℝ, Sphere{2, ℝ}, Tuple{11}, ArrayPowerRepresentation}

julia> typeof(PowerManifold(Sphere(2), 10))
PowerManifold{ℝ, Sphere{2, ℝ}, Tuple{10}, ArrayPowerRepresentation}
```

Is [something similar to] NPowerManifold something that would be useful in Manifolds.jl? Also, how do I get the performance I should? From the above it looks like I should avoid the dispatches caused by … Here is the code: …

edit: I might just be missing …
I think by now we have both variants, where the power is in the parameter of the type and where it is in a field :)
The next breaking version of Manifolds.jl will have types that don't force specialization on size: JuliaManifolds/Manifolds.jl#642. I just noticed I haven't adapted …

Not all occurrences of …
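For reference, a hedged sketch of what the field-stored size buys; the `parameter = :field` keyword is taken from the JuliaManifolds/Manifolds.jl#642 discussion and may differ in the released API:

```julia
using Manifolds

# size stored in a field instead of the type parameter (assumed keyword)
M10 = PowerManifold(Sphere(2), 10; parameter = :field)
M11 = PowerManifold(Sphere(2), 11; parameter = :field)

# both powers now share one concrete type, so functions specialized on the
# manifold type are compiled once, not once per point count
typeof(M10) == typeof(M11)   # true
```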
Ah, I was already too far ahead and had in mind we had already released that, sorry.
Cool, I'll keep an eye open for it and update our code to use the new Manifolds.jl types when it comes out.
It looks like another dispatch on get_vector (something like this) fixes my issue; I just don't know how to make it generic yet, but thought I'd post it in the meantime.

```julia
function Manifolds.get_vector(M::SpecialEuclidean{2}, p::ArrayPartition, X, basis::AbstractBasis)
    return ArrayPartition(get_vector(M.manifold.manifolds[1].manifold, p.x[1], view(X,1:2), basis), get_vector(M.manifold.manifolds[2].manifold, p.x[2], X[3], basis))
end
```
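A hedged sketch of one way the snippet above could be made generic, assuming the same internal `M.manifold.manifolds[...]` layout and using `manifold_dimension` to size the coordinate views; named `generic_get_vector` here since it is untested against the real dispatch tree:

```julia
using Manifolds

function generic_get_vector(M::SpecialEuclidean, p::ArrayPartition, X, basis::AbstractBasis)
    M1 = M.manifold.manifolds[1].manifold   # translation part
    M2 = M.manifold.manifolds[2].manifold   # rotation part
    d1 = manifold_dimension(M1)
    d2 = manifold_dimension(M2)
    return ArrayPartition(
        get_vector(M1, p.x[1], view(X, 1:d1), basis),
        get_vector(M2, p.x[2], view(X, d1+1:d1+d2), basis),
    )
end
```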
I just looked at those zeros; it's really bad, as it creates a dense Jacobian in …
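To make the concern concrete (illustrative sizes only): a factor-graph Jacobian is block sparse, so preallocating it with zeros forces dense storage and dense linear algebra downstream.

```julia
using SparseArrays

m, n = 600, 300
J_dense  = zeros(m, n)        # m*n Float64s allocated up front, all dense
J_sparse = spzeros(m, n)      # no stored entries until blocks are written
J_sparse[1:3, 1:3] .= 1.0     # only this block materializes in memory
```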
Hm, currently …
This issue is currently just a placeholder for performance-related issues around parametric batch solve.