Improve Parametric Batch Solve Performance #1546
Yeah, I suspect caching in the residual and …
Jip, the flame graph is cool. The way I read it is that the hot-loop calling of … Not sure how that is going to influence the AD story which you ran into last time.
Jip, that is the biggest problem. The type cannot be hardcoded and is actually not consistent within one solve. I can try a sort of operational-memory "hot-swap" cache that won't be type stable, but we can annotate the type. It's the garbage collector that runs and slows it down.
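For later readers, a minimal sketch of the "hot-swap" idea described above (names are hypothetical): the cache slot itself is untyped, but a type annotation at the use site keeps the hot loop type stable.

```julia
# The slot is deliberately not type stable so the buffer type can be swapped
# between solves; the annotation below recovers stability inside the hot loop.
mutable struct HotSwapCache
    buf::Any
end

function residual!(c::HotSwapCache, x::Vector{Float64})
    buf = c.buf::Vector{Float64}    # type assertion restores inference here
    buf .= 2 .* x                   # in-place, no fresh allocation per call
    return sum(buf)
end

c = HotSwapCache(zeros(3))
residual!(c, [1.0, 2.0, 3.0])       # 12.0, reusing the cached buffer
```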
Testing different options that work with AD in …

```julia
# Full cache/lazy version (11 allocations)
M = cf.cache.manifold
ϵ0 = cf.cache.ϵ0
q̂ = cf.cache.lazy[q, :q_hat]
exp!(M, q̂, ϵ0, X)
Manifolds.compose!(M, q̂, p, q̂)
X = cf.cache.lazy[q, :X]
log!(M, X, q, q̂)
Xc = IIF.getCoordCache!(cf.cache.lazy, M, number_eltype(q), :Xc)
vee!(M, Xc, q, X)
# no cache version (28 allocations)
M = getManifold(Pose2)
q̂ = Manifolds.compose(M, p, exp(M, identity_element(M, p), X))
Xc = vee(M, q, log(M, q, q̂))
# no cache version shared (19 allocations)
M = getManifold(Pose2)
q̂ = allocate(q)
exp!(M, q̂, identity_element(M, p), X)
Manifolds.compose!(M, q̂, p, q̂)
Xc = vee(M, q, log(M, q, q̂))
# as much as possible cache version (17 allocations)
q̂ = allocate(q)
M = cf.cache.manifold
ϵ0 = cf.cache.ϵ0
exp!(M, q̂, ϵ0, X)
Manifolds.compose!(M, q̂, p, q̂)
Xc = vee(M, q, log(M, q, q̂))
# as much as possible and weird reuse cache version (10 allocations)
q̂ = allocate(q)
M = cf.cache.manifold
ϵ0 = cf.cache.ϵ0
exp!(M, q̂, ϵ0, X)
Manifolds.compose!(M, q̂, p, q̂)
Xc = vee(M, q, log!(M, q̂, q, q̂))
```

Tested with N=10:

```julia
fg = initfg(GraphsDFG;solverParams=SolverParams(algorithms=[:default, :parametric]))
fg.solverParams.graphinit = false
addVariable!(fg, :x0, Pose2)
addFactor!(fg, [:x0], PriorPose2(MvNormal([0.,0,0], [0.1,0.1,0.01])))
dist = MvNormal([1.,0,0], [0.1,0.1,0.01])
for i=1:N
    addVariable!(fg, Symbol("x$i"), Pose2)
    addFactor!(fg, [Symbol("x$(i-1)"),Symbol("x$i")], Pose2Pose2(dist))
end
IIF.solveGraphParametric(fg)
# solveGraph!(fg)
```
I still want to figure out on this whether we cannot just move the …
Sorry, I don't understand? I found this package specifically for this issue: https://github.com/SciML/PreallocationTools.jl
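For context, a minimal sketch of how PreallocationTools.jl sidesteps the AD/cache clash (toy residual, hypothetical names): a `DiffCache` carries both a plain buffer and a Dual-typed buffer, and `get_tmp` picks the right one based on the element type of the input.

```julia
using PreallocationTools, ForwardDiff

cache = DiffCache(zeros(3))

function residual(x, cache)
    tmp = get_tmp(cache, x)   # Float64 buffer normally, Dual buffer under AD
    tmp .= x .^ 2             # in-place work without fresh allocations
    return sum(tmp)
end

residual(ones(3), cache)                                 # plain evaluation
ForwardDiff.gradient(x -> residual(x, cache), ones(3))   # AD reuses the Dual buffer
```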
Ah, perhaps I can ask it this way round: where is the first line of code in a parametric solve where AD is called? I'll go look from there in IIF to see why the CalcFactor type is not stable or assigned yet. The question I'm trying to figure out is whether the current intended type assignment to the cache object "happens after" or "happens before" the first AD request in a parametric solve. And maybe the answer here is just: let's move the preambleCache call "earlier" somehow, so that by the time AD happens all the types are set. Or I'm misunderstanding the issue.
Currently from … From the link above, this is what is needed: …
Right ... so I'd expect …
Oops, I'm not much help here yet. I need to learn more on this. I'd suggest you keep going on this track and I'll work on other data upload work. I'll come back here after AMP41 and IIF1010, and since 1010 is also about AD it would be a good time for me to loop back.
I'm completely refactoring to use a product manifold of power manifolds. I think that way the same data structure can support both Manopt.jl and Optim.jl.
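As a reader aid, a rough sketch of that layout (sizes are invented; `NestedPowerRepresentation` is assumed to be the appropriate representation for `ArrayPartition` points):

```julia
using Manifolds

# e.g. 5 SE(2) pose variables and 3 planar landmark variables in one state
M = ProductManifold(
    PowerManifold(SpecialEuclidean(2), NestedPowerRepresentation(), 5),
    PowerManifold(TranslationGroup(2), NestedPowerRepresentation(), 3),
)
manifold_dimension(M)   # 5*3 + 3*2 = 21 degrees of freedom in one object
```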
Yes. Also note, for power manifolds, that Manifolds.jl supports https://github.com/JuliaArrays/HybridArrays.jl . BTW, there are some issues with AD through exp and log on (among others) SO(n); I have it largely solved for ForwardDiff.jl and ChainRules.jl, but I think this is something worth discussing.
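A tiny illustration of the HybridArrays.jl idea (following its README): the per-point dimension is static while the number of points stays dynamic, which suits power-manifold data whose point count changes between solves.

```julia
using HybridArrays, StaticArrays

# first dimension static (coordinate size), second dynamic (number of points)
A = HybridArray{Tuple{3, StaticArrays.Dynamic()}}(rand(3, 100))
A[:, 7]      # slicing along the static axis yields a statically sized vector
size(A)      # (3, 100); only the 3 is encoded in the type
```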
xref #1372
A rough benchmark of where we are:

```julia
fg = initfg()
fg = generateGraph_Beehive!(1000; dfg=fg, graphinit=false, locality=0.5);
r=IIF.solveGraphParametric(fg; autodiff=:finite, computeCovariance=false);
# Seconds run: 164 (vs limit 100)
# Iterations: 1
# f(x) calls: 8
# ∇f(x) calls: 8
r=IIF.solveGraphParametric(fg; ...)  # Nelder-Mead
# Seconds run: 109 (vs limit 100)
# Iterations: 1226
# f(x) calls: 41265
```
Hi @Affie, that's awesome, thank you!
Update: new Parametric RLM performance on Beehive1000:

```julia
fg = initfg()
fg = generateGraph_Beehive!(1000; dfg=fg, graphinit=false, locality=0.5);
IIF.autoinitParametricManopt!(fg, [ls(fg, r"x"); ls(fg, r"l")])
#Progress: 100%|███████████████████████| Time: 0:00:15
v,r = IIF.solve_RLM(fg; damping_term_min=1e-18, expect_zero_residual=true);
# takes less than 1 second from initialized fg with noise added, but cannot solve with uninitialized.
# converges in 5 iterations
```

For the Pose3 graph in the first post (Nr variables: 6, Nr factors: 7):

```julia
@time v,r = IIF.solve_RLM(fg; is_sparse=true, debug, stopping_criterion, damping_term_min=1e-18, expect_zero_residual=true);
# 5.567798 seconds (6.62 M allocations: 453.195 MiB, 6.78% gc time, 99.79% compilation time)
# 0.009137 seconds (24.01 k allocations: 2.460 MiB)
# 4 iterations
```
In my experiments, time spent in RLM (not counting compilation) was dominated by the Cholesky decomposition for the linear subproblem. Is it the same for you? It's possible that in certain situations the linear subproblem could be solved better than in the existing implementation, or there could be some other optimizations.
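As a point of reference (not the actual RLM internals), the damped linear subproblem being discussed looks roughly like this; sizes, density, and damping value are made up, and the sparse Cholesky factorization is where graph structure pays off:

```julia
using LinearAlgebra, SparseArrays

J = sprand(600, 300, 0.02)    # stand-in sparse Jacobian
r = randn(600)                # stand-in residual
λ = 1e-3                      # damping term

A = Symmetric(J' * J + λ * I)
δ = cholesky(A) \ (-(J' * r)) # sparse factorization exploits the sparsity pattern
```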
Currently, a lot of time is spent on the factor cost function itself; there are still a lot of allocations that shouldn't be there. I haven't been able to figure it out yet. I'll post a flame graph.
Sure, calculating the cost function and its Jacobian can take a lot of time too.
Yes, it looks like we are missing some specialized implementations in Manifolds.jl. I can take a look and see what could be improved if you give me code to reproduce the benchmarks. There are many snippets in this thread and I'm not sure which one to try.
Also, it's cool that it's already an improvement. I didn't even implement RLM with robotics in mind, and this RLM doesn't use many of the tricks developed for the Euclidean case 😄.
Thanks for offering. The best results currently need branches in RoME and IIF; give me a few days to get them merged and I can put an example together that will work on the master branches.
I quickly made this; it is the minimum needed to evaluate a single cost function (factor).

```julia
using Manifolds
using StaticArrays
using DistributedFactorGraphs
using IncrementalInference
using RoME
using Distributions, Statistics  # for MvNormal and mean (likely re-exported by IIF/RoME already)
function runFactorLots(factor::T, N=100000) where T<:AbstractRelative
    cf = CalcFactor(factor, 0, nothing, false, nothing, (), 0, getManifold(factor))
    M = getManifold(T)
    # point from where the measurement is taken ∈ M
    p = getPointIdentity(M)
    # point to where the measurement is taken ∈ M
    q = convert(typeof(p), exp(M, p, hat(M, p, mean(factor.Z))))
    # q = exp(M, p, hat(M, p, mean(factor.Z)))
    res = zeros(6)
    for _ in 1:N
        # measurement vector in TpM (hat returns a mutable array)
        m = convert(typeof(p), hat(M, p, rand(factor.Z)))
        # m = hat(M, p, rand(factor.Z))
        # run the factor
        res = cf(m, p, q)
    end
    return res
end
## Factor types to test:
# Factor for SE2 to SE2 measurement
f = Pose2Pose2(MvNormal(SA[1.,0,0], SA[0.1,0.1,0.01]))
# Factor for SE3 to SE3 measurement
f = Pose3Pose3(MvNormal(SA[1.,0,0,0,0,0], SA[0.1,0.1,0.1,0.01,0.01,0.01]))
##
@time runFactorLots(f)
@profview runFactorLots(f)
@profview_allocs runFactorLots(f)
```
Here are some basic special-cases that make it much faster:

```julia
function Manifolds.exp(M::SpecialEuclidean, p::ArrayPartition, X::ArrayPartition)
    return ArrayPartition(exp(M.manifold.manifolds[1].manifold, p.x[1], X.x[1]), exp(M.manifold.manifolds[2].manifold, p.x[2], X.x[2]))
end
function Manifolds.log(M::SpecialEuclidean, p::ArrayPartition, q::ArrayPartition)
    return ArrayPartition(log(M.manifold.manifolds[1].manifold, p.x[1], q.x[1]), log(M.manifold.manifolds[2].manifold, p.x[2], q.x[2]))
end
function Manifolds.vee(M::SpecialEuclidean, p::ArrayPartition, X::ArrayPartition)
    return vcat(vee(M.manifold.manifolds[1].manifold, p.x[1], X.x[1]), vee(M.manifold.manifolds[2].manifold, p.x[2], X.x[2]))
end
function Manifolds.compose(M::SpecialEuclidean, p::ArrayPartition, q::ArrayPartition)
    return ArrayPartition(p.x[2]*q.x[1] + p.x[1], p.x[2]*q.x[2])
end
```

@kellertuer would you be fine with adding them to Manifolds.jl?
Sure, we can add that to Manifolds.jl; looks good to me performance-wise.
@mateuszbaran, thanks very much, it already makes a big difference. Just note: log might be wrong (I think it misses the semidirect part and only does the product manifold).
You're welcome 🙂. I've made a small mistake in …
I would also still consider the above a great retraction for the Lie one (a first-order approximation, that is), so maybe your convergence fails due to other parameters then.
Thanks, that was it. It converges now.

Ok, so it's the same confusion I had with the exp. So it's equivalent to the log of the product group.
Yes. Our SE(n) terminology isn't entirely consistent with the literature on Lie groups in robotics, but we have the same basic operations available in some form, and we keep adding more.
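For later readers, a small sketch of the distinction being settled here, using current Manifolds.jl names (assumed to still be exported): the componentwise/product log versus the Lie-group log.

```julia
using Manifolds

M = SpecialEuclidean(2)
p = identity_element(M)
q = exp_lie(M, hat(M, p, [1.0, 0.0, 0.1]))   # move away from the identity

X_group = log_lie(M, q)   # Lie-group log: respects the semidirect structure
X_prod  = log(M, p, q)    # product-metric log: componentwise, as in the
                          # specialized Manifolds.log snippet above
```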
It looks like the next bottleneck is in a power-manifold version I implemented myself. I create a product manifold of NPowerManifolds to use a mixture of different point types (for example TranslationGroup(3) and SE(3)). The reason I didn't use PowerManifold from Manifolds.jl is that the power is stored in the type parameter, and that caused compiling new functions every time I solved with a different number of points. For example, new functions for:

```julia
julia> typeof(PowerManifold(Sphere(2), 11))
PowerManifold{ℝ, Sphere{2, ℝ}, Tuple{11}, ArrayPowerRepresentation}

julia> typeof(PowerManifold(Sphere(2), 10))
PowerManifold{ℝ, Sphere{2, ℝ}, Tuple{10}, ArrayPowerRepresentation}
```

Is [something similar to] NPowerManifold something that would be useful in Manifolds.jl? Also, how do I get the performance I should? From the above it looks like I should avoid the dispatches caused by … Here is the code: …

edit: I might just be missing …
I think by now we have both variants, where the power is in the parameter of the type and where it is in a field :)
The next breaking version of Manifolds.jl will have types that don't force specialization on size: JuliaManifolds/Manifolds.jl#642. I just noticed I haven't adapted …

Not all occurrences of …
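For reference, a hedged sketch of what the field-stored size buys; the `parameter = :field` keyword is taken from the JuliaManifolds/Manifolds.jl#642 discussion and may differ in the released API:

```julia
using Manifolds

# size stored in a field instead of the type parameter (assumed keyword)
M10 = PowerManifold(Sphere(2), 10; parameter = :field)
M11 = PowerManifold(Sphere(2), 11; parameter = :field)

# both powers now share one concrete type, so functions specialized on the
# manifold type are compiled once, not once per point count
typeof(M10) == typeof(M11)   # true
```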
Ah, I was already too far ahead and had in mind we had already released that, sorry.
Cool, I'll keep an eye open for it and update our code to use the new Manifolds.jl types when it comes out.
It looks like another dispatch on get_vector (something like this) fixes my issue; I just don't know how to make it generic yet, but thought I'd post it in the meantime.

```julia
function Manifolds.get_vector(M::SpecialEuclidean{2}, p::ArrayPartition, X, basis::AbstractBasis)
    return ArrayPartition(get_vector(M.manifold.manifolds[1].manifold, p.x[1], view(X,1:2), basis), get_vector(M.manifold.manifolds[2].manifold, p.x[2], X[3], basis))
end
```
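A hedged sketch of one way the snippet above could be made generic, assuming the same internal `M.manifold.manifolds[...]` layout and using `manifold_dimension` to size the coordinate views; named `generic_get_vector` here since it is untested against the real dispatch tree:

```julia
using Manifolds

function generic_get_vector(M::SpecialEuclidean, p::ArrayPartition, X, basis::AbstractBasis)
    M1 = M.manifold.manifolds[1].manifold   # translation part
    M2 = M.manifold.manifolds[2].manifold   # rotation part
    d1 = manifold_dimension(M1)
    d2 = manifold_dimension(M2)
    return ArrayPartition(
        get_vector(M1, p.x[1], view(X, 1:d1), basis),
        get_vector(M2, p.x[2], view(X, d1+1:d1+d2), basis),
    )
end
```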
I just looked at those zeros; it's really bad, as it creates a dense Jacobian in …
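To make the concern concrete (illustrative sizes only): a factor-graph Jacobian is block sparse, so preallocating it with zeros forces dense storage and dense linear algebra downstream.

```julia
using SparseArrays

m, n = 600, 300
J_dense  = zeros(m, n)        # m*n Float64s allocated up front, all dense
J_sparse = spzeros(m, n)      # no stored entries until blocks are written
J_sparse[1:3, 1:3] .= 1.0     # only this block materializes in memory
```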
Hm, currently …
This issue is currently just a placeholder for performance-related issues around parametric batch solve.