
Benchmarks with BundleAdjustmentModels.jl #134

Open
amontoison opened this issue Jun 22, 2024 · 19 comments

amontoison commented Jun 22, 2024

@adrhill @gdalle
I tried to run some benchmarks on the NLS problems using BundleAdjustmentModels.jl, but I'm encountering issues with the sparsity detection of any Hessian. It takes an extremely long time. This could be a valuable set of tests for you because the problems are quite large. For the largest problems, my computer also crashes during the sparsity pattern detection of some Jacobian.

Note that I replaced norm(r) with sqrt(dot(r, r)) in the code (see this PR) so that version 0.5.x of SparseConnectivityTracer.jl can be used. Guillaume, the Jacobians/Hessians of these problems could also be useful for SparseMatrixColorings.jl.
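For context, the replacement is mechanical; here is a minimal sketch (with a made-up residual vector) showing that the two formulations agree on ordinary floats, while the second lowers to scalar operations (`*`, `+`, `sqrt`) that an operator-overloading tracer handles directly:

```julia
using LinearAlgebra

# norm(r) and sqrt(dot(r, r)) compute the same 2-norm; the dot-based
# form only uses scalar *, + and sqrt.
r = [3.0, 4.0]
@assert norm(r) == sqrt(dot(r, r)) == 5.0
```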

using BundleAdjustmentModels
using ADNLPModels
using NLPModels
using Printf, LinearAlgebra, SparseArrays  # norm, dot, sparse

debug = false
check_jacobian = false
df = problems_df()
# df = df[ ( df.nequ .≤ 600000 ) .& ( df.nvar .≤ 100000 ), :]
nproblems = size(df)[1]

@printf("| %21s | %7s | %7s | %22s | %15s |\n", "problem", "nequ", "nvar", "backend", "coord_residual")
for i = 1:nproblems
  name_pb = df[i, :name]

  # Smallest problem of the collection
  debug && (i == 1) && (name_pb = "problem-49-7776-pre")
  debug && (i != 1) && continue

  ## BundleAdjustmentModels
  nls = BundleAdjustmentModel(name_pb)
  meta_nls = nls_meta(nls)
  Fx = similar(nls.meta.x0, meta_nls.nequ)
  rows = Vector{Int}(undef, meta_nls.nnzj)
  cols = Vector{Int}(undef, meta_nls.nnzj)
  vals = similar(nls.meta.x0, meta_nls.nnzj)
  x = rand(nls.meta.nvar)  # nls.meta.x0

  # warm-start
  residual(nls, nls.meta.x0)
  residual!(nls, nls.meta.x0, Fx)
  jac_structure_residual!(nls, rows, cols)
  jac_coord_residual!(nls, x, vals)

  # benchmarks
  start_timer = time()
  jac_coord_residual!(nls, x, vals)
  end_timer = time()
  timer_coord_residual = end_timer - start_timer
  @printf("| %21s | %7s | %7s | %22s | %6.5f seconds |\n", name_pb, nls.nls_meta.nequ, nls.meta.nvar, "BundleAdjustmentModels", timer_coord_residual)

  ## ADNLPModels
  function F!(Fx, x)
    residual!(nls, x, Fx)
  end
  nls2 = ADNLSModel!(F!, nls.meta.x0, meta_nls.nequ, nls.meta.lvar, nls.meta.uvar,
                     jacobian_residual_backend = ADNLPModels.SparseADJacobian,
                     jacobian_backend = ADNLPModels.EmptyADbackend,
                     hessian_backend = ADNLPModels.EmptyADbackend,
                     hessian_residual_backend = ADNLPModels.EmptyADbackend)
  meta_nls2 = nls_meta(nls2)
  rows2 = Vector{Int}(undef, meta_nls2.nnzj)
  cols2 = Vector{Int}(undef, meta_nls2.nnzj)
  vals2 = similar(nls2.meta.x0, meta_nls2.nnzj)

  # Warm-start
  jac_structure_residual!(nls2, rows2, cols2)
  jac_coord_residual!(nls2, x, vals2)

  # benchmarks
  start_timer = time()
  jac_coord_residual!(nls2, x, vals2)
  end_timer = time()
  timer2_coord_residual = end_timer - start_timer
  @printf("| %21s | %7s | %7s | %22s | %6.5f seconds |\n", name_pb, nls2.nls_meta.nequ, nls2.meta.nvar, "ADNLPModels", timer2_coord_residual)

  if check_jacobian
    println(meta_nls.nnzj)
    println(meta_nls2.nnzj)
    J_nls = sparse(rows, cols, vals)
    J_nls2 = sparse(rows2, cols2, vals2)
    norm(J_nls - J_nls2) |> println
    norm(J_nls - J_nls2, Inf) |> println
  end
end

Note that you need hessian_residual_backend = ADNLPModels.SparseADHessian to test the Hessian.
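For reference, this is the constructor call from the script above with only that keyword changed (untested sketch; all other arguments stay the same):

```julia
# Same ADNLSModel! call as in the benchmark script, but with a sparse
# residual Hessian backend so the Hessian sparsity detection is exercised.
nls2 = ADNLSModel!(F!, nls.meta.x0, meta_nls.nequ, nls.meta.lvar, nls.meta.uvar,
                   jacobian_residual_backend = ADNLPModels.SparseADJacobian,
                   jacobian_backend = ADNLPModels.EmptyADbackend,
                   hessian_backend = ADNLPModels.EmptyADbackend,
                   hessian_residual_backend = ADNLPModels.SparseADHessian)
```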


adrhill commented Jun 22, 2024

I'll investigate your benchmarks as soon as possible, but I just want to quickly comment on the following:

For the largest problems, my computer crashes during the sparsity pattern detection of the Jacobian.

We've encountered some compile time problems in SCT v0.5.X which we've since fixed on the main branch (more specifically in #120 (comment) and in #119). We're probably going to tag v0.6.0 as soon as we merge #131.


adrhill commented Jun 22, 2024

This could be a valuable set of tests for you because the problems are quite large.

Absolutely! We're actively looking for more large-scale problems to test and benchmark on!


amontoison commented Jun 22, 2024

I will open a separate discussion / issue, but it should also be quite easy to benchmark the OPF problems of rosetta-opf:
lanl-ansi/rosetta-opf#65
The problems are quite large too, and you can also verify the sparsity pattern with JuMP.


gdalle commented Jun 22, 2024

We've encountered some compile time problems in SCT v0.5.X which we've since fixed on the main branch

Note that this was mainly due to the britgas problem formulation being a literal compiler nightmare, with 4 thousand lines of unrolled for loops. I think equally large problems with a more realistic formulation (using actual for loops or vectorized code) would be much less of an issue. That's why I started JuliaSmoothOptimizers/OptimizationProblems.jl#332 but I shudder at the idea of doing this for every single problem.


adrhill commented Sep 3, 2024

Does this issue still persist? I tried to run the code above but got a MethodError on line 10:

julia> nproblems = size(df)[1]
ERROR: MethodError: no method matching size(::JLD2.ReconstructedMutable{:DataFrame, (:columns, :colindex), Tuple{Any, DataFrames.Index}})


adrhill commented Sep 6, 2024

Bump @amontoison


amontoison commented Sep 6, 2024

@adrhill I will have a look asap.

Off-topic: I was talking with Michel Schanen this morning about this package and he has two comments:

Btw, https://adrianhill.de/SparseConnectivityTracer.jl/stable/#Jacobian: for large matrices, the claim that "the Jacobian can be obtained by computing a single forward-pass" might not be true. Because of the memory requirement, you might have to do it in batches.

There's also interesting work by Paul and Aupy: https://link.springer.com/article/10.1007/s101070100281

Paul always wanted to do a Julia implementation of Bayesian probing. It would be awesome to have; forward them that paper, I'm curious whether they are aware of it. For really large Jacobians this would be awesome, as you get the sparsity in log(n).


adrhill commented Sep 6, 2024

Because of the memory requirement you might have to do it in batches.

This is actually a really fun aspect of this package: SCT has very small memory requirements since we only need to keep track of index sets of non-zero values. Our default type is BitSet, which only requires a single bit per index. Our How it works guide will tell you more!
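The index-set idea can be sketched with plain Julia `Set`s (a toy illustration, not SCT's actual tracer types or its `BitSet`-based pattern): each value carries the set of input indices it depends on, and every operation propagates the union.

```julia
# Toy gradient-pattern propagation: a value's "pattern" is the set of
# input indices it depends on, and binary operations union the sets.
struct Tracer
    indices::Set{Int}
end
Base.:+(a::Tracer, b::Tracer) = Tracer(union(a.indices, b.indices))
Base.:*(a::Tracer, b::Tracer) = Tracer(union(a.indices, b.indices))

x = [Tracer(Set([i])) for i in 1:3]
y = x[1] * x[2] + x[3]   # output depends on inputs 1, 2 and 3
@assert y.indices == Set([1, 2, 3])
```

A real tracer also has to distinguish linear from nonlinear interactions for Hessian patterns, which the "How it works" guide covers.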

Thanks for the reference by the way, that looks intriguing!
EDIT: did the authors get mixed up? The linked paper is by Griewank and Mitev.


adrhill commented Sep 6, 2024

And if you (against all odds) manage to run into a memory limit, we have an undocumented shared memory mode in which all tracers share a single set. ;)


michel2323 commented Sep 6, 2024

Thanks for the reference by the way, that looks intriguing! EDIT: did the authors get mixed up? The linked paper is by Griewank and Mitev.

Yes, the authors did get mixed up. I see, you use index sets. That requires allocations (speed), right? It would be nice to have a method for using the underlying AD tools without index sets. For large problems like PDEs, I think index sets are prohibitive. Also, index sets won't work on GPUs. Why would one not just run on the CPU for the sparsity pattern? Well, in some cases one might use a totally different algorithm on the GPU, and the data structures might be quite different.

Edit: But I see how this is totally fine for JuMP problems.


adrhill commented Sep 6, 2024

It would be nice to have a method for using the underlying AD tools without index sets.

This tooling indeed already exists. Our tracer types are just thin wrappers around what we call AbstractPatterns. These patterns can be any representation of a sparse binary vector (or matrix) you want. The default is the aforementioned IndexSetGradientPattern. Do you have a data type in mind you would like to see implemented?

As for GPUs, we have some ideas but decided to get a publication for this package out of the door first (#67). I don't see why GPU support would be impossible if we introduce a suitable statically sized pattern type.


gdalle commented Sep 6, 2024

I think this random probing might be a feature of DifferentiationInterface.jl instead. I already have a DenseSparsityDetector, I could add a RandomSparsityDetector too which would be backend-agnostic.
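A backend-agnostic detector of the dense kind can be sketched with finite differences (illustrative only; DifferentiationInterface's `DenseSparsityDetector` uses an actual AD backend rather than perturbations, and the name `probe_jacobian_pattern` below is made up):

```julia
# Probe each input coordinate and mark the outputs that move: column j
# of the Boolean pattern records which outputs depend on input j.
function probe_jacobian_pattern(f, x; h = 1e-6)
    y0 = f(x)
    pattern = falses(length(y0), length(x))
    for j in eachindex(x)
        xp = copy(x)
        xp[j] += h
        pattern[:, j] .= f(xp) .!= y0
    end
    return pattern
end

f(x) = [x[1] * x[2], x[3]^2]
@assert probe_jacobian_pattern(f, [1.0, 2.0, 3.0]) == Bool[1 1 0; 0 0 1]
```

Note this costs one function evaluation per input, which is exactly what randomized probing schemes try to beat.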

To clarify, SparseConnectivityTracer.jl does not run actual AD, it uses a different brand of abstract program interpretation which only tracks linear and nonlinear dependencies in scalar quantities. It seems extremely stupid and very allocation-intensive, but it's already much faster than what we had before so why not 🤷

@michel2323
Copy link

As for GPUs, we have some ideas but decided to get a publication for this package out of the door first (#67). I don't see why GPU support would be impossible if we introduce a suitable statically sized pattern type.

Good idea!


adrhill commented Sep 6, 2024

To clarify, SparseConnectivityTracer.jl does not run actual AD

SCT can be thought of as a standalone binary forward-mode AD backend.

it uses a different brand of abstract program interpretation which only tracks linear and nonlinear dependencies in scalar quantities. It seems extremely stupid and very allocation-intensive

SCT is essentially a binary version of ForwardDiff.jl. It is no more “stupid” than ForwardDiff and certainly less allocation-intensive.


gdalle commented Sep 6, 2024

It's much more allocation-intensive. Scalars in ForwardDiff are Dual numbers and they are stack-allocated. Ours are sets and they are heap-allocated.


adrhill commented Sep 6, 2024

What we could (should?) add to this package is a mechanism for "chunked" index sets, similar to ForwardDiff. This could greatly reduce the memory usage of SCT for large problems.


gdalle commented Sep 6, 2024

Essentially we need a set type that is a statically-sized version of BitSet, so that it can be stored in a fixed number of 64-bit integers and stack-allocated.

And then we can either use one of those if it can fit all the inputs, or run the function repeatedly while grouping the inputs in chunks of a multiple of 64. Now I get it!
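A single UInt64 already behaves like a statically sized 64-element bitset; a toy sketch of the core operations (not any actual SCT type):

```julia
# Toy fixed-size "bitset" in one UInt64: index i (1..64) maps to bit
# i-1. Union is bitwise-or and membership is a bit test, both
# allocation-free, which is the point of the chunked design.
push_bit(s::UInt64, i::Int) = s | (UInt64(1) << (i - 1))
has_bit(s::UInt64, i::Int) = (s >> (i - 1)) & 1 == 1

a = push_bit(UInt64(0), 3)
b = push_bit(UInt64(0), 64)
u = a | b                    # set union in a single instruction
@assert has_bit(u, 3) && has_bit(u, 64) && !has_bit(u, 1)
```

Stacking a fixed number of such words (as `BitSet` does dynamically) gives the statically sized, stack-allocatable pattern described above.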


adrhill commented Sep 6, 2024

Who would have thought a simple issue bump could turn out to be so inspiring! :)
I think a chunked mode with statically sized patterns might also be the missing puzzle piece for GPU compatibility. I've opened #194 to track our new ideas and continue our discussion.

Let's return to the BundleAdjustmentModels benchmarks here.

@amontoison

Does this issue still persist? I tried to run the code above but got a MethodError on line 10:

julia> nproblems = size(df)[1]
ERROR: MethodError: no method matching size(::JLD2.ReconstructedMutable{:DataFrame, (:columns, :colindex), Tuple{Any, DataFrames.Index}})

@adrhill You need an older version of DataFrames.jl:
https://github.com/JuliaSmoothOptimizers/BundleAdjustmentModels.jl/blob/main/Project.toml#L16

I should upgrade it in BundleAdjustmentModels.jl one day...
It should solve your issue 😉
