example of slow model generation #49

Closed
mlubin opened this issue Jan 15, 2015 · 10 comments

Comments

@mlubin (Member) commented Jan 15, 2015

I was playing around with implementing a basic linear SVM in both Convex.jl and JuMP and noticed a pretty significant penalty for model generation in Convex.jl:

using JuMP
using Distributions
using Gurobi
import Convex # namespace conflicts with JuMP.Variable()

function gen_data(n)
    pos = rand(MvNormal([1.0,2.0],1.0),n)
    neg = rand(MvNormal([-1.0,1.0],1.0),n)
    return pos,neg
end

const N = 2
const C = 10
function svm_jump(pos,neg, solver)

    m = Model(solver=solver)

    @defVar(m, w[1:N])
    @defVar(m, b)
    @defVar(m, ξpos[1:size(pos,2)] >= 0)
    @defVar(m, ξneg[1:size(neg,2)] >= 0)

    @setObjective(m, Min, sum{w[i]^2, i = 1:N} + C*sum(ξpos) + C*sum(ξneg))
    @addConstraint(m, posconstr[j=1:size(pos,2)], dot(w,pos[:,j]) - b >= 1 - ξpos[j])
    @addConstraint(m, negconstr[j=1:size(neg,2)], -1*(dot(w,neg[:,j]) - b) >= 1 - ξneg[j])

    status = solve(m)
    @assert status == :Optimal
    return getValue(w[:]), getValue(b)
end

function svm_convex(pos,neg,solver)
    w = Convex.Variable(N)
    b = Convex.Variable()
    ξpos = Convex.Variable(size(pos,2), Convex.Positive())
    ξneg = Convex.Variable(size(neg,2), Convex.Positive())

    problem = Convex.minimize(Convex.sum_squares(w) + C*sum(ξpos) + C*sum(ξneg))
    for j in 1:size(pos,2)
        push!(problem.constraints, dot(w,pos[:,j]) - b >= 1 - ξpos[j])
        #problem.constraints += dot(w,pos[:,j]) - b >= 1 - ξpos[j]
    end
    for j in 1:size(neg,2)
        push!(problem.constraints, -1*(dot(w,neg[:,j]) - b) >= 1-ξneg[j])
        #problem.constraints += -1*(dot(w,neg[:,j]) - b) >= 1-ξneg[j]
    end
    Convex.solve!(problem, solver)
    return Convex.evaluate(w), Convex.evaluate(b)
end

pos,neg = gen_data(10)
# initial compilation
svm_jump(pos,neg, GurobiSolver(OutputFlag=0))
svm_convex(pos,neg, GurobiSolver(OutputFlag=0))

pos,neg = gen_data(2000)
@time w, b = svm_jump(pos,neg, GurobiSolver())
@show w,b
@time w, b = svm_convex(pos,neg, GurobiSolver())
@show w,b

Both give approximately the same answer. For JuMP on my machine, Gurobi solves in 0.09 seconds and the whole thing is done in 0.10 seconds. For Convex.jl, Gurobi solves in 0.11 seconds but the total time is 6.5 seconds.

I know model generation time hasn't been as much of a priority for Convex.jl as it has for JuMP, and Convex.jl has to do more work in principle, since it performs a lot of automatic transformations. Anyway, I wanted to share this as a benchmark that could be used to speed up Convex.jl.

CC @IainNZ @joehuchette

@madeleineudell (Contributor) commented

This is an interesting test case. It's a perfect example illustrating how Convex.jl is optimized for vectorized syntax, while JuMP requires indexing. In Convex.jl, you can reformulate it as an unconstrained, vectorized problem, with only w and b as variables, which makes it much faster (as well as more readable, to my eye):

    function svm_convex(pos,neg,solver)
        w = Convex.Variable(N)
        b = Convex.Variable()
        problem = Convex.minimize(Convex.sum_squares(w) + C*Convex.max(1+b-w'*pos, 0) + C*Convex.max(1-b+w'*neg, 0))
        Convex.solve!(problem, solver)
        return Convex.evaluate(w), Convex.evaluate(b)
    end

Convex.jl parses this in just 0.3 seconds (with the same 0.11-second solve time, although I'm using ECOS instead of Gurobi).

(To make it even more readable, I'd replace Convex.max(1+b-w'*pos, 0) with Convex.hinge_loss(w'*pos - b).)

The reason you'd want to use Convex.jl over JuMP is not speed, but so you can write nice things like this. Though of course, if we can make it faster as well, that would be ideal :)

@IainNZ (Contributor) commented Jan 16, 2015

Hah, that's pretty neat.

@mlubin (Member, Author) commented Jan 16, 2015

I agree that's a nicer formulation. I'm hoping there's a way to get the best of both worlds, where you choose a vectorized formulation because it's clearer, not because it's needed for speed (which would be very un-Julian). Anyway, even without major architectural changes, I'd bet there are a few bottlenecks in the scalar case that could be improved without too much effort.

@madeleineudell (Contributor) commented

The most terrible thing that we're doing in the scalar case is performing indexing using sparse matrix multiplication. So if you do

    a = Variable(10)
    a[2] + a[3] == 1

we construct a sparse matrix which, when multiplied by a, will select out the 2nd entry; then we do the same for the 3rd entry; then we add these sparse matrices. You can see why this gets extremely slow. It would be much faster if we used dense matrices, but the memory gets silly and large. Even worse are operations like

    a = Variable(10,20)
    (a')[4,1] == 1

which forms the sparse matrix which, when multiplied by vec(a), would produce vec(a'); then multiplies that sparse matrix by the sparse matrix that selects out the [4,1] entry of a matrix of that size. If we were clever, we would never form all the rows of the sparse matrix representing the transpose operator, since we only end up using one of them.

I'm sure that we could use some insights from JuMP here...


@karanveerm (Contributor) commented

Most users won't know the "best way to do things", and indexing along with stacking are probably our slowest operations, so we definitely need to work on it.

@davidlizeng did some clever things to make indexing faster for LLS and I believe he's looking into adding that in Convex.jl. We'll see how much faster that makes this example.

@mlubin (Member, Author) commented Jan 17, 2015

Are you using SparseMatrixCSC for everything? If so, you could try using triplet format, which is much more efficient for summing two matrices (all you do is concatenate). You could also delay generation of the transpose until it's used somewhere. Just tossing ideas out; not sure if they'll be useful, since I don't really know the internal structure of Convex.jl.

@davidlizeng (Contributor) commented

Expressions in Convex.jl are maps from variable objects to coefficients, which are SparseMatrixCSC. To index into an expression, we do one of the following (a sketch comparing the two follows below):

  1. select certain rows of the sparse coefficients using [] to directly pick the rows we want, or
  2. multiply the sparse coefficients by another sparse matrix that has the same effect of selecting the rows we want.

If we are selecting lots of rows scattered all over the sparse matrix, method 2 is usually better, while selecting a few rows, or a contiguous chunk of rows, usually makes method 1 better.

@mlubin @IainNZ Are there other ways of selecting rows out of SparseMatrixCSC objects that you think might be faster?

@mlubin (Member, Author) commented Jan 17, 2015

SparseMatrixCSC is efficient for selecting columns but not rows. If this is the bottleneck operation, then I'd recommend using CSR format (compressed sparse row), where row slices are a linear-time lookup. There's an open PR for this (JuliaLang/julia#7029), but if you just want to experiment, CSR is equivalent to CSC with the dimensions flipped.

@odow (Member) commented Feb 23, 2022

For the original Convex.jl formulation, I now get:

julia> using Distributions

julia> using LinearAlgebra

julia> using Gurobi

julia> import Convex # namespace conflicts with JuMP.Variable()

julia> function gen_data(n)
           pos = rand(MvNormal([1.0,2.0],1.0),n)
           neg = rand(MvNormal([-1.0,1.0],1.0),n)
           return pos,neg
       end
gen_data (generic function with 1 method)

julia> const N = 2
2

julia> const C = 10
10

julia> function svm_convex(pos,neg,solver)
           w = Convex.Variable(N)
           b = Convex.Variable()
           ξpos = Convex.Variable(size(pos,2), Convex.Positive())
           ξneg = Convex.Variable(size(neg,2), Convex.Positive())

           problem = Convex.minimize(Convex.sumsquares(w) + C*sum(ξpos) + C*sum(ξneg))
           for j in 1:size(pos,2)
               push!(problem.constraints, dot(w,pos[:,j]) - b >= 1 - ξpos[j])
               #problem.constraints += dot(w,pos[:,j]) - b >= 1 - ξpos[j]
           end
           for j in 1:size(neg,2)
               push!(problem.constraints, -1*(dot(w,neg[:,j]) - b) >= 1-ξneg[j])
               #problem.constraints += -1*(dot(w,neg[:,j]) - b) >= 1-ξneg[j]
           end
           Convex.solve!(problem, solver)
           return Convex.evaluate(w), Convex.evaluate(b)
       end
svm_convex (generic function with 1 method)

julia> pos,neg = gen_data(2000)
([1.9549836788159067 1.7171091461123682 … 0.0857034301415448 2.217107664293599; 0.9142782414570454 3.391383891929138 … 2.150346162128055 3.0846956783486137], [-2.3747735768967484 -0.9632429519263247 … -1.9632219657972412 -1.1641815583798856; 0.5275107809723425 1.4908871156989485 … 0.5244168621220339 1.5478250107179292])

julia> @time w, b = svm_convex(pos,neg, Gurobi.Optimizer)
Academic license - for non-commercial use only - expires 2022-04-15
Gurobi Optimizer version 9.1.0 build v9.1.0rc0 (mac64)
Thread count: 4 physical cores, 8 logical processors, using up to 8 threads
Optimize a model with 8008 rows, 4012 columns and 24014 nonzeros
Model fingerprint: 0x7e0bd539
Model has 2 quadratic constraints
Coefficient statistics:
  Matrix range     [4e-05, 1e+01]
  QMatrix range    [1e+00, 1e+00]
  Objective range  [1e+00, 1e+00]
  Bounds range     [0e+00, 0e+00]
  RHS range        [1e+00, 1e+00]
Presolve removed 4002 rows and 1 columns
Presolve time: 0.01s
Presolved: 4006 rows, 4011 columns, 16012 nonzeros
Presolved model has 2 second-order cone constraints
Ordering time: 0.00s

Barrier statistics:
 Dense cols : 3
 Free vars  : 3
 AA' NZ     : 1.201e+04
 Factor NZ  : 1.604e+04 (roughly 3 MBytes of memory)
 Factor Ops : 6.421e+04 (less than 1 second per iteration)
 Threads    : 1

                  Objective                Residual
Iter       Primal          Dual         Primal    Dual     Compl     Time
   0   2.63936156e+05  1.99810454e+04  2.60e+00 2.01e+04  4.58e+01     0s
   1   9.88964802e+04  1.50954405e+04  8.39e-01 1.18e+04  1.92e+01     0s
   2   7.05989167e+04  1.31270445e+04  5.37e-01 8.15e+03  1.32e+01     0s
   3   1.87214697e+04  9.95205410e+03  1.60e-02 1.85e+03  2.36e+00     0s
   4   1.42635612e+04  1.04746351e+04  3.75e-03 6.88e+02  8.44e-01     0s
   5   1.35207428e+04  1.08936853e+04  2.28e-03 4.30e+02  5.35e-01     0s
   6   1.31176886e+04  1.11765792e+04  1.52e-03 3.04e+02  3.79e-01     0s
   7   1.28434221e+04  1.13534443e+04  1.05e-03 2.41e+02  2.89e-01     0s
   8   1.26922566e+04  1.14491159e+04  6.67e-04 2.10e+02  2.42e-01     0s
   9   1.24852995e+04  1.15827954e+04  1.85e-04 1.67e+02  1.78e-01     0s
  10   1.24222926e+04  1.16773917e+04  2.93e-06 1.39e+02  1.48e-01     0s
  11   1.23650014e+04  1.18138275e+04  2.06e-06 1.01e+02  1.07e-01     0s
  12   1.23479331e+04  1.18425480e+04  1.61e-06 9.31e+01  9.81e-02     0s
  13   1.23213208e+04  1.19401568e+04  6.73e-07 6.91e+01  7.31e-02     0s
  14   1.23100141e+04  1.19706912e+04  7.35e-07 6.31e+01  6.55e-02     0s
  15   1.22990185e+04  1.19938315e+04  6.03e-07 5.74e+01  5.89e-02     0s
  16   1.22883280e+04  1.20136702e+04  5.36e-07 5.24e+01  5.31e-02     0s
  17   1.22875077e+04  1.20302735e+04  2.69e-07 4.83e+01  4.96e-02     0s
  18   1.22798852e+04  1.20369287e+04  2.35e-07 4.63e+01  4.69e-02     0s
  19   1.22707587e+04  1.20705517e+04  2.19e-07 3.82e+01  3.84e-02     0s
  20   1.22663177e+04  1.21011822e+04  1.31e-07 3.10e+01  3.15e-02     0s
  21   1.22610991e+04  1.21091698e+04  2.17e-07 2.91e+01  2.92e-02     0s
  22   1.22532029e+04  1.21262368e+04  6.86e-08 2.49e+01  2.45e-02     0s
  23   1.22488727e+04  1.21461177e+04  2.03e-07 2.07e+01  1.99e-02     0s
  24   1.22462311e+04  1.21653762e+04  1.19e-07 1.67e+01  1.58e-02     0s
  25   1.22442693e+04  1.21733717e+04  1.31e-07 1.50e+01  1.40e-02     0s
  26   1.22431147e+04  1.21791383e+04  7.82e-08 1.36e+01  1.26e-02     0s
  27   1.22420458e+04  1.21867057e+04  9.59e-08 1.19e+01  1.09e-02     0s
  28   1.22403342e+04  1.21968617e+04  7.17e-08 9.47e+00  8.63e-03     0s
  29   1.22399947e+04  1.22074753e+04  6.60e-08 6.93e+00  6.40e-03     0s
  30   1.22393923e+04  1.22135922e+04  4.19e-08 5.65e+00  5.12e-03     0s
  31   1.22390998e+04  1.22204954e+04  2.83e-08 4.25e+00  3.75e-03     0s
  32   1.22388921e+04  1.22239764e+04  1.39e-07 3.39e+00  3.00e-03     0s
  33   1.22383519e+04  1.22266122e+04  6.80e-08 2.71e+00  2.37e-03     0s
  34   1.22380912e+04  1.22290447e+04  4.16e-08 2.07e+00  1.82e-03     0s
  35   1.22379909e+04  1.22332853e+04  4.39e-08 1.11e+00  9.59e-04     0s
  36   1.22377215e+04  1.22364936e+04  8.75e-09 2.98e-01  2.53e-04     0s
  37   1.22376906e+04  1.22376527e+04  2.66e-08 8.89e-03  7.71e-06     0s
  38   1.22376903e+04  1.22376637e+04  1.59e-09 6.10e-03  5.36e-06     0s
  39   1.22376896e+04  1.22376663e+04  3.99e-09 5.46e-03  4.73e-06     0s
  40   1.22376892e+04  1.22376881e+04  9.96e-09 4.44e-09  1.44e-07     0s

Barrier solved model in 40 iterations and 0.10 seconds
Optimal objective 1.22376892e+04


User-callback calls 123, time in user-callback 0.00 sec
  1.047763 seconds (3.16 M allocations: 677.923 MiB, 22.29% gc time, 6.96% compilation time)
([1.3915131521712125, 0.7011441389283813], 1.075629207371045)

So better than 6.5 seconds, but still much slower than the solve time of Gurobi.

@odow (Member) commented Jan 8, 2024

Closing because this has been improved. It could probably still be better, but that's a much larger job.

odow closed this as completed Jan 8, 2024