Segfault when solving large problem with barrier #81

Closed
mtanneau opened this issue Apr 25, 2020 · 10 comments

@mtanneau
Contributor

Don't know whether this happens because of Julia or whether it's a Clp internal bug.
I am getting segmentation faults when trying to solve large-ish LPs with Clp's barrier algorithm.

The instance in the example below is neos3 from Hans Mittelmann's benchmark and can be downloaded here.

MWE:

import Clp

clp = Clp.ClpCInterface.ClpModel()
options = Clp.ClpCInterface.ClpSolve()
Clp.ClpCInterface.set_solve_type(options, 4)  # barrier, no crossover

Clp.ClpCInterface.read_mps(clp, "neos3.mps")

Clp.ClpCInterface.initial_solve_with_options(clp, options)

and the output:

Coin0001I At line 1 NAME          neos3
Coin0001I At line 2 ROWS
Coin0001I At line 512213 COLUMNS
Coin0001I At line 1283910 RHS
Coin0001I At line 1283912 ENDATA
Coin0002I Problem neos3 has 512209 rows, 6624 columns and 1542816 elements
Clp0027I Model was imported from dat/plato/neos3.mps in 0.515625 seconds
Coin0506I Presolve 512209 (0) rows, 6624 (0) columns and 1542816 (0) elements
2.0729e+08 elements in sparse Cholesky, flop count 3.4653e+16

signal (11): Segmentation fault
in expression starting at REPL[6]:1
_ZN15ClpCholeskyBase9factorizeEPKdPi at /home/mtanneau/.julia/artifacts/6698bf93c2ab2c997ca5a4d58f84329c113b2990/lib/libClp.so.1 (unknown line)
_ZN21ClpPredictorCorrector14createSolutionEv at /home/mtanneau/.julia/artifacts/6698bf93c2ab2c997ca5a4d58f84329c113b2990/lib/libClp.so.1 (unknown line)
_ZN21ClpPredictorCorrector5solveEv at /home/mtanneau/.julia/artifacts/6698bf93c2ab2c997ca5a4d58f84329c113b2990/lib/libClp.so.1 (unknown line)
_ZN10ClpSimplex12initialSolveER8ClpSolve at /home/mtanneau/.julia/artifacts/6698bf93c2ab2c997ca5a4d58f84329c113b2990/lib/libClp.so.1 (unknown line)
initial_solve_with_options at /home/mtanneau/.julia/packages/Clp/ULSlO/src/ClpCInterface.jl:200
unknown function (ip: 0x7f8adc1068a3)
_jl_invoke at /buildworker/worker/package_linux64/build/src/gf.c:2158 [inlined]
jl_apply_generic at /buildworker/worker/package_linux64/build/src/gf.c:2322
jl_apply at /buildworker/worker/package_linux64/build/src/julia.h:1692 [inlined]
do_call at /buildworker/worker/package_linux64/build/src/interpreter.c:369
eval_value at /buildworker/worker/package_linux64/build/src/interpreter.c:458
eval_stmt_value at /buildworker/worker/package_linux64/build/src/interpreter.c:409 [inlined]
eval_body at /buildworker/worker/package_linux64/build/src/interpreter.c:817
jl_interpret_toplevel_thunk at /buildworker/worker/package_linux64/build/src/interpreter.c:911
jl_toplevel_eval_flex at /buildworker/worker/package_linux64/build/src/toplevel.c:814
jl_toplevel_eval_flex at /buildworker/worker/package_linux64/build/src/toplevel.c:764
jl_toplevel_eval_in at /buildworker/worker/package_linux64/build/src/toplevel.c:843
eval at ./boot.jl:331
_jl_invoke at /buildworker/worker/package_linux64/build/src/gf.c:2144 [inlined]
jl_apply_generic at /buildworker/worker/package_linux64/build/src/gf.c:2322
eval_user_input at /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.4/REPL/src/REPL.jl:86
macro expansion at /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.4/REPL/src/REPL.jl:118 [inlined]
#26 at ./task.jl:358
_jl_invoke at /buildworker/worker/package_linux64/build/src/gf.c:2144 [inlined]
jl_apply_generic at /buildworker/worker/package_linux64/build/src/gf.c:2322
jl_apply at /buildworker/worker/package_linux64/build/src/julia.h:1692 [inlined]
start_task at /buildworker/worker/package_linux64/build/src/task.c:687
unknown function (ip: (nil))
Allocations: 16523660 (Pool: 16518949; Big: 4711); GC: 12
Segmentation fault (core dumped)

Finally, system info:

Julia Version 1.4.0
Commit b8e9a9ecc6 (2020-03-21 16:36 UTC)
Platform Info:
  OS: Linux (x86_64-pc-linux-gnu)
  CPU: Intel(R) Core(TM) i5-10210U CPU @ 1.60GHz
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-8.0.1 (ORCJIT, skylake)

and Clp versions (from Manifest):

[[Clp]]
deps = ["BinaryProvider", "Clp_jll", "Libdl", "LinearAlgebra", "MathOptInterface", "MathProgBase", "SparseArrays"]
git-tree-sha1 = "dfaabbde22abbdf30a8d35f1ff49db174faa4901"
repo-rev = "master"
repo-url = "https://github.com/JuliaOpt/Clp.jl.git"
uuid = "e2554f3b-3117-50c0-817c-e040a3ddf72d"
version = "0.7.1"

[[Clp_jll]]
deps = ["CoinUtils_jll", "CompilerSupportLibraries_jll", "Libdl", "OpenBLAS32_jll", "Osi_jll", "Pkg"]
git-tree-sha1 = "7fec44e2cf907d339d2bcc2f1dffe611401e5560"
uuid = "06985876-5285-5a41-9fcb-8948a742cc53"
version = "1.17.6+4"
@odow
Member

odow commented Apr 26, 2020

I'm guessing this is an issue with MUMPS :(.

The good news is, it's only a 500 line function to debug: https://github.com/coin-or/Clp/blob/releases/1.17.6/Clp/src/ClpCholeskyBase.cpp#L2720-L3299

@DLC49

DLC49 commented May 16, 2020

I have experienced segmentation faults when an instance runs out of RAM due to memory leaks. Apparently this is more common with C wrappers than with pure Julia modules.

@odow
Member

odow commented May 17, 2020

@mtanneau can you reproduce this on the latest master?

How does one even uncompress that neos3.gz file? When I try gzip -d neos3.gz, it gets corrupted with weird lines. Digging through the READMEs on Hans' ftp suggests I should use http://www.netlib.org/lp/data/emps.c, but when I use this, I get some weird checksum error.

./emps: Check sum error: expected
 D%f-W`HSSS6m@Q#~[oHweMHm6MC%Htd9Nr*WG7^;1CHi%giIvQ0x161M'}1{Q-R[9;N^I]8

but got
?z?ꉪ'???z?ꉪ'???z?ꉪ'???z?ꉪ'???z?ꉪ'???z~???7??????ht?toy??q??Ygy?
./emps: Check sum line =: line 73 (char 5138) of neos3.mps

@mtanneau
Contributor Author

This is how I extracted the instance (on a Linux machine):

wget http://plato.asu.edu/ftp/lptestset/misc/neos3.gz
gunzip neos3.gz
wget http://www.netlib.org/lp/data/emps.c
gcc emps.c
./a.out -s neos3

I'll try this on master.

@odow
Member

odow commented May 17, 2020

Ah, I missed the need to decompress twice.

Here's the syntax for the latest release:

using Clp
clp = Clp.Clp_newModel()
opt = Clp.ClpSolve_new()
Clp.ClpSolve_setSolveType(opt, 4, false)
Clp.Clp_readMps(clp, "neos3.mps", false, false)
Clp.Clp_initialSolveWithOptions(clp, opt)

@odow
Member

odow commented May 17, 2020

Can reproduce on latest master with an identical stacktrace on both Linux and Mac.

Since these are now just direct calls to the C library, this is either a bug in how we compile Clp_jll, or a bug in Clp itself.

@odow odow added the bug label May 17, 2020
@mtanneau
Contributor Author

mtanneau commented May 17, 2020

Still happens for me too.

My hypothesis is that Clp just dies because it is trying to factorize a matrix that is far too large (see the flop count reported in the solver log above).
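
For a rough sense of scale, the two numbers on the "sparse Cholesky" log line can be turned into a memory and runtime estimate. This is only a back-of-the-envelope sketch, not something Clp reports: the 8 bytes per stored element and the sustained 10 GFLOP/s are assumptions, and index arrays or other workspace are ignored.

```julia
# Numbers copied from the Clp log in the first post.
factor_elements = 2.0729e8   # "elements in sparse Cholesky"
flop_count      = 3.4653e16  # "flop count"

# Assumptions (not from the log): Float64 storage and 10 GFLOP/s sustained.
bytes_per_element        = 8
assumed_flops_per_second = 1.0e10

# Memory for the factor values alone, in GiB.
factor_gib = factor_elements * bytes_per_element / 2^30

# Time for one factorization at the assumed throughput, in days.
factor_days = flop_count / assumed_flops_per_second / 86_400

println("Cholesky factor values: ~", round(factor_gib; digits=2), " GiB")
println("One factorization:      ~", round(factor_days; digits=1), " days")
```

Under these assumptions the factor values alone need about 1.5 GiB and a single factorization would take weeks, so barrier on this formulation looks impractical even without the crash.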

What puzzles me is that, when running Clp from the command line (see e.g. Clp's log from H. Mittelmann's website here), it does fine.
More specifically, on the command-line run, Clp detects that it should solve the dual problem, so the matrix to factorize is much smaller (~6k x 6k), and no error is encountered.
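
To illustrate why dualizing helps, the sketch below compares the order of the matrix barrier would factorize in each formulation. It assumes the simplified view that the matrix has one row and column per constraint and that dualizing swaps the roles of rows and columns; `dense_triangle` is a hypothetical helper giving a worst-case (fully dense) entry count, not a Clp function.

```julia
# Dimensions from the Clp log: 512209 rows, 6624 columns.
nrows, ncols = 512_209, 6_624

# Worst-case number of entries in the lower triangle of a dense n-by-n factor.
dense_triangle(n) = n * (n + 1) ÷ 2

primal_order = nrows  # primal: one row/column per constraint
dual_order   = ncols  # dual (simplified): constraints and variables swap roles

println("primal: order ", primal_order, ", dense bound ", dense_triangle(primal_order), " entries")
println("dual:   order ", dual_order, ", dense bound ", dense_triangle(dual_order), " entries")
```

Even the fully dense bound for the dual (about 2.2e7 entries) is well below the 2.0729e+08 sparse factor elements reported for the primal in the log.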

Thus, I don't know whether it is due to (i) the command-line executable doing something more than the above code, or (ii) something different happens in the _jll build (which would be surprising).
I'll try to replicate this from the C API, and I'll open an issue upstream if relevant.

@mtanneau
Contributor Author

mtanneau commented Jun 5, 2020

I have bad/good news (depending on the perspective): the issue seems to come from upstream.

I tried the same sequence of C calls with Clp directly, and got the same error.

FYI: coin-or/Clp#151

@mtanneau
Contributor Author

Reproducing J. Forrest's answer here:

I have modified Clp/master so that it recognizes that barrier will use too much memory and switch to simplex. [...]
The dualize option is not available in the C interface.

Since it comes from Clp itself and not Clp.jl, I guess this issue can be closed.

@odow odow closed this as completed Jun 16, 2020
@odow
Member

odow commented Jun 16, 2020

👍 At some point we will pull in a new Clp with the fix, but this seems like an uncommon edge case we don't need to leave this issue open for.
