SuiteSparse segfaults on PPC64le #20123

Closed
staticfloat opened this issue Jan 19, 2017 · 36 comments
Labels
sparse Sparse arrays system:powerpc PowerPC

Comments

@staticfloat
Member

We haven't been running the test suite on PPC64le so far, so when I started doing so on the next iteration of the buildbots, I noticed that we don't actually make it through the sparse/sparse tests:

julia> Base.runtests("sparse/sparse")
Test (Worker)     | Time (s) | GC (s) | GC % | Alloc (MB) | RSS (MB)

signal (11): Segmentation fault
while loading /home/juliabuild/buildbot/slave/package_tarballppc64le/build/usr/share/julia/test/sparse/sparse.jl, in expression starting on line 1466
umfzl_create_element at /home/juliabuild/buildbot/slave/package_tarballppc64le/build/usr/bin/../lib/libumfpack.so (unknown line)
umfzl_kernel at /home/juliabuild/buildbot/slave/package_tarballppc64le/build/usr/bin/../lib/libumfpack.so (unknown line)
umfpack_zl_numeric at /home/juliabuild/buildbot/slave/package_tarballppc64le/build/usr/bin/../lib/libumfpack.so (unknown line)
umfpack_numeric! at ./sparse/umfpack.jl:240
lufact at ./sparse/umfpack.jl:148
factorize at ./sparse/linalg.jl:897
normestinv at ./sparse/linalg.jl:552
cond at ./sparse/linalg.jl:534
macro expansion; at /home/juliabuild/buildbot/slave/package_tarballppc64le/build/usr/share/julia/test/sparse/sparse.jl:1474 [inlined]
macro expansion; at ./test.jl:842 [inlined]
anonymous at ./<missing> (unknown line)
jl_call_method_internal at /home/juliabuild/buildbot/slave/package_tarballppc64le/build/src/julia_internal.h:248 [inlined]
jl_toplevel_eval_flex at /home/juliabuild/buildbot/slave/package_tarballppc64le/build/src/toplevel.c:652
jl_parse_eval_all at /home/juliabuild/buildbot/slave/package_tarballppc64le/build/src/ast.c:756
jl_load at /home/juliabuild/buildbot/slave/package_tarballppc64le/build/src/toplevel.c:679
jl_load_ at /home/juliabuild/buildbot/slave/package_tarballppc64le/build/src/toplevel.c:686
include_from_node1 at ./loading.jl:532
unknown function (ip: 0x3fff71f325f3)
jl_call_method_internal at /home/juliabuild/buildbot/slave/package_tarballppc64le/build/src/julia_internal.h:248 [inlined]
jl_apply_generic at /home/juliabuild/buildbot/slave/package_tarballppc64le/build/src/gf.c:2214
include at ./sysimg.jl:14
macro expansion; at /home/juliabuild/buildbot/slave/package_tarballppc64le/build/usr/share/julia/test/testdefs.jl:13 [inlined]
macro expansion; at ./test.jl:842 [inlined]
macro expansion; at ./util.jl:288 [inlined]
macro expansion; at /home/juliabuild/buildbot/slave/package_tarballppc64le/build/usr/share/julia/test/testdefs.jl:0 [inlined]
anonymous at ./<missing> (unknown line)
jl_call_method_internal at /home/juliabuild/buildbot/slave/package_tarballppc64le/build/src/julia_internal.h:248 [inlined]
jl_toplevel_eval_flex at /home/juliabuild/buildbot/slave/package_tarballppc64le/build/src/toplevel.c:652
jl_toplevel_eval_in at /home/juliabuild/buildbot/slave/package_tarballppc64le/build/src/builtins.c:614
eval at ./boot.jl:238
unknown function (ip: 0x3fff71f0df97)
jl_call_method_internal at /home/juliabuild/buildbot/slave/package_tarballppc64le/build/src/julia_internal.h:248 [inlined]
jl_apply_generic at /home/juliabuild/buildbot/slave/package_tarballppc64le/build/src/gf.c:2214
runtests at /home/juliabuild/buildbot/slave/package_tarballppc64le/build/usr/share/julia/test/testdefs.jl:16
#483 at ./multi.jl:1053
run_work_thunk at ./multi.jl:1024
unknown function (ip: 0x3ffd48d146d3)
jl_call_method_internal at /home/juliabuild/buildbot/slave/package_tarballppc64le/build/src/julia_internal.h:248 [inlined]
jl_apply_generic at /home/juliabuild/buildbot/slave/package_tarballppc64le/build/src/gf.c:2214
#remotecall_fetch#488 at ./multi.jl:1078
jl_call_method_internal at /home/juliabuild/buildbot/slave/package_tarballppc64le/build/src/julia_internal.h:248 [inlined]
jl_apply_generic at /home/juliabuild/buildbot/slave/package_tarballppc64le/build/src/gf.c:2214
jl_apply at /home/juliabuild/buildbot/slave/package_tarballppc64le/build/src/julia.h:1413 [inlined]
jl_f__apply at /home/juliabuild/buildbot/slave/package_tarballppc64le/build/src/builtins.c:556
remotecall_fetch at ./multi.jl:1078
jl_call_method_internal at /home/juliabuild/buildbot/slave/package_tarballppc64le/build/src/julia_internal.h:248 [inlined]
jl_apply_generic at /home/juliabuild/buildbot/slave/package_tarballppc64le/build/src/gf.c:2214
jl_apply at /home/juliabuild/buildbot/slave/package_tarballppc64le/build/src/julia.h:1413 [inlined]
jl_f__apply at /home/juliabuild/buildbot/slave/package_tarballppc64le/build/src/builtins.c:556
#remotecall_fetch#492 at ./multi.jl:1106
jl_call_method_internal at /home/juliabuild/buildbot/slave/package_tarballppc64le/build/src/julia_internal.h:248 [inlined]
jl_apply_generic at /home/juliabuild/buildbot/slave/package_tarballppc64le/build/src/gf.c:2214
jl_apply at /home/juliabuild/buildbot/slave/package_tarballppc64le/build/src/julia.h:1413 [inlined]
jl_f__apply at /home/juliabuild/buildbot/slave/package_tarballppc64le/build/src/builtins.c:556
remotecall_fetch at ./multi.jl:1106
jl_call_method_internal at /home/juliabuild/buildbot/slave/package_tarballppc64le/build/src/julia_internal.h:248 [inlined]
jl_apply_generic at /home/juliabuild/buildbot/slave/package_tarballppc64le/build/src/gf.c:2214
macro expansion at /home/juliabuild/buildbot/slave/package_tarballppc64le/build/usr/share/julia/test/runtests.jl:65 [inlined]
#38 at ./task.jl:332
unknown function (ip: 0x3ffd48d10e17)
jl_call_method_internal at /home/juliabuild/buildbot/slave/package_tarballppc64le/build/src/julia_internal.h:248 [inlined]
jl_apply_generic at /home/juliabuild/buildbot/slave/package_tarballppc64le/build/src/gf.c:2214
jl_apply at /home/juliabuild/buildbot/slave/package_tarballppc64le/build/src/julia.h:1413 [inlined]
start_task at /home/juliabuild/buildbot/slave/package_tarballppc64le/build/src/task.c:261
jl_set_base_ctx at /home/juliabuild/buildbot/slave/package_tarballppc64le/build/src/task.c:282
Allocations: 48258430 (Pool: 48252559; Big: 5871); GC: 260
ERROR: A test has failed. Please submit a bug report (https://github.com/JuliaLang/julia/issues)
including error messages above and the output of versioninfo():
Julia Version 0.6.0-dev.2143
Commit f4cbc40 (2017-01-19 01:42 UTC)
Platform Info:
  OS: Linux (powerpc64le-unknown-linux-gnu)
  CPU: unknown
  WORD_SIZE: 64
  BLAS: libopenblas (NO_AFFINITY POWER8)
  LAPACK: libopenblas
  LIBM: libopenlibm
  LLVM: libLLVM-3.9.1 (ORCJIT, pwr8)

The backtrace points to this block of code as the culprit, but the crash doesn't trigger when I just run that piece of code manually. Reporting here in case anyone else wants to take a shot at this.
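
For reference, the failing call boils down to roughly the following (a sketch, not the literal test code; the real test builds Ac via sprand with whatever RNG state the test run happens to have):

# Rough sketch of what the test near sparse.jl:1474 exercises: the 1-norm condition
# number of a random 20x20 complex sparse matrix, which goes through
# normestinv -> factorize -> lufact -> umfpack_zl_numeric per the backtrace above.
Ac = sprandn(20, 20, 0.5) + im*sprandn(20, 20, 0.5)
cond(Ac, 1)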

@tkelman tkelman added system:powerpc PowerPC sparse Sparse arrays labels Jan 19, 2017
@tkelman
Contributor

tkelman commented Jan 19, 2017

We could attempt upgrading SuiteSparse, which I haven't touched because I'm not sure how the better-late-than-never official upstream support for building shared libraries will interact with our home-grown way of doing it.

@Keno
Member

Keno commented Jan 19, 2017

The test suite definitely passed on power at some point.

@vchuravy
Member

I definitely remember it passing in late September.

@staticfloat
Member Author

staticfloat commented Jan 20, 2017

Interesting. The crash depends on the particular matrix we're feeding in on ppc64le. My guess is that our initialization of Ac via sprand() hit a different RNG state back then, so we didn't stumble onto whatever corner case we're experiencing here.

julia> Ac = deserialize(open("Ac.jls"))
20×20 sparse matrix with 306 Complex{Float64} nonzero entries:
	[1 ,  1]  =  0.0+0.40613im
	[2 ,  1]  =  0.337806+0.0im
	[4 ,  1]  =  0.0-0.0846339im
	[5 ,  1]  =  -1.63287-0.13798im
	[6 ,  1]  =  0.0-1.02557im
	[7 ,  1]  =  -0.44613+1.23125im
	[8 ,  1]  =  0.19799+0.027402im
	[9 ,  1]  =  1.14926-2.03573im
	[11,  1]  =  0.298043+0.0im
	[12,  1]  =  1.02148+0.575466im
	[13,  1]  =  0.0+1.31344im
	[15,  1]  =  0.131325+0.0im
	[16,  1]  =  0.0-0.107559im
	[17,  1]  =  0.206393+0.0im
	⋮
	[2 , 20]  =  -1.25471+0.0im
	[3 , 20]  =  1.81056-0.4281im
	[4 , 20]  =  -0.697832+1.29197im
	[5 , 20]  =  0.0-0.610726im
	[7 , 20]  =  0.0-0.969805im
	[9 , 20]  =  2.0124+0.0862868im
	[11, 20]  =  0.992998+0.772603im
	[13, 20]  =  0.260153+0.0im
	[14, 20]  =  -1.39448-1.20073im
	[15, 20]  =  0.0+0.832971im
	[16, 20]  =  -0.723773+0.0im
	[17, 20]  =  -0.145429+1.13026im
	[18, 20]  =  -0.93486+0.0im
	[19, 20]  =  0.194438-2.05935im
	[20, 20]  =  0.0-1.3191im

julia> cond(Ac, 1)

signal (11): Segmentation fault

The serialized Ac is available for download here; I'm going to build a debug SuiteSparse and see whether that gives us any more info.
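
For anyone who wants to capture their own failing matrix the same way, the dump is just a serialized SparseMatrixCSC, roughly:

# Hedged sketch: write the exact offending matrix out so it can be replayed on another machine.
open("Ac.jls", "w") do io
    serialize(io, Ac)
end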

@ViralBShah
Member

I definitely remember the tests passing too. Perhaps this is due to a change in compilers or something?

@staticfloat
Member Author

staticfloat commented Jan 20, 2017

Yes, I think this is indeed a compiler change. I've managed to narrow it down: compiling SuiteSparse with -O2 works, but -O2 -ftree-slp-vectorize doesn't. The default is -O3, which includes -ftree-slp-vectorize.

I'm compiling with GCC 6.3.0, built from source.

@ViralBShah
Member

In general, I am more comfortable using -O2 as the default level of optimization and enabling higher levels only selectively. Should we go to -O2 only on PPC, or across the board?

@nalimilan
Member

In general, building software with -O3 isn't recommended. Some Linux distributions even require package maintainers to prove that it's really faster than -O2, since in many cases it isn't.

@tkelman
Contributor

tkelman commented Jan 20, 2017

This sounds similar to a gcc bug that we found when using musl libc on Alpine Linux a while back: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=71505. That one was causing an internal compiler error, which I used creduce to shrink down from all of CHOLMOD to:

int a, b;
double c, d, e;
double *f;
void fn1() {
  for (;;) {
    e = f[a];
    d = f[a] = f[b];
    c = f[b + 1] = c;
    f[a] = f[a + 1] = d;
    f[b] = f[b + 1] = c;
  }
}

and it turned out to be due to different qsort behavior causing some out-of-bounds accesses in the gcc tree vectorizer. Does it work if you compile with clang at -O3 instead? If so, I'd translate the test case to C and see how small creduce can shrink code that gives the right answer at -O2 but segfaults at -O3, then report it as a gcc bug. If you're as lucky as I was, the people who work on gcc's tree vectorizer might fix it fairly quickly.

@staticfloat
Member Author

I haven't tried clang yet, but GCC 5.4.0 works just fine, so I'm definitely chalking this up to a compiler regression. I'm not sure I have time to do the creduce work right now; if someone else wants to take a stab at it, I'm happy to help.

I think using -O2 by default is reasonable. I'll open a PR adding a patch for SuiteSparse that does this on master and release-0.5.

@tkelman
Contributor

tkelman commented Jan 20, 2017

and release-0.5.

Don't waste the CI time; it's going to fail until I backport a set of fixes to get Travis and AppVeyor working again on that branch.

@tkelman
Contributor

tkelman commented Jan 21, 2017

I meant don't open a separate PR against release-0.5; if there aren't conflicts, then marking it backport pending is fine.

But anyway: did you have a Docker container that was able to do PowerPC cross-compilation and run executables in qemu from an x86_64 host, or am I imagining things?

@staticfloat
Member Author

staticfloat commented Jan 21, 2017

I recently added cross-compilation to .travis.yml for openlibm. That's not super helpful here, though; I suggest using something like the multiarch/crossbuild Docker images for true cross-compilation. I'm playing around with the idea of building a giant cross-compiler Docker image, but I keep running into the usual "cmake too old"/"gcc version incompatible with LLVM 3.9"/etc. problems.

For example, I'm attempting to compile Julia for ppc64le on this crossbuild image via the following:

$ docker run -ti --privileged -e CROSS_TRIPLE=powerpc64le-linux-gnu multiarch/crossbuild bash

Note that you will need to install m4, gfortran-powerpc64le-linux-gnu, and newer versions of qemu-user-static and cmake from jessie-backports, and you need to run with --privileged so that binfmt-support can load the proper kernel settings (this might have something to do with my AppArmor setup; I haven't been able to figure out why it's necessary). Even then, this STILL doesn't work completely.

@tkelman
Contributor

tkelman commented Jan 21, 2017

I'm not asking for the sake of building Julia, but for reproducing the gcc bug.

@ranjanan
Contributor

ranjanan commented Feb 3, 2017

Not sure what the state of this is, but the SuiteSparse test passed for me on Power:

julia> Base.runtests("sparse/sparse")
Test (Worker)     | Time (s) | GC (s) | GC % | Alloc (MB) | RSS (MB)


sparse/sparse (1) |  130.95  | 20.92  | 16.0 | 2083.50    | 471.25  

Test Summary: | Pass  Total
  Overall     | 1469   1469
    SUCCESS
julia> versioninfo()
Julia Version 0.6.0-dev.2548
Commit d7cff44 (2017-02-03 04:08 UTC)
Platform Info:
  OS: Linux (powerpc64le-linux-gnu)
  CPU: unknown
  WORD_SIZE: 64
  BLAS: libopenblas (NO_AFFINITY POWER8)
  LAPACK: libopenblas
  LIBM: libopenlibm
  LLVM: libLLVM-3.9.1 (ORCJIT, generic)

gcc --version gives:

gcc (Ubuntu/IBM 5.4.0-6ubuntu1~16.04.4) 5.4.0 20160609
Copyright (C) 2015 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

@tkelman
Contributor

tkelman commented Mar 4, 2017

If you translate this example code #20123 (comment) into its underlying ccall, does it still segfault? Is there a way to reproduce this via qemu on a non-Power system?

@staticfloat
Member Author

Linking the ccall-based MWE here as well. Yes, it does still segfault.

In my experiments with qemu-ppc64le-static, I have not been able to run any reasonably sized programs without it randomly segfaulting. All the non-x86 Julia buildworker images have qemu-${ARCH}-static built in, so you can, e.g., run docker run -ti staticfloat/julia_workerbase_debian7_11:armv7l and get an ARMv7 build environment at your fingertips. Unfortunately, I can't even get past git clone on the ppc64le version of that image. Perhaps you will have better luck.
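
For readers without the gist handy, the ccall-based MWE looks roughly like the sketch below (0.6-era syntax; the control/info array lengths and the 0-based index conversion follow UMFPACK's documented conventions, and the exact gist contents may differ):

# Sketch of a ccall-based MWE driving the two UMFPACK entry points from the backtrace
# directly on Ac's CSC data. Not the exact gist code.
A  = Ac
Ap = A.colptr .- 1                # UMFPACK wants 0-based column pointers
Ai = A.rowval .- 1                # ...and 0-based row indices
Ax = real(A.nzval)
Az = imag(A.nzval)
ctrl = zeros(20)                  # UMFPACK_CONTROL-sized control array
info = zeros(90)                  # UMFPACK_INFO-sized info array
ccall((:umfpack_zl_defaults, :libumfpack), Void, (Ptr{Float64},), ctrl)
sym = Ref{Ptr{Void}}()
num = Ref{Ptr{Void}}()
ccall((:umfpack_zl_symbolic, :libumfpack), Int,
      (Int, Int, Ptr{Int}, Ptr{Int}, Ptr{Float64}, Ptr{Float64},
       Ptr{Ptr{Void}}, Ptr{Float64}, Ptr{Float64}),
      A.m, A.n, Ap, Ai, Ax, Az, sym, ctrl, info)
ccall((:umfpack_zl_numeric, :libumfpack), Int,       # this is the call that segfaults
      (Ptr{Int}, Ptr{Int}, Ptr{Float64}, Ptr{Float64}, Ptr{Void},
       Ptr{Ptr{Void}}, Ptr{Float64}, Ptr{Float64}),
      Ap, Ai, Ax, Az, sym[], num, ctrl, info)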

@tkelman
Contributor

tkelman commented Mar 5, 2017

And if you compile

#include <stdint.h>
int64_t umfpack_zl_symbolic(int64_t, int64_t, int64_t*, int64_t*, double*, double*, void**, double*, double*);
int64_t umfpack_zl_numeric(int64_t*, int64_t*, double*, double*, void*, void**, double*, double*);
int64_t colptr[] = {0,17,33,49,66,82,95,111,129,145,158,175,190,202,214,227,246,261,274,290,306};
int64_t rowval[] = {0,1,3,4,5,6,7,8,10,11,12,14,15,16,17,18,19,0,1,3,6,7,8,9,10,12,13,14,15,16,17,18,19,0,2,3,4,5,6,9,10,11,12,14,15,16,17,18,19,0,1,2,3,4,5,6,7,9,10,11,12,13,14,16,18,19,1,2,3,4,5,6,7,9,10,11,12,13,14,15,16,19,0,4,5,6,7,8,9,10,13,14,16,17,18,0,1,2,3,4,5,6,7,10,11,12,14,15,17,18,19,0,1,2,4,5,6,7,8,9,10,12,13,14,15,16,17,18,19,0,1,2,3,4,5,7,8,9,10,11,12,15,16,18,19,0,2,4,7,8,10,11,12,15,16,17,18,19,0,1,2,3,4,6,8,9,10,11,12,13,14,16,17,18,19,0,1,2,3,4,5,6,7,8,10,11,13,15,17,19,0,4,5,7,9,10,12,13,14,15,16,18,2,5,6,7,8,9,10,13,14,15,18,19,0,2,3,5,7,8,9,11,12,13,14,15,16,0,1,2,3,4,5,6,7,8,9,10,11,13,14,15,16,17,18,19,0,1,3,4,5,6,7,8,10,11,12,14,15,16,18,0,1,4,5,6,7,8,10,11,12,14,15,17,0,1,2,3,4,5,6,7,8,9,10,11,13,16,17,19,0,1,2,3,4,6,8,10,12,13,14,15,16,17,18,19};
double realval[] = {0.0,0.337806,0.0,-1.63287,0.0,-0.44613,0.19799,1.14926,0.298043,1.02148,0.0,0.131325,0.0,0.206393,-0.311376,2.58573,0.186837,1.38221,-0.393738,-2.00667,0.20917,0.0,0.792833,-0.97752,0.0,0.0,0.0,0.853601,-0.595753,-1.08044,0.379664,0.0,1.0999,1.21105,0.37784,-0.986784,-0.562703,0.163778,-0.0126376,-0.170773,0.71282,1.40237,1.0462,0.406037,-0.554043,0.0,-1.92096,0.0,0.0,0.939853,-0.213045,-0.511759,0.0,0.0,0.0,-0.227391,0.0,-1.0151,0.0,-0.477085,-0.32659,0.0,0.469942,1.64964,0.0,0.0,1.78145,-0.763605,0.0,0.0,0.0131958,0.0,-0.162672,-2.70516,-0.122919,-2.00008,-0.698962,1.09993,-1.7705,-2.00725,0.0,-0.998107,0.00869002,-0.163939,0.625094,0.0725784,0.0,-0.838007,-0.442824,0.41261,1.26817,0.69029,-1.41511,0.0187943,0.0,-0.468559,1.61116,0.0,-0.727772,0.523392,0.0,0.0,0.0,0.356031,-0.4948,0.202831,0.0,-1.39671,-0.540818,1.17547,1.2347,1.43989,0.0,0.138636,0.0,-1.14957,0.0245661,0.0,0.0,1.23577,1.28436,0.0,-1.62606,-0.780131,0.462515,0.0,-0.491959,0.0,0.0,-1.01862,0.198236,1.73314,1.29703,0.0,0.21179,0.744656,0.220431,-0.702916,0.749352,-0.742024,0.0,1.38957,0.0,0.0,-0.224411,0.0,0.0,-1.02532,0.0,0.0,0.0,0.966394,0.0367359,0.0,-0.796899,0.0611722,0.116889,0.366088,0.0,1.06849,0.0,0.0,-1.16406,1.55251,-0.362438,1.36879,-1.07224,-1.23966,0.0,0.461177,0.795476,3.06289,0.818629,0.857781,0.0,1.23709,0.0,-1.2936,-0.411399,0.279837,0.182068,0.0,0.0,0.0,0.198476,-0.0630913,-0.103114,0.71032,0.0,1.29788,0.0,1.07622,1.75651,0.458845,0.0,0.0475068,0.0,-0.25331,0.521942,0.0,-0.107332,0.0,-0.495542,0.0,-1.99779,0.769662,0.0,1.86222,-0.996834,0.0,0.0,0.0,0.654088,0.0,1.20758,0.0109424,-0.372857,-0.468379,0.40064,0.0,0.344445,0.448019,0.0,0.0,1.02937,-0.370604,-1.27289,0.0,0.0,-0.171546,0.0,-0.720716,-0.633376,0.0,-2.0201,2.09947,0.0,0.0,0.0,0.0,0.550492,-1.92742,0.645865,0.0197031,-0.657945,0.467725,-0.278258,0.0,1.14977,0.0,0.49476,0.0,0.876636,2.16184,0.0,1.29705,0.0,-1.35523,-0.663317,-1.22075,-0.0638587,0.666348,0.196576,-1.62458,-0.846266,0.633132,0.589679,0.0,-0.00264666,1.60007,0.782907,1.22297,-0.722215,-0.923931,0.0,0.0,0.349859,0.0,1.48932,0.0865408,0.0,-1.83066,-1.07821,0.0,-0.58498,0.0,0.471699,-0.664718,-0.150759,-0.284796,-0.386488,-1.25471,1.81056,-0.697832,0.0,0.0,2.0124,0.992998,0.260153,-1.39448,0.0,-0.723773,-0.145429,-0.93486,0.194438,0.0};
double imagval[] = {0.40613,0.0,-0.0846339,-0.13798,-1.02557,1.23125,0.027402,-2.03573,0.0,0.575466,1.31344,0.0,-0.107559,0.0,0.251055,0.0,0.736455,0.0,0.0,0.0,-0.318469,0.510821,-1.16124,-2.0675,-0.935379,-0.984719,-0.0427068,0.0,0.0,-1.04169,0.0,-0.721603,1.76978,0.0,0.0,-1.6339,0.171513,1.07942,-0.341863,-0.451416,0.0,-0.385623,-0.177979,0.0,-0.527409,0.274356,-1.40177,-0.92233,-0.897793,0.0,0.741349,0.12676,-0.14502,0.167712,-1.86289,0.580561,-0.0656835,0.580498,0.0601529,0.0,0.0,0.330769,0.0,0.0,0.244537,-0.876682,0.156848,0.0,1.27292,-2.54138,0.0,-0.235694,2.0349,0.148812,0.0,0.0,-0.871853,1.0887,0.0,-0.186493,0.352587,0.0,0.0,0.0,0.0,1.20885,-0.997256,0.0,-0.774319,-1.01799,-0.250655,0.0,0.581978,0.0,-1.99611,0.804366,0.0,1.83116,0.293086,-0.645082,0.246694,-1.06371,-1.71633,0.0,0.0,0.662097,-0.0167129,-0.0591024,0.0,0.0,0.0,0.0,-0.253807,0.756052,-0.237708,0.0,0.0,-1.41261,0.546773,-0.656529,0.0,-0.054959,1.00811,0.189775,0.0,-1.65752,0.0,-0.278691,-0.512486,0.0,-1.76222,-0.398593,-0.990992,2.09176,-0.524177,0.0,0.182589,0.0,-1.66172,0.914951,-0.801739,0.554185,0.72665,-2.421,0.926461,1.46664,0.956341,1.2396,-1.25235,1.21628,1.03836,-1.04291,-1.02315,-0.180262,-0.211879,0.0,0.0,0.0,-2.02002,0.0,0.301225,1.88696,0.0,1.11314,-0.663909,-0.812546,0.0,-1.01952,-0.261015,0.0,-1.65223,-0.209834,0.0,0.755999,-0.33215,0.368023,0.136506,0.0,-0.259282,0.0,2.31561,-0.560358,0.0308454,1.12144,-0.997661,-0.118676,-1.72015,0.24544,-1.45935,0.0,-0.900858,0.408868,-0.50157,-0.694633,0.728268,0.0,2.23729,0.297442,1.11827,0.605953,0.128544,0.75236,0.0,-0.177461,-0.943643,0.999748,-0.344966,0.0,0.0,-2.09372,-1.05075,-0.00600076,0.0,-1.54667,0.425878,-0.0684746,0.0,0.0,0.0,-1.14376,0.0,-0.15198,-1.11284,-0.779306,0.0,0.0,-1.33519,1.21656,-0.573046,0.0,1.78693,0.0,0.0,1.89433,0.243233,0.0,-0.0222436,-0.0812604,0.209523,-0.779886,0.0,0.878099,0.0,0.0,0.0,0.0,0.0,-0.709713,0.0,1.21249,0.0238692,0.86202,0.0,0.0,-1.10935,0.489313,-0.179003,-0.834192,0.0,-0.338043,0.0,0.115793,0.0,1.12293,0.0,-1.03264,0.330903,0.407553,1.27351,0.0,0.0,0.1353,0.0,0.0,0.41193,-1.08955,0.0,0.0454245,0.0,0.419015,-1.86116,0.0,0.0,-0.271129,-0.281632,0.663675,0.0,-1.00765,0.842544,0.0,-1.02119,0.0,-0.4281,1.29197,-0.610726,-0.969805,0.0862868,0.772603,0.0,-1.20073,0.832971,0.0,1.13026,0.0,-2.05935,-1.3191};
double umf_ctrl[] = {1.0,0.2,0.2,0.1,32.0,0.0,0.7,2.0,1.0,0.0,1.0,1.0,0.0,0.0,10.0,0.001,1.0,0.5,0.0,1.0};
double umf_info[] = {2.32446e-310,2.32446e-310,2.32446e-310,2.32446e-310,2.32446e-310,2.32446e-310,2.32446e-310,2.32446e-310,2.32446e-310,2.32446e-310,2.32446e-310,2.32446e-310,2.32446e-310,2.32446e-310,2.32446e-310,2.32446e-310,2.32446e-310,2.32446e-310,2.32446e-310,2.32446e-310,2.32446e-310,2.32446e-310,2.32446e-310,2.32446e-310,2.32446e-310,2.32446e-310,2.32446e-310,2.32446e-310,2.32446e-310,2.32446e-310,2.32446e-310,2.32446e-310,2.32446e-310,2.32446e-310,2.32446e-310,2.32446e-310,2.32446e-310,2.32446e-310,2.32446e-310,2.32446e-310,2.32446e-310,2.32446e-310,2.32446e-310,2.32446e-310,2.32446e-310,2.32446e-310,2.32446e-310,2.32446e-310,2.32446e-310,2.32446e-310,2.32446e-310,2.32446e-310,2.32446e-310,2.32446e-310,2.32446e-310,2.32446e-310,2.32446e-310,2.32446e-310,2.32446e-310,2.32446e-310,2.32446e-310,2.32446e-310,2.32446e-310,2.32446e-310,2.32446e-310,2.32446e-310,2.32446e-310,2.32446e-310,2.32446e-310,2.32446e-310,2.32446e-310,2.32446e-310,2.32446e-310,2.32446e-310,2.32446e-310,2.32446e-310,2.32446e-310,2.32446e-310,2.32446e-310,2.32446e-310,2.32446e-310,2.32446e-310,2.32446e-310,2.32446e-310,2.32446e-310,2.32446e-310,2.32446e-310,2.32446e-310,2.32446e-310,0.0};
void* tmp;
void* symbolic;
int main() {
  umfpack_zl_symbolic(20, 20, colptr, rowval, realval, imagval, &tmp, umf_ctrl, umf_info);
  symbolic = tmp;
  return umfpack_zl_numeric(colptr, rowval, realval, imagval, symbolic, &tmp, umf_ctrl, umf_info);
}

linked against Julia's copy of umfpack, does that also segfault for binaries built before the patch and run successfully after? Setting QEMU_CPU=POWER8 allows some things to run, but I haven't quite gotten the -O2 version to succeed.

@staticfloat
Member Author

Yes, it does. Nice job shrinking that down. gdb gives the following backtrace, which looks like it's the same problem:

Program received signal SIGSEGV, Segmentation fault.
0x00003fffb7f30ca8 in umfzl_create_element () from /src/julia/usr/lib/libumfpack.so
(gdb) bt
#0  0x00003fffb7f30ca8 in umfzl_create_element () from /src/julia/usr/lib/libumfpack.so
#1  0x00003fffb7f333d4 in umfzl_kernel () from /src/julia/usr/lib/libumfpack.so
#2  0x00003fffb7f4745c in umfpack_zl_numeric () from /src/julia/usr/lib/libumfpack.so
#3  0x00000000100008a8 in main ()

I've updated the Docker container to build this C test program as well (and run it by default) and added your C code to the gist.

@tkelman
Contributor

tkelman commented Mar 5, 2017

Now, can we take the compiler invocation that builds libumfpack and use -E to get a (large) single-file version of this? Then we can run it through creduce and get something small and self-contained enough to report upstream.

@tkelman
Contributor

tkelman commented Mar 7, 2017

Bump. You still owe me a gcc bug report.

@staticfloat
Member Author

staticfloat commented Mar 8, 2017

I spent some time today trying to do this, but it's hard to disentangle the rest of libsuitesparse from this invocation (the calls we're using sub out to things in libamd, as well as to things like umfpack_malloc, umfpack_tictoc, etc.). Unfortunately, gcc -E doesn't eliminate duplicate definitions, so naively running gcc -E *.c > huge.c doesn't work, because things like umfpack_internal.h get copied in more than once. I'm open to being taught how to do this; I don't think it should be that complicated (we know where the error is occurring thanks to our backtrace), but I still don't know how to get the source into a form where creduce can do anything with it.

You still owe me a gcc bug report.

I get that you don't like leaving loose ends lying around, and that this PR was pushed in against your wishes, but I've been pretty vocal about my time constraints and how much time/effort I'm willing to put into trying to get this communicated upstream to the gcc folks. The only reason I'm continuing on this issue is because I have a lot of respect for you and your opinions when it comes to the "right" way to do software. Please don't add any more stress to my inbox; you can feel free to do that if/when I get my paychecks from Julia Computing. ;)

@tkelman
Contributor

tkelman commented Mar 8, 2017

Does this happen when linked to a system BLAS with 32-bit ints? That would remove the need to deal with OpenBLAS when reproducing this. If Viral or anyone else who cares about this architecture wants to make sure the patch doesn't get removed, we'll need to reduce it as much as we can.

@staticfloat
Member Author

I managed to get all of UMFPACK spat out into .i files, and with those and the relevant .a files, I've got creduce crunching down the .i files and your .c file. Everything's backed up here.

@tkelman
Contributor

tkelman commented Mar 8, 2017

With luck it won't need any of the .a files in whatever it eventually gets down to. https://github.com/tkelman/docker-power-suitesparse/blob/master/build.sh has a lot of the unnecessary parts trimmed out for linking the C test case; does that still segfault?

@tkelman
Contributor

tkelman commented Mar 8, 2017

Looking at our makefiles, we actually don't use ILP64 on Power by default. Maybe we should change that at some point.

If I build SuiteSparse statically into the test executable instead of linking it as a shared library, qemu is capable of reproducing the problem from amd64. But it only happens with OpenBLAS (either the Debian ppc64el package or the one shipped in a Julia binary), not with Debian's ppc64el reference BLAS package. I wonder whether OpenBLAS has a memory-corruption bug in its Power kernels, and changing the optimization level of umfpack results in a different memory-access pattern or something?

@staticfloat
Member Author

I took your reduced case and started another run based off of it on the ppc64le machine.

@staticfloat
Member Author

staticfloat commented Mar 8, 2017

Looking at our makefiles, we actually don't use ILP64 on Power by default. Maybe we should change that at some point.

Agreed, I'll open an issue about that. And here's the PR.

@tkelman
Contributor

tkelman commented Mar 8, 2017

Also worth testing this against the latest development branch of openblas, given how long it's been since their last release.

@staticfloat
Member Author

I took your reduced case and started another run based off of it on the ppc64le machine.

creduce, which has been running all day, has gone down from 8M LOC to 25K LOC. We're getting close.

@staticfloat
Member Author

Creduce was a little too zealous and reduced down to the simplest possible segfault; it just calls an uninitialized function pointer.

@tkelman
Contributor

tkelman commented Mar 9, 2017

It's best to compile and run (under a timeout) both the working and broken copies in the test script. My test script had a bug where I typoed the segfaulting version, so it reduced to an empty main; whoops.
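
In other words, an interestingness test along these lines (a sketch; creduce only looks at the exit status, so the script can be anything executable, shown here as a Julia script with placeholder file names, compile lines, and timeouts):

# Keep a reduction only if both copies still compile, the unoptimized build still runs
# cleanly under a timeout, and the -O3 build still crashes. The real compile lines also
# pull in the preprocessed UMFPACK sources and link flags.
success(`gcc -O0 mwe.c -o mwe_O0`) || exit(1)
success(`gcc -O3 mwe.c -o mwe_O3`) || exit(1)
success(`timeout 30 ./mwe_O0`) || exit(1)   # the working copy must keep working
success(`timeout 30 ./mwe_O3`) && exit(1)   # the broken copy must keep segfaulting
exit(0)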

@staticfloat
Member Author

Yeah, good idea to ensure that the non-optimized version still runs. I'm rerunning it with the two versions, and also not reducing mwe.c, so it retains some small resemblance to the original problem. If this doesn't work, I will go the extra step of actually outputting the result, then constrain creduce to keep the algorithm working.

@staticfloat
Member Author

I am running a new creduce pass that preserves the algorithm (by extracting the LU factorization via umfpack_zl_get_numeric(), writing it out to disk, and checking that it stays the same for the -O0 run), because creduce was getting stuck in local minima where the -O0 run exited early due to malformed matrices hitting early exit() points.
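
Concretely, that adds an output check on top of the script sketched earlier (again a sketch; mwe.c is assumed to be instrumented to dump the factors it gets back from umfpack_zl_get_numeric(), and lu_ref.bin is a placeholder for a dump from a known-good build):

# A reduction is only kept if the -O0 run still reproduces the reference factorization
# and the -O3 run still crashes.
success(`timeout 30 ./mwe_O0 lu_O0.bin`) || exit(1)
read("lu_O0.bin") == read("lu_ref.bin") || exit(1)   # factors must match the known-good dump
success(`timeout 30 ./mwe_O3 lu_O3.bin`) && exit(1)  # the optimized build must still segfault
exit(0)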

@staticfloat
Member Author

It turns out this was due to some misbehaving ppc64le kernels within OpenBLAS. Using the latest OpenBLAS head (99880f79068fc12b3025840671a838f0d4be3c9e at the time of writing) fixes this issue completely.

@tkelman
Contributor

tkelman commented Mar 18, 2017

Great, thanks for testing that, and sorry for being a bit of a jerk about it. If the fix can be reverse-bisected to a Power-specific commit that applies cleanly to the last release, maybe we should swap patches. If not, we should drop the SuiteSparse patch when upgrading to the next OpenBLAS version.
