Segfaults #14
Comments
Thanks for the report. I've never seen this, so if you really have no objections it would be useful for me to try your code. It sounds like you've tried the obvious things. Did it use to work without a segfault, or is it simply that you've only been using This might be an OSX vs Linux thing, of course. Since I was writing it on Linux I went with its most modern timer API, then later learned that it's not available on BSD. But Linux can also use the same older timer API as OSX. So if I can't reproduce the segfault I'll compile a special version and test the OSX timers. I did that when I first developed the OSX support, but I bet I didn't give it the same kind of thorough workout you're giving it. |
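For anyone following along, here is a minimal sketch of the kind of timer-driven sampling described above. It uses the POSIX `setitimer`/`SIGPROF` interface, which both Linux and OS X provide. The newer Linux API mentioned above is presumably the POSIX `timer_create` family, which OS X does not offer. This is an illustrative assumption, not the profiler's actual code: `start_sampling`, `stop_sampling`, and `busy_work` are hypothetical names, and the handler only counts samples where a real profiler would record a backtrace.

```c
#define _XOPEN_SOURCE 700 /* expose sigaction, SA_RESTART, setitimer */
#include <assert.h>
#include <signal.h>
#include <string.h>
#include <sys/time.h>
#include <time.h>

/* Written only by the signal handler; volatile sig_atomic_t makes that write safe. */
static volatile sig_atomic_t samples_taken = 0;

static void on_sigprof(int signo)
{
    (void)signo;
    /* A real profiler would capture a backtrace here; this sketch only counts. */
    samples_taken++;
}

/* Hypothetical helper: install the SIGPROF handler and arm ITIMER_PROF so a
   signal arrives every `usec` microseconds of CPU time used by the process.
   Returns 0 on success, -1 on failure. */
int start_sampling(long usec)
{
    assert(usec > 0);

    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_sigprof;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = SA_RESTART; /* restart syscalls interrupted by a sample */
    if (sigaction(SIGPROF, &sa, NULL) != 0)
        return -1;

    struct itimerval tv;
    tv.it_interval.tv_sec = usec / 1000000;
    tv.it_interval.tv_usec = usec % 1000000;
    tv.it_value = tv.it_interval; /* first tick after one interval */
    return setitimer(ITIMER_PROF, &tv, NULL);
}

/* Hypothetical helper: disarm the timer by setting a zero interval. */
int stop_sampling(void)
{
    struct itimerval off;
    memset(&off, 0, sizeof off);
    return setitimer(ITIMER_PROF, &off, NULL);
}

/* Stand-in for the code being profiled: burn CPU until `want` samples have
   arrived or roughly `max_cpu_sec` seconds of CPU time have elapsed. */
double busy_work(int want, double max_cpu_sec)
{
    volatile double acc = 0.0;
    clock_t start = clock();
    while (samples_taken < want &&
           (double)(clock() - start) / CLOCKS_PER_SEC < max_cpu_sec) {
        for (int i = 0; i < 100000; i++)
            acc += i * 0.5;
    }
    return acc;
}
```

Because `ITIMER_PROF` counts CPU time (user plus system), a process blocked in I/O receives no samples, which is usually what a CPU profiler wants.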
Sure, the code is here: https://gist.github.com/blakejohnson/5600044 Running this causes my machine to segfault:
|
I'll definitely have to try the BSD timers. I was able to run that last line ~6 times without issue. Hopefully this weekend. |
I'll also have a look on another computer to see if it is something --Blake
On Fri, May 17, 2013 at 3:37 PM, Tim Holy wrote:
|
FWIW, here is an example backtrace upon segfault:
|
Ok, I can confirm the issue on another Mac, this one with OS 10.8.2. The segfault backtrace is basically the same. |
OK, I commented out the Linux timers and rebuilt. I still don't get the segfault. Then I ran it under valgrind and got these errors: See also: I don't know what to do here. Looks like it could be a bug in libunwind. CCing @JeffBezanson. |
I also get the feeling that it is a libunwind bug. |
On the Mac, valgrind crashes before it finishes loading the Julia REPL. So, I can't show similar output... |
Hmm, that's not good! Sounds like one of us should submit a bug report to libunwind. Do you do C? |
Right. Reducing this to a bug report that doesn't require Julia is the trick. Unfortunately, it also looks like Apple's updates to libunwind have not been accepted upstream. So, do we submit a bug with Apple or the libunwind project? Ugh. |
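A Julia-free reproducer could start from something like the following hypothetical skeleton. It uses a `SIGPROF` handler that captures a backtrace with `backtrace()` from `<execinfo.h>` (available in glibc and on OS X) into a preallocated buffer. `install_sampler` and `take_sample_now` are illustrative names. The testable path delivers a single signal with `raise`, whereas a real reproducer would arm a profiling timer and run the suspect code in a loop. Note that `backtrace()` is not formally async-signal-safe. The glibc man page recommends calling it once beforehand so that libgcc is already loaded.

```c
#define _XOPEN_SOURCE 700 /* expose sigaction */
#include <assert.h>
#include <execinfo.h>
#include <signal.h>
#include <string.h>

#define MAX_DEPTH 64

/* Preallocated so the handler itself never allocates. */
static void *sample_frames[MAX_DEPTH];
static volatile sig_atomic_t sample_depth = -1;

static void on_sample(int signo)
{
    (void)signo;
    /* Not formally async-signal-safe; this is the call a reproducer would stress. */
    sample_depth = backtrace(sample_frames, MAX_DEPTH);
}

/* Hypothetical helper: install the SIGPROF handler. Returns 0 on success. */
int install_sampler(void)
{
    /* glibc's backtrace() loads libgcc on first use, which may call malloc;
       one call here keeps that out of the signal handler. */
    void *warmup[1];
    backtrace(warmup, 1);

    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_sample;
    sigemptyset(&sa.sa_mask);
    return sigaction(SIGPROF, &sa, NULL);
}

/* Hypothetical helper: deliver one sample synchronously and return the
   number of frames captured. A real reproducer would arm ITIMER_PROF and
   call the suspect code (e.g. a BLAS routine) in a loop instead. */
int take_sample_now(void)
{
    sample_depth = -1;
    if (raise(SIGPROF) != 0)
        return -1;
    /* raise() does not return until the handler has run. */
    assert(sample_depth != -1);
    return sample_depth;
}
```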
I believe the reason libunwind is calling |
Dunno. I assume we get "ours" straight from libunwind? In that case, I'd |
@JeffBezanson, thanks for checking. Obviously it might be more informative to run it on the mac, if valgrind weren't crashing. |
It turns out that we only build libunwind on linux and FreeBSD. So, this must be a bug in Apple's implementation. I think Apple still runs their Radar bug tracker. So, I guess I will file there. |
Thanks for continuing to pursue this! Profile users will owe you their thanks. |
CC @ViralBShah. |
We can certainly build a libunwind on OS X, if there is a patched version that works. |
So, I've been looking more into this over the last few days. It's difficult to debug because libunwind is very low-level. I've only really unearthed two new pieces of information:
|
That does sound like progress. Viral, you know a lot about Julia's build process; is this theory likely? |
I do not have any more insight here. Since gfortran is provided by the user, I can imagine that different builds link to different libraries. I personally have now moved towards using
OpenBLAS does lots of tricks under the hood, and perhaps @xianyi can tell us whether it is likely to interfere with profiling. Also, the last I had checked (probably a year ago, when we got backtrace), the upstream |
I notice that libprofile is built with gcc on the Mac instead of with clang. Could this be an issue? |
I built with clang and still get the segfault. |
If you then do I'll check what this looks like on linux. |
On Ubuntu 12.04:
So, both libgfortran and libjulia link to libgcc_s.so.1. This is at least different from the mac. |
That doesn't seem to be a problem to me. The gfortran is not provided with XCode and hence it brings its own version of |
Alright, I guess it is time to look in a different direction. My test script spends most of its time in OpenBLAS, so perhaps it is just a coincidence that I have never seen it crash outside of OpenBLAS. I'll try some of the perf2 demos that are pure Julia and see if I can get the profiler to crash. |
Well, I cannot get the laplace or Go benchmarks to segfault during profiling. So, perhaps OpenBLAS or ccall are important. |
@xianyi Any wisdom from you here will be useful. This is a blocker for having a sampling profiler as part of julia base, since it crashes when openblas is executing. |
Hi @blakejohnson, @ViralBShah, could you try the latest develop branch? We fixed some bugs in level-2 BLAS. Is it multi-threaded OpenBLAS? Could you try it with a single thread? Or build OpenBLAS with USE_OPENMP=1. Xianyi |
Potential culprit: JuliaLang/julia#3365 Perhaps the fortran interface inside OpenBLAS needs to be compiled with |
But the segfaults on OSX are not specific to OpenBLAS. On Linux, the "unrooted" backtraces are also independent of OpenBLAS. |
Blake, when you link against OpenBLAS, are all the backtraces truncated, or only some of them? For example, I'm guessing if the backtrace is triggered while in Even though things are still rather confusing, I'm wondering if the time has come to report the issue with the libunwind developers. Perhaps they might immediately know what's happening and help direct our investigations. I'm not sure whether the mailing list or the bug reporter is the better choice. My suspicion is that the mailing list would be better, since I think there's still a lot we don't understand about this issue. |
Blake, if you also think that the time has come to report it, do you want to do it or should I? I'm happy either way. |
If you wouldn't mind contacting the libunwind developers, that would be great. They may or may not be able to help given the disconnect between their code and Apple's. But they can probably point us in productive directions. I want to try to extend my example a little further to include calls to a simple external library. Then, I think this will be ready to file with Apple. |
I should mention my null result from last night: I obtained a proper gcc (rather than use the llvm-gcc that Apple provides) and tried building my test with that to at least rule out some varieties of incompatible calling conventions between gfortran and LLVM. No change, regardless of optimization level or |
Now I am back to thinking that there is an issue with how OpenBLAS is compiled. I've updated my gist with two new targets: The 4th target uses a simple C version of daxpy. I get full backtraces here. The 5th target compiles NETLIB's daxpy with gfortran and calls that directly. This also gives me full backtraces. So, OpenBLAS must be doing something different. @xianyi ? |
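For reference, a plain C daxpy of the sort the 4th target describes might look like this. `daxpy_c` is a hypothetical name, not the code in the gist. Because it is ordinary compiled C, the compiler typically emits a standard prologue and unwind information for it. That is the property the hand-written assembly kernels may lack. Like reference BLAS, it returns early when `n <= 0` or `alpha == 0`. Unlike reference BLAS, this sketch only handles positive strides.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical plain-C daxpy: y := alpha*x + y over n elements,
   reading x with stride incx and updating y with stride incy. */
void daxpy_c(int n, double alpha, const double *x, int incx,
             double *y, int incy)
{
    assert(incx > 0 && incy > 0); /* sketch: negative strides not handled */
    if (n <= 0 || alpha == 0.0)
        return; /* nothing to do, as in reference BLAS */
    for (int i = 0; i < n; i++)
        y[(size_t)i * incy] += alpha * x[(size_t)i * incx];
}
```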
OpenBLAS uses the assembly kernel for daxpy. |
I see... these assembly methods manually encode the ABI prologue and epilogue. Hmm. Maybe I should choose a method implemented in C. |
I was hoping that perhaps we were beginning to discover that assembly methods were lacking unwind information and causing your segfault. But |
@blakejohnson, can you trigger the segfault from pure-Julia, using the |
After spending a couple hours trying to learn about calling conventions in x86_64, it seems like the assembly methods in OpenBLAS always omit the frame pointer. We could try modifying the definitions of PROLOGUE and EPILOGUE in OpenBLAS/common_x86_64.h to include:
and
|
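To make the frame-pointer theory concrete, here is a toy model (not real stack inspection) of how an unwinder that falls back to frame pointers walks the x86_64 `%rbp` chain. Each `frame_record` stands for what `push %rbp; mov %rsp, %rbp` leaves on the stack: the caller's saved `%rbp`, followed by the return address. `frame_record` and `walk_frame_chain` are hypothetical names. If a leaf function skips that prologue, `%rbp` still points at its caller's record, so the walk silently misses one return address. If the function also reuses `%rbp` as a general-purpose register, which the ABI allows, a naive walker would follow a garbage pointer. That is one plausible way to get a segfault rather than just a short backtrace.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Toy model of the record that `push %rbp; mov %rsp, %rbp` leaves on an
   x86_64 stack: %rbp points at the saved caller %rbp, and the return
   address sits just above it. */
typedef struct frame_record {
    const struct frame_record *saved_rbp; /* caller's record; NULL at the outermost frame */
    uintptr_t return_addr;                /* address in the caller to return to */
} frame_record;

/* Follow the saved-%rbp chain from `rbp`, storing up to `max` return
   addresses in `out`. Returns how many were stored. A real walker would
   also have to validate each pointer before dereferencing it. */
size_t walk_frame_chain(const frame_record *rbp, uintptr_t *out, size_t max)
{
    assert(out != NULL || max == 0);
    size_t n = 0;
    while (rbp != NULL && n < max) {
        out[n++] = rbp->return_addr;
        rbp = rbp->saved_rbp;
    }
    return n;
}
```

In this model, a leaf that skipped its prologue corresponds to starting the walk at its caller's record. With three records the walk returns three addresses, but starting one record up returns two, and nothing signals that one was lost.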
@timholy I cannot seem to trigger a segfault with your |
Looking into this a little further, we may be able to just add |
Well, no joy on my simple modification of PROLOGUE and EPILOGUE. |
Good try. This is a toughie! I think I'm going to try the whole |
Hi @blakejohnson, when you added the following instruction:
Did you use the -fno-omit-frame-pointer flag? Xianyi |
@xianyi I haven't tried that code snippet yet, just the |
@blakejohnson, it's possible that turning on libunwind debugging output might help debug these segfaults. See the procedure in JuliaLang/julia#3469 (comment) |
Thanks, @timholy. I'll see if Apple's libunwind supports similar debug flags. |
We are facing similar issues in Crystal. We also believe it's a bug in libunwind and LLVM. We found http://llvm.org/bugs/show_bug.cgi?id=20800 and applied the patch; with those binaries it seemed to work fine, but now we have tripped over this again, so there may be more bugs. So I'm almost sure it has nothing to do with BLAS. The worst part is that it's hard to reproduce: for a given program it crashes every time, but the only program where we see it is the compiler's own source, and small changes to that source make it go away or come back, so it's hard to find a minimal case. |
BTW, it only happens on OSX. On Linux it always works fine. |
I don't know whether this is still a problem for us (I can't test, since I don't have a Mac). FYI, this profiler moved into julia base, and a potentially relevant thread (very long, but with a happy resolution) is JuliaLang/julia#3469. I suspect it's a separate issue, but I thought you should know about it. |
I keep running into an issue where running @sprofile on the same function several times causes Julia to segfault. I am not even sure where to start in finding the origin of this issue. I am running Mac OS 10.7.5. I rebuild Julia daily, and this problem has existed for at least a few weeks. I also rebuilt the Profile.jl library with Pkg.runbuildscript("Profile"). If it would help to post the code I am running, I would be happy to do so.