Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

make ccall sigatomic (defer SIGINT handling) #2622

Closed
stevengj opened this issue Mar 20, 2013 · 32 comments
Closed

make ccall sigatomic (defer SIGINT handling) #2622

stevengj opened this issue Mar 20, 2013 · 32 comments
Labels
bug Indicates an unexpected problem or unintended behavior needs decision A decision on this change is needed

Comments

@stevengj
Copy link
Member

Currently, a ctrl-c in the REPL can interrupt a ccall to an external library, leaving that library in an inconsistent state and producing crashes or other problems later (see e.g. PyCall issue #15).

Julia's C API (julia.h) has SIGATOMIC macros to protect critical code regions from being interrupted by ctrl-c (SIGINT), but these are not easily accessible from Julia.

Arguably, every ccall should be made sigatomic by default, on the theory that C code in general is not interrupt-safe (with a separate interruptable_ccall for cases that are known to be re-entrant and interrupt-safe). Note that Python does the same thing. As noted below, the performance penalty for this is almost certainly negligible under ordinary circumstances.

@JeffBezanson
Copy link
Member

see also #1468

@timholy
Copy link
Member

timholy commented Mar 20, 2013

In code that repeatedly calls Cairo, I've also seen issues with Ctrl-C.

@stevengj
Copy link
Member Author

Making ccall sigatomic by default seems like the only safe choice. The overhead for this looks like it would be two additions, three loads, two stores, and two comparisons/branches in a typical call; this is negligible for all but the most trivial C function calls.

Moreover, Julia's JIT compiler need not generate the sigatomic checks at all unless jl_install_sigint_handler has been called, which (on a single core) only happens in the REPL by default, so this wouldn't affect non-interactive code.

(One of the comparisons could be eliminated in typical calls by changing the conditional from jl_defer_signal == 0 && jl_signal_pending != 0 to jl_signal_pending != 0 && jl_defer_signal == 0 since both jl_signal_pending and jl_defer_signal will usually be zero.)

@stevengj
Copy link
Member Author

I'm labelling this a bug, since crashing generic C code after SIGINT is not a sane default.

@JeffBezanson
Copy link
Member

This is reasonable. Unfortunately (as in #1468) we'd also like to be able to interrupt things like large matrix multiplies, but there doesn't seem to be much we can do about that on our end.

@kmsquire
Copy link
Member

@JeffBezanson in #5399:

I'm kind of against this since it adds overhead and probably still won't make everything work.

It would be nice to measure the impact.

How about adding a safe_ccall or atomic_ccall function with the same signature as ccall, but wrapped in sigatomic?

Alternatively, ccall could be made safe by default and unsafe_ccall could be provided. Thoughts?

@StefanKarpinski
Copy link
Member

Alternatively, ccall could be made safe by default and unsafe_ccall could be provided.

This seems like the better approach to me if we're going to do this since we would want to default to safe and most cases where someone is calling out to C are not really that performance critical.

@ivarne
Copy link
Member

ivarne commented Jan 16, 2014

I also think safe-by-default suits Julia well. If we do not have other behavior to group together with interruptable ccalls in a unsafe_ccall, I think a name containing int(errupt) would have advantages.

@JeffBezanson
Copy link
Member

I just don't think asynch interrupt is that important. Plus, you might want to break out of a long-running C routine.

This doesn't warrant a safe_ or unsafe_ prefix; it only refers to a very narrow issue of interactively interrupting things. With that approach, the very simplest and most innocent of C routines (e.g. libm functions) would have to be called with unsafe_ccall, which doesn't seem right.

We are already doing delayed binding of ccalls by checking a cached function pointer for NULL. This would add yet another thing.

@stevengj
Copy link
Member Author

There's no point in allowing people to break out of long-running C routines if doing so has a high probability of creating crashes later; I doubt that most long-running C code is interrupt-safe.

I find it hard to believe that the overhead will be anything but negligible except for a very small number of extremely trivial C routines, and decorating these with unsafe_ccall seems like a small price to pay for not having ctrl-C be horribly dangerous.

@stevengj
Copy link
Member Author

(I would tend to call it interruptable_ccall rather than "unsafe", since the former is more informative.)

@kmsquire
Copy link
Member

Creating a branch which adds sigatomic_begin() and sigatomic_end() directly to ccall and measuring the impact would give us something more concrete to argue over. ;-)

I won't have time to try this for at least a few days, so if anyone is up to the task, I'm eager to hear about it...

@kmsquire
Copy link
Member

(And I agree that interruptable_ccall is more informative.)

@JeffBezanson
Copy link
Member

And then I think there is a question of whether it will even work --- I doubt everything in our runtime system and libraries will be completely interrupt-safe. Technically after an async signal the process may be in an undefined state, no matter what you do.

A lot of important native routines are very tiny, for example sqrt, whose body is basically one instruction. But yes, we should definitely try it before saying any more about performance.

I would prefer almost anything to having to go around and write interruptible_ccall in various places, plus clutter the language with this extra word. If people really want this, I'd rather make every ccall atomic by default and let it be disabled by a compiler switch.

@amitmurthy
Copy link
Contributor

Testing a wrapped ccall with these changes:

https://github.com/amitmurthy/julia/compare/amitm;sigatomic?expand=1

I get the following results (extreme case of just trying to detect overhead of sigatomic calls...)

On master:

julia> @time map!(x->sqrt(x), [1.0:float64(10^8)]);
elapsed time: 7.719709771 seconds (5599993224 bytes allocated)

julia> @time map!(x->sqrt(x), [1.0:float64(10^8)]);
elapsed time: 7.726340217 seconds (5599993224 bytes allocated)

julia> @time map!(x->sqrt(x), [1.0:float64(10^8)]);
elapsed time: 7.697574462 seconds (5599993224 bytes allocated)

with wrapped ccall:

julia> @time map!(x->sqrt(x), [1.0:float64(10^8)]);
elapsed time: 7.865826205 seconds (5599993128 bytes allocated)

julia> @time map!(x->sqrt(x), [1.0:float64(10^8)]);
elapsed time: 7.825473527 seconds (5599993128 bytes allocated)

julia> @time map!(x->sqrt(x), [1.0:float64(10^8)]);
elapsed time: 7.870493742 seconds (5599993128 bytes allocated)

I think we should just make every ccall (and any other similar functions) atomic by default. Even a compiler switch may not be required.

Seg faults simply leave a bad taste in the mouth.....

@JeffBezanson
Copy link
Member

If I'm reading your patch correctly, I don't think this has any effect since the JL_SIGATOMIC_BEGIN happens at compile time. You'd need to generate code that does what JL_SIGATOMIC_BEGIN does.

@amitmurthy
Copy link
Contributor

Right. Should have realized from the "emit" names...

Just tested again by just ccall'ing the jl_sigatomic* functions before and after the ccall for sqrt in math.jl and I am getting only slightly increased timings....

julia> @time map!(x->sqrt(x), [1.0:float64(10^8)]);
elapsed time: 7.756323121 seconds (5599993224 bytes allocated)

julia> @time map!(x->sqrt(x), [1.0:float64(10^8)]);
elapsed time: 7.742497754 seconds (5599993224 bytes allocated)

julia> @time map!(x->sqrt(x), [1.0:float64(10^8)]);
elapsed time: 7.76467246 seconds (5599993224 bytes allocated)

Patch - https://github.com/amitmurthy/julia/compare/amitm;sigatomic?expand=1

This should have a runtime effect, right?

@stevengj
Copy link
Member Author

Don't benchmark anonymous functions. Write a foo!(X) function using an explicit loop.

Though honestly I don't see the point. The number of C functions as small as sqrt seems quite small, so I find it hard to believe that we'd use interrupt able_ccall in more than a dozen or so places.

@JeffBezanson
Copy link
Member

At least use sqrt instead of x->sqrt(x); that's adding an extra layer of overhead. Better still call the vectorized sqrt.

@amitmurthy
Copy link
Contributor

For

a=[1.0:float64(10^8)]
@time sqrt(a);

it is 1.06 seconds for the safe version and 0.59 seconds for the current version. Maybe the difference will be even smaller when emit_ccall itself generates sigatmic* code? In practical terms, @stevengj is right, we may as well just make ccall resilient to interrupts by default....

@tknopp
Copy link
Contributor

tknopp commented Jan 18, 2014

I can see Jeffs point very well. Julia is in an outstanding position where the ffi has almost zero overhead as the overhead is jited away. In all other languages I have seen, there is some overhead which prevents calling scalar C functions in tight loops. This usually leads to recommendation to use the ffi only for vector operations, where the overhead is negligible.

@stevengj
Copy link
Member Author

@amitmurthy, the difference will certainly be smaller when the sigatomic is inlined; it is much more expensive to do a subroutine call than a comparison.

@tknopp, the problem with this argument is that very very few C functions as simple as sqrt are worth calling in a tight loop. In most cases, if the C function is so trivial that the cost of an extra comparison and load are significant, then it is trivial enough to just inline the calculations and/or rewrite in Julia. In the few cases like sqrt that are basically wrappers around single instructions, then (a) we can use interruptable_ccall and (b) we should consider having Julia inline it anyway.

@JeffBezanson
Copy link
Member

What typically happens is somebody says "To try out julia, I compared a simple loop (c)calling sqrt in C++ and Julia, and Julia was 50% slower. Why is Julia so much slower for something so simple?" The C++ program, of course, just dies entirely on ^C, but that is not going to occur to anybody as a point of comparison.

From my perspective, we work very hard to generate good code, so it is disheartening when more stuff has to be crammed into the pipeline. Often each extra "thing" is 2%, but over time there are 20 of those things.

But, @stevengj is right that the overhead will be much less with the sigatomic stuff inlined, and in the specific case of sqrt we should be inlining it.

@stevengj
Copy link
Member Author

@JeffBezanson, we already get tons of I did [something naive] and Julia was X times slower than Y comments. Slowing down the extremely unusual case of calling your own trivial C function in a tight loop (as opposed to something like sqrt which is in Base and for which we will have already used interruptable_ccall) seems likely to have negligible impact, in comparison with more common mistakes, on first impressions.

@JeffBezanson
Copy link
Member

Ok, I think the way I want to handle this is to make ccall sigatomic by default, and have a compiler switch to control it. My reasoning is that whether you need ^C depends on whether you are working interactively, which is a fairly global property. When we reach the point of statically compiling whole programs, it makes sense to ask to strip out all sigatomic stuff.

@stevengj
Copy link
Member Author

@JeffBezanson, why do you need a compiler switch? Just turn off the sigatomic ccall in the code generator if jl_install_sigint_handler has not been called, as I suggested above. This will automatically disable it in non-interactive usage.

@mossr
Copy link

mossr commented Jan 17, 2015

@JeffBezanson is there a way to completely remove signal handling (other than the //jl_install_sigint_handler method)? You mentioned a compiler switch, has that been implemented?

Because MATLAB runs on the JVM, if a library installs it's own signal handler then complications can cause MATLAB to crash.

@stevengj
Copy link
Member Author

(Note that the sqrt function has been inlined for some time, since 244ec92, so ccall overhead will no longer affect it. Shouldn't we also be using LLVM intrinsics for log and exp and a few others?)

@ViralBShah
Copy link
Member

How do the LLVM intrinsics work? Do they end up calling the system libm, or does LLVM have fast implementations? We have @simonbyrne 's log implementation in Julia that is probably the fastest one for now.

@simonbyrne
Copy link
Contributor

From what I understand they just call the system libm functions, though they are also able to optimise repeated calls with the same argument (#414, #9942, #10922).

randy3k added a commit to JuliaInterop/RCall.jl that referenced this issue Mar 21, 2016
yuyichao added a commit that referenced this issue May 3, 2016
* Remove unnecessary sigatomic
* Make flisp calls sigatomic
* Make type inference calls sigatomic
* Refactor interthread communication through signal
* Make sure `sleep` is aborted on `SIGINT` on Linux to deliver the exception faster
* Implement force signal throwing when `SIGINT` arrives too frequently
* Hack to abort io syscall on `SIGINT`

Fix #1468; Fix #2622; Towards #14675
yuyichao added a commit that referenced this issue May 4, 2016
* Remove unnecessary sigatomic
* Make flisp calls sigatomic
* Make type inference calls sigatomic
* Refactor interthread communication through signal
* Make sure `sleep` is aborted on `SIGINT` on Linux to deliver the exception faster
* Implement force signal throwing when `SIGINT` arrives too frequently
* Hack to abort io syscall on `SIGINT`

Fix #1468; Fix #2622; Towards #14675
yuyichao added a commit that referenced this issue May 4, 2016
* Remove unnecessary sigatomic
* Make flisp calls sigatomic
* Make type inference calls sigatomic
* Refactor interthread communication through signal
* Make sure `sleep` is aborted on `SIGINT` on Linux to deliver the exception faster
* Implement force signal throwing when `SIGINT` arrives too frequently
* Hack to abort io syscall on `SIGINT`

Fix #1468; Fix #2622; Towards #14675
yuyichao added a commit that referenced this issue May 4, 2016
* Remove unnecessary sigatomic
* Make flisp calls sigatomic
* Make type inference calls sigatomic
* Refactor interthread communication through signal
* Make sure `sleep` is aborted on `SIGINT` on Linux to deliver the exception faster
* Implement force signal throwing when `SIGINT` arrives too frequently
* Hack to abort io syscall on `SIGINT`

Fix #1468; Fix #2622; Towards #14675
yuyichao added a commit that referenced this issue May 4, 2016
* Remove unnecessary sigatomic
* Make flisp calls sigatomic
* Make type inference calls sigatomic
* Refactor interthread communication through signal
* Make sure `sleep` is aborted on `SIGINT` on Linux to deliver the exception faster
* Implement force signal throwing when `SIGINT` arrives too frequently
* Hack to abort io syscall on `SIGINT`

Fix #1468; Fix #2622; Towards #14675
yuyichao added a commit that referenced this issue May 5, 2016
* Remove unnecessary sigatomic
* Make flisp calls sigatomic
* Make type inference calls sigatomic
* Refactor interthread communication through signal
* Make sure `sleep` is aborted on `SIGINT` on Linux to deliver the exception faster
* Implement force signal throwing when `SIGINT` arrives too frequently
* Hack to abort io syscall on `SIGINT`

Fix #1468; Fix #2622; Towards #14675
yuyichao added a commit that referenced this issue May 5, 2016
* Remove unnecessary sigatomic
* Make flisp calls sigatomic
* Make type inference calls sigatomic
* Refactor interthread communication through signal
* Make sure `sleep` is aborted on `SIGINT` on Linux to deliver the exception faster
* Implement force signal throwing when `SIGINT` arrives too frequently
* Hack to abort io syscall on `SIGINT`

Fix #1468; Fix #2622; Towards #14675
yuyichao added a commit that referenced this issue May 6, 2016
* Remove unnecessary sigatomic
* Make flisp calls sigatomic
* Make type inference calls sigatomic
* Refactor interthread communication through signal
* Make sure `sleep` is aborted on `SIGINT` on Linux to deliver the exception faster
* Implement force signal throwing when `SIGINT` arrives too frequently
* Hack to abort io syscall on `SIGINT`

Fix #1468; Fix #2622; Towards #14675
@stevengj
Copy link
Member Author

stevengj commented May 6, 2016

Hooray!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Indicates an unexpected problem or unintended behavior needs decision A decision on this change is needed
Projects
None yet
Development

No branches or pull requests