Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix: setup signal handlers in interpreted code #14766

Merged

Conversation

ysbaddaden
Copy link
Contributor

@ysbaddaden ysbaddaden commented Jun 29, 2024

Adds a couple interpreter primitives to handle signals properly between the interpreter (receives signals and forwards them to the interpreted code signal pipe) then the interpreted fiber will receive the signal (though the pipe) and handle it.

The interpreted code is thus in control of what signals it wants the process to receive, while avoiding the issue where a signal handler would be interpreted code (i.e. recipe for disaster).

Fixes the default handler for SIGCHLD into the interpreted code, so Process.run now works as expected and don't hang forever anymore. The interpreter will still randomly segfault because fork should probably be an interpreter primitive (see the comments below).

refs #12241

The interpreter won't receive signals anymore, since the interpreted
code will receive them, but the interpreter's signal loop fiber won't be
resumed and it won't handle signals anymore.

Unblocks the default handler for SIGCHLD into the interpreted code, so
Process.run and waiting for a sub-process works just as expected.
@ysbaddaden ysbaddaden added kind:bug A bug in the code. Does not apply to documentation, specs, etc. topic:compiler:interpreter labels Jun 29, 2024
@ysbaddaden ysbaddaden self-assigned this Jun 29, 2024
@ysbaddaden ysbaddaden requested a review from beta-ziliani June 29, 2024 21:58
@ysbaddaden
Copy link
Contributor Author

ysbaddaden commented Jun 30, 2024

Hypothesis A: the signal handler is running on the current stack, and that's messing with the interpreter. Maybe registering an alternate stack with sigaltstack() and pass SA_ONSTACK to sigaction.

Hypothesis B: the signal handler is replaced by interpreted code, which may do things that aren't thread safe, for example call into GC from the signal handler (not the signal loop). We might want the interpreted code to change the writer fd of the interpreter, so the signal handler runs in the interpreter but will be received by the interpreted code.

The interpreted code cannot register signal handlers because it will run
interpreted code that isn't safe. Even registering SIG_IGN must happen
in the interpreted, otherwise the handler tries to run some interpreted
and segfaults trying to allocate memory.

Introduces a couple interpreter intrinsics to:

2. register the signals to the OS, so the process can receive them;
1. replace the interpreter's signal writer to write to the interpreted
   code's signal loop;

C signal handlers will thus receive the signals and handle them in the
interpreter (minimal, non interpreted code, that writes to a file
descriptor) while the crystal signal handlers will be interpreted
normally.
We can reenable a few signal specs, but a couple ones to detect or
handle segfaults take forever to complete and have been disabled when
interpreted.
@ysbaddaden
Copy link
Contributor Author

Hypothesis B was the issue.

We needed a couple interpreter intrinsics after all. The interpreter is registering the handlers (can't run interpreted code in C signal handlers), so it receives the signals and forwards them to the interpreted signal pipe, and the signal loop of the interpreted code will invoke the interpreted handlers 👍

Note: the interpreter won't handle signals anymore, but it already didn't handle them after starting to interpret code, since the interpreter bypasses the fiber scheduler (it directly resumes fibers) so the signal loop fiber was never resumed.

@ysbaddaden
Copy link
Contributor Author

With this patch, I can run the full minitest.cr test suite using the interpreter ❤️

@ysbaddaden
Copy link
Contributor Author

ysbaddaden commented Jun 30, 2024

Running the std specs with the interpreter was painfully slow, so I compiled crystal in release mode and then tried to run the specs, but it segfaulted after a bunch of forks, and before reporting any progress. I tried running:

$ SPEC_SPLIT="3%4" bin/crystal i spec/std_spec.cr

A gdb session shows that the interpreter eventually segfaults:

Program received signal SIGSEGV, Segmentation fault.
0x0000555555e2181e in interpret ()
    at /home/julien/work/crystal-lang/crystal/src/compiler/crystal/interpreter/interpreter.cr:354
354           {% begin %}
(gdb) bt
#0  0x0000555555e2181e in interpret ()
    at /home/julien/work/crystal-lang/crystal/src/compiler/crystal/interpreter/interpreter.cr:354
#1  0x0000555555c4a753 in interpret ()
    at /home/julien/work/crystal-lang/crystal/src/compiler/crystal/interpreter/interpreter.cr:252
#2  0x00005555561ca274 in interpret ()
    at /home/julien/work/crystal-lang/crystal/src/compiler/crystal/interpreter/repl.cr:92
#3  interpret_and_exit_on_error ()
    at /home/julien/work/crystal-lang/crystal/src/compiler/crystal/interpreter/repl.cr:96
#4  0x00005555561c6587 in run_file ()
    at /home/julien/work/crystal-lang/crystal/src/compiler/crystal/interpreter/repl.cr:65
#5  repl () at /home/julien/work/crystal-lang/crystal/src/compiler/crystal/command/repl.cr:47
#6  0x0000555556126ea4 in run () at /home/julien/work/crystal-lang/crystal/src/compiler/crystal/command.cr:104
#7  0x000055555574671c in run () at /home/julien/work/crystal-lang/crystal/src/compiler/crystal/command.cr:55
#8  run () at /home/julien/work/crystal-lang/crystal/src/compiler/crystal/command.cr:54
#9  __crystal_main () at /home/julien/work/crystal-lang/crystal/src/compiler/crystal.cr:11
#10 0x000055555574cf59 in main_user_code () at /home/julien/work/crystal-lang/crystal/src/crystal/main.cr:118
#11 main () at /home/julien/work/crystal-lang/crystal/src/crystal/main.cr:104
#12 main () at /home/julien/work/crystal-lang/crystal/src/crystal/main.cr:130

@ysbaddaden
Copy link
Contributor Author

Interestingly the specs that failed when calling out to crystal i can still fail when calling out crystal build on CI, meaning the interpretation ain't the issue, but something else (subprocess std in/out/err communication?) 🤔

@ysbaddaden
Copy link
Contributor Author

I think the problem, now, is support for fork.

It's likely unsafe to fork/exec in the interpreted code, and we should fork and exec from the interpreter directly... or maybe we should use posix_spawn instead for interpreted code on UNIX? Or we should have an intrinsic for spawning a process in the interpreter (i.e. call fork/exec out of the interpreted code) 🤔

@ysbaddaden ysbaddaden marked this pull request as ready for review November 4, 2024 11:27
@ysbaddaden
Copy link
Contributor Author

Merged with master and I disabled the specs that rely on Process.run because we'll likely need a fork interpreter primitive for that.

@straight-shoota
Copy link
Member

I expect this might conflict with #15141 because signal handling is not implemented in the interpreter on Windows.

@ysbaddaden
Copy link
Contributor Author

@straight-shoota I don't think it will have any impact: we don't use the signal loop on win32, nor do we even run the signal specs on win32, and spawning sub-processes works differently —I suspect the interpret already supports Process.run on Windows.

@straight-shoota straight-shoota added this to the 1.15.0 milestone Nov 4, 2024
@HertzDevil
Copy link
Contributor

#15141 is for compiler specs pertaining to the interpreter, stdlib specs for Process under the interpreter are already enabled in #14964 and indeed work.

@straight-shoota straight-shoota merged commit fd1d0e3 into crystal-lang:master Nov 7, 2024
69 checks passed
@ysbaddaden ysbaddaden deleted the fix/interpreter-signal-handlers branch November 7, 2024 14:38
CTC97 pushed a commit to CTC97/crystal that referenced this pull request Nov 9, 2024
Adds a couple interpreter primitives to handle signals properly between the interpreter (receives signals and forwards them to the interpreted code signal pipe) then the interpreted fiber will receive the signal (through the pipe) and handle it.

The interpreted code is thus in control of what signals it wants the process to receive, while avoiding the issue where a signal handler would be interpreted code (i.e. recipe for disaster).

Fixes the default handler for SIGCHLD into the interpreted code, so `Process.run` ~~now works as expected and~~ don't hang forever anymore. The interpreter will still randomly segfault because `fork` should probably be an interpreter primitive (see the comments below).
@straight-shoota
Copy link
Member

This is not forward-compatible. Stdlib with this patch does not work with a 1.14 compiler.

straight-shoota added a commit that referenced this pull request Nov 13, 2024
)

This is a follow-up to #14766. The new primitives are only available in master
This patch ensures forward compatibility for the latest stable compiler (Crystal 1.14) and below.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
kind:bug A bug in the code. Does not apply to documentation, specs, etc. topic:compiler:interpreter
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants