[Bug] yarn v2 does not forward SIGTERM to child processes #991

mprast · 2020-02-25T01:57:39Z

I'd be willing to implement a fix

Describe the bug
This appears to be a regression from yarnpkg/yarn#3424.
Doing a kill on the yarn run process will leave the other processes hanging around. To me it seems that the correct behavior should be to forward the SIGTERM to any child processes that yarn has created.

Would be great to have this fixed since it is a prereq for me using yarn run in production, which I believe I need to do to use PnP in prod (unless there's another way!). Kubernetes uses SIGTERM to signal containers to gracefully shut down and right now yarn doesn't fit into that workflow because it can't handle the signal.

To Reproduce
Couldn't figure out how to repro using Sherlock. As mentioned earlier, the reference link in the sherlock documentation is broken.

Here's a minimal repro. To see the behavior, clone the repo, then:

yarn run http-server in one window
ps ax | grep http-server in another. You should see 3 processes, with node /usr/bin/yarn run http-server as the top process.
kill the PID of the top process.
ps ax | grep http-server again. You'll see that the top process is gone, but the other two remain.

Environment if relevant (please complete the following information):

OS: [e.g. OSX, Linux, Windows, ...]
Node version [e.g. 8.15.0, 10.15.1, ...]
Yarn version [e.g. 2.0.0-rc1, ...]

Thank you!

The text was updated successfully, but these errors were encountered:

adamdunkley · 2020-04-09T15:07:02Z

Also, please take a look at the issue I raised on yarn v1 when considering a strategy for this: yarnpkg/yarn#4667

The strategy taken when fixing it on v1 ignored Unix standards, by always killing children immediately and ignoring the child's exit status, which caused numerous problems (which you can see from the comments on that issue)

arcanis · 2020-05-13T10:12:24Z

I just found this issue again - one thing I'd like to be sure I understand: when pressing ^C, SIGINT is sent to all the process from the group. We're actually just ignoring this signal while a child process is running, since the SIGINT will cause the child to abort, and us as well once that happens.

Why isn't it the same for SIGTERM? Why do we need to forward it to the children? Shouldn't K8s send the SIGTERM to all the children from the group? Do you have documentation on this behaviour?

bgotink · 2020-05-13T12:25:44Z

This isn't related to k8s, it can be reproduced using a simple kill. This actually explains some weird behaviour I've seen in yarn 1, which I thought at the time to have been a bug in the angular CLI.

The cause appears to be some unexpected signal handling in node. Sending a signal using e.g. kill will send it to the outer process, and node does not forward it to any spawned children. Only for ctrl-c will node forward the SIGINT signal.
I thought maybe it's because that signal comes in via stdin, which gets passed into the child. The ctrl-c still gets forwarded when passing stdio: ['ignore', 1, 2] in the execFileSync options, so this doesn't explain it.

You can verify all of this using:

// test.js
const {execFileSync} = require('child_process');

console.log('outer: pid', process.pid);

execFileSync(process.execPath, [__dirname + '/test-inner.js'], {
  stdio: ['ignore', 1, 2],
});

and

// test-inner.js
console.log('inner: pid', process.pid);

process.on('SIGINT', () => {
  console.log('inner: sigint');
  process.exit(0);
});

process.on('SIGTERM', () => {
  console.log('inner: sigterm');
  process.exit(0);
});

// equivalent to "while (true) {}" but async to allow event loop to continue
(function loop() {
  setTimeout(loop, 0);
})();

the output is:

# ctrl-c
bram:/tmp $ node test
outer: pid 48479
inner: pid 48480
^Cinner: sigint

# sigint
bram:/tmp $ node test
outer: pid 48493
inner: pid 48494
# in other terminal: kill -SIGINT 48493
bram:/tmp $ kill 48494
inner: sigterm

# sigterm
bram:/tmp $ node test
outer: pid 50606
inner: pid 50607
# in other terminal: kill 50606
fish: 'node test' terminated by signal SIGTERM (Polite quit request)
bram:/tmp $ kill 50607
inner: sigterm

arcanis · 2020-05-13T12:58:52Z

Looking at this post, I think I can make some sense of it:

By POSIX rule, ^C causes the shell to send a SIGINT to the foreground process group.
However, kill, by default, will only target one process, not a group. To target the whole group, it'd need to use the negative pid (ie kill -50606).

That would explain why the SIGINT goes down until it reaches the children, whereas the SIGTERM
doesn't. In fact if that's true, sending a manual one-process SIGINT (kill 50606 -SIGINT) will have the same effect (or rather, it won't stop anything, since we ignore it and the children won't receive it).

With that in mind, remains to figure out what's the correct behaviour ... shouldn't tools such as K8s send the SIGTERM signal to the process group, hence following the same behaviour as what ^C does? On the other hand, it may make sense for the parent to explicitly decide whether the signals should be forwarded or not.

adamdunkley · 2020-05-13T13:11:25Z

@bgotink I think this is strange but not entirely unexpected behaviour. child_process is basically using popen underneath or at least purports to roughly conform to it. It's POSIX standard that a keyboard ^c is an INTR, not directly a SIGINT (though it generates SIGINT(s)). One of differences is that with a INTR it is forwarded to children automatically. However, a naked SIGINT is not automatically forwarded. You can see someone who is confused about this discussing it here https://unix.stackexchange.com/questions/149741/why-is-sigint-not-propagated-to-child-process-when-sent-to-its-parent-process

Here is where the POSIX standard describes a INTR https://pubs.opengroup.org/onlinepubs/000095399/basedefs/xbd_chap11.html#tag_11_01_09

So the nub of it is that it is up to parent processes to forward signals in any POSIX system. With Yarn, as it is executing one process on behalf of the user (sort of as a proxy, where it sets up a certain environment for the child process), I would expect it to forward those signals (as it's not a daemon manager or some such). I would also expect (as there is a 1:1 relationship with the parent and child process) that it would wait a reasonable amount of time for the child to exit and pass any exit code from the child as its own exit code.

arcanis · 2020-05-13T13:16:48Z

I would expect it to forward those signals (as it's not a daemon manager or some such)

Wouldn't that cause double signals to reach the underlying tool? For example, in the case of docker-compose, one SIGINT triggers a graceful shutdown, but one more aborts the images forcefully. In this kind of situation we don't want to forward the signals.

bgotink · 2020-05-13T13:34:31Z

Can we detect if a SIGINT originates from a ctrl-c?

If not, would it make sense to forward all signals except SIGINT and SIGQUIT? SIGINT and SIGQUIT are meant as signals sent by terminals (source), i.e. by humans. These user interactions send both of these signals to the entire process group instead of only the parent process (source).

adamdunkley · 2020-05-13T13:51:00Z

@bgotink My understanding is that INTR is unwrapped in to SIGINT outside the process so there is no way to know where it originated from (without some awful voodoo magic, which I wouldn't recommend)

@arcanis I think this is a tricky one because I don't think there is a great deal in the spec about SIGINT and what really to do with it (it's left up to the application). Really, in the spirit of SIGINT, docker-compose should not be interpreting two SIGINTs as a SIGKILL (because that's what SIGKILL is for!). My vote would still be that forwarding all signals makes sense because worse than an application doing something weird with two SIGINTs is having child processes mounting up. At least in the former the damage will be limited to those applications with the esoteric signal handling behaviour, in the latter it is a universal problem that all child processes who receive a non ^c SIGINT will continue to run without their parent (which in a run command does not seem desirable).

adamdunkley · 2020-05-13T13:54:14Z

Btw, docker-compose itself forwards signals and waits 10 seconds for children to exit before sending a SIGKILL … https://docs.docker.com/compose/faq/#why-do-my-services-take-10-seconds-to-recreate-or-stop. I get that the context is a little different when you have something running in viz, but still.

bgotink · 2020-05-13T14:18:15Z

My understanding is that INTR is unwrapped in to SIGINT outside the process so there is no way to know where it originated from (without some awful voodoo magic, which I wouldn't recommend)

I was expecting this answer as I couldn't find anything on it.

Interpreting two consecutive SIGINTs as a SIGTERM is something we see more often than just docker-compose, e.g. in the node repl.
If we forward SIGINT, running yarn node would quit on the first ctrl-c instead of requiring another to confirm.

adamdunkley · 2020-05-13T14:47:35Z

@bgotink Weirdly node REPL is able to understand the difference between the two things … I assume because they are capturing keys (see: voodoo magic). I am not sure that it would react any differently, as a result, because it is not capturing ^c but all keyboard input. I haven't dug in to exactly how it does it. Try killing it with a SIGINT remotely and you'll see that it dies without prompt, likewise run it inside an npm run (which I know forwards signals) and you'll see it still behaves as expected. I also set up a test script for how npm does this (which, I know, yarn is not npm, but it's a good reference) and it has the behaviour I describe (two SIGINTs are sent on ^c):

// test.js
console.log('pid', process.pid);

process.on('SIGINT', () => {
  console.log('sigint');
  setTimeout(() => {
    console.log('exit');
    process.exit(0);
  }, 10)
});

(function loop() {
  setTimeout(loop, 0);
})();

The output:

$ npm run hi

> node test.js

pid 90933
^Csigint
sigint
exit

bgotink · 2020-05-13T15:10:29Z

Ah I assume the node repl puts the stdin in raw mode, which disables the entire INTR behaviour.

When in raw mode, input is always available character-by-character, not including modifiers. Additionally, all special processing of characters by the terminal is disabled, including echoing input characters. CTRL+C will no longer cause a SIGINT when in this mode.
~ https://nodejs.org/api/tty.html#tty_readstream_setrawmode_mode

Still, given the default behaviour of the signals caused by user interaction (SIGINT with INTR, SIGQUIT with its analogous QUIT, SIGTSTP and its analogous SUSP) I would propose not forwarding those signals.
I fear having ctrl-c behave differently by triggering two consecutive SIGINTs is going to bite us harder than not supporting kill -SIGINT <pid of yarn>. For SIGINT it's also dangerous to assume the user wants to quit the process, e.g. if I hit ^C in BASH I want to quit the active child but not BASH itself, so even implementing a timeout + kill signal if not finished in time would potentially cause issues.

adamdunkley · 2020-05-13T20:30:09Z

Actually, on further reviewing the behaviour of yarn v2's run, the "problems" are slimmer than I thought. The behaviour of ^c should be sufficient for most people using the terminal (behaviour should be as expected, unless the child process is unruly … but actually yarn doesn't quit until it does so at least it's still foregrounded). For parents of yarn they should work similarly, just if they do things by sending a SIGINT themselves (just to the yarn process) they should understand the consequences (as @arcanis pointed out a while back).

@mprast I get the issue with k8s and docker in general, which is that they advocate there being just 1 process that controls all its children. For this particular issue I advocate using an init wrapper script like https://github.com/Yelp/dumb-init (which can be the ENTRYPOINT for your containerised app). This will fork off the yarn process and forward any signals to the process group (of which any children should be a member, if I understand execFileSync in Node correctly).

If people are still really struggling I could see one possibility being a simple flag to yarn run that would get it to forwards signals, but I would think there are better solutions.

mprast · 2020-05-13T20:37:33Z

yeah FWIW I ended up using a k8s preStop hook - seems like they had a workaround kind of built-in already. dumb-init looks like a good solution too. I'll go ahead and close this; thx for the help all!

snwfdhmp · 2020-10-28T21:17:45Z

This is important

merceyz · 2020-10-28T21:23:22Z

See #1741

mprast added the bug Something isn't working label Feb 25, 2020

adamdunkley mentioned this issue Apr 9, 2020

Exits with code 1 on SIGTERM without waiting for child yarnpkg/yarn#4667

Open

mprast closed this as completed May 13, 2020

DRoet mentioned this issue Aug 23, 2020

[Bug] Unable to catch signals using Docker #1741

Closed

1 task

arcanis mentioned this issue Jan 12, 2021

Handle propogating SIGTERM to spawned child processes #2347

Merged

3 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[Bug] yarn v2 does not forward SIGTERM to child processes #991

[Bug] yarn v2 does not forward SIGTERM to child processes #991

mprast commented Feb 25, 2020 •

edited

Loading

adamdunkley commented Apr 9, 2020

arcanis commented May 13, 2020

bgotink commented May 13, 2020

arcanis commented May 13, 2020 •

edited

Loading

adamdunkley commented May 13, 2020

arcanis commented May 13, 2020

bgotink commented May 13, 2020

adamdunkley commented May 13, 2020

adamdunkley commented May 13, 2020

bgotink commented May 13, 2020

adamdunkley commented May 13, 2020 •

edited

Loading

bgotink commented May 13, 2020

adamdunkley commented May 13, 2020 •

edited

Loading

mprast commented May 13, 2020

snwfdhmp commented Oct 28, 2020

merceyz commented Oct 28, 2020

[Bug] yarn v2 does not forward SIGTERM to child processes #991

[Bug] yarn v2 does not forward SIGTERM to child processes #991

Comments

mprast commented Feb 25, 2020 • edited Loading

adamdunkley commented Apr 9, 2020

arcanis commented May 13, 2020

bgotink commented May 13, 2020

arcanis commented May 13, 2020 • edited Loading

adamdunkley commented May 13, 2020

arcanis commented May 13, 2020

bgotink commented May 13, 2020

adamdunkley commented May 13, 2020

adamdunkley commented May 13, 2020

bgotink commented May 13, 2020

adamdunkley commented May 13, 2020 • edited Loading

bgotink commented May 13, 2020

adamdunkley commented May 13, 2020 • edited Loading

mprast commented May 13, 2020

snwfdhmp commented Oct 28, 2020

merceyz commented Oct 28, 2020

mprast commented Feb 25, 2020 •

edited

Loading

arcanis commented May 13, 2020 •

edited

Loading

adamdunkley commented May 13, 2020 •

edited

Loading

adamdunkley commented May 13, 2020 •

edited

Loading