Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Bug] yarn v2 does not forward SIGTERM to child processes #991

Closed
1 task
mprast opened this issue Feb 25, 2020 · 16 comments
Closed
1 task

[Bug] yarn v2 does not forward SIGTERM to child processes #991

mprast opened this issue Feb 25, 2020 · 16 comments
Labels
bug Something isn't working

Comments

@mprast
Copy link

mprast commented Feb 25, 2020

  • I'd be willing to implement a fix

Describe the bug
This appears to be a regression from yarnpkg/yarn#3424.
Doing a kill on the yarn run process will leave the other processes hanging around. To me it seems that the correct behavior should be to forward the SIGTERM to any child processes that yarn has created.

Would be great to have this fixed since it is a prereq for me using yarn run in production, which I believe I need to do to use PnP in prod (unless there's another way!). Kubernetes uses SIGTERM to signal containers to gracefully shut down and right now yarn doesn't fit into that workflow because it can't handle the signal.

To Reproduce
Couldn't figure out how to repro using Sherlock. As mentioned earlier, the reference link in the sherlock documentation is broken.

Here's a minimal repro. To see the behavior, clone the repo, then:

  1. yarn run http-server in one window
  2. ps ax | grep http-server in another. You should see 3 processes, with node /usr/bin/yarn run http-server as the top process.
  3. kill the PID of the top process.
  4. ps ax | grep http-server again. You'll see that the top process is gone, but the other two remain.

Environment if relevant (please complete the following information):

  • OS: [e.g. OSX, Linux, Windows, ...]
  • Node version [e.g. 8.15.0, 10.15.1, ...]
  • Yarn version [e.g. 2.0.0-rc1, ...]

Thank you!

@mprast mprast added the bug Something isn't working label Feb 25, 2020
@adamdunkley
Copy link

Also, please take a look at the issue I raised on yarn v1 when considering a strategy for this: yarnpkg/yarn#4667

The strategy taken when fixing it on v1 ignored Unix standards, by always killing children immediately and ignoring the child's exit status, which caused numerous problems (which you can see from the comments on that issue)

@arcanis
Copy link
Member

arcanis commented May 13, 2020

I just found this issue again - one thing I'd like to be sure I understand: when pressing ^C, SIGINT is sent to all the process from the group. We're actually just ignoring this signal while a child process is running, since the SIGINT will cause the child to abort, and us as well once that happens.

Why isn't it the same for SIGTERM? Why do we need to forward it to the children? Shouldn't K8s send the SIGTERM to all the children from the group? Do you have documentation on this behaviour?

@bgotink
Copy link
Member

bgotink commented May 13, 2020

This isn't related to k8s, it can be reproduced using a simple kill. This actually explains some weird behaviour I've seen in yarn 1, which I thought at the time to have been a bug in the angular CLI.

The cause appears to be some unexpected signal handling in node. Sending a signal using e.g. kill will send it to the outer process, and node does not forward it to any spawned children. Only for ctrl-c will node forward the SIGINT signal.
I thought maybe it's because that signal comes in via stdin, which gets passed into the child. The ctrl-c still gets forwarded when passing stdio: ['ignore', 1, 2] in the execFileSync options, so this doesn't explain it.

You can verify all of this using:

// test.js
const {execFileSync} = require('child_process');

console.log('outer: pid', process.pid);

execFileSync(process.execPath, [__dirname + '/test-inner.js'], {
  stdio: ['ignore', 1, 2],
});

and

// test-inner.js
console.log('inner: pid', process.pid);

process.on('SIGINT', () => {
  console.log('inner: sigint');
  process.exit(0);
});

process.on('SIGTERM', () => {
  console.log('inner: sigterm');
  process.exit(0);
});

// equivalent to "while (true) {}" but async to allow event loop to continue
(function loop() {
  setTimeout(loop, 0);
})();

the output is:

# ctrl-c
bram:/tmp $ node test
outer: pid 48479
inner: pid 48480
^Cinner: sigint

# sigint
bram:/tmp $ node test
outer: pid 48493
inner: pid 48494
# in other terminal: kill -SIGINT 48493
bram:/tmp $ kill 48494
inner: sigterm

# sigterm
bram:/tmp $ node test
outer: pid 50606
inner: pid 50607
# in other terminal: kill 50606
fish: 'node test' terminated by signal SIGTERM (Polite quit request)
bram:/tmp $ kill 50607
inner: sigterm

@arcanis
Copy link
Member

arcanis commented May 13, 2020

Looking at this post, I think I can make some sense of it:

  • By POSIX rule, ^C causes the shell to send a SIGINT to the foreground process group.
  • However, kill, by default, will only target one process, not a group. To target the whole group, it'd need to use the negative pid (ie kill -50606).

That would explain why the SIGINT goes down until it reaches the children, whereas the SIGTERM
doesn't. In fact if that's true, sending a manual one-process SIGINT (kill 50606 -SIGINT) will have the same effect (or rather, it won't stop anything, since we ignore it and the children won't receive it).

With that in mind, remains to figure out what's the correct behaviour ... shouldn't tools such as K8s send the SIGTERM signal to the process group, hence following the same behaviour as what ^C does? On the other hand, it may make sense for the parent to explicitly decide whether the signals should be forwarded or not.

@adamdunkley
Copy link

@bgotink I think this is strange but not entirely unexpected behaviour. child_process is basically using popen underneath or at least purports to roughly conform to it. It's POSIX standard that a keyboard ^c is an INTR, not directly a SIGINT (though it generates SIGINT(s)). One of differences is that with a INTR it is forwarded to children automatically. However, a naked SIGINT is not automatically forwarded. You can see someone who is confused about this discussing it here https://unix.stackexchange.com/questions/149741/why-is-sigint-not-propagated-to-child-process-when-sent-to-its-parent-process

Here is where the POSIX standard describes a INTR https://pubs.opengroup.org/onlinepubs/000095399/basedefs/xbd_chap11.html#tag_11_01_09

So the nub of it is that it is up to parent processes to forward signals in any POSIX system. With Yarn, as it is executing one process on behalf of the user (sort of as a proxy, where it sets up a certain environment for the child process), I would expect it to forward those signals (as it's not a daemon manager or some such). I would also expect (as there is a 1:1 relationship with the parent and child process) that it would wait a reasonable amount of time for the child to exit and pass any exit code from the child as its own exit code.

@arcanis
Copy link
Member

arcanis commented May 13, 2020

I would expect it to forward those signals (as it's not a daemon manager or some such)

Wouldn't that cause double signals to reach the underlying tool? For example, in the case of docker-compose, one SIGINT triggers a graceful shutdown, but one more aborts the images forcefully. In this kind of situation we don't want to forward the signals.

@bgotink
Copy link
Member

bgotink commented May 13, 2020

Can we detect if a SIGINT originates from a ctrl-c?

If not, would it make sense to forward all signals except SIGINT and SIGQUIT? SIGINT and SIGQUIT are meant as signals sent by terminals (source), i.e. by humans. These user interactions send both of these signals to the entire process group instead of only the parent process (source).

@adamdunkley
Copy link

@bgotink My understanding is that INTR is unwrapped in to SIGINT outside the process so there is no way to know where it originated from (without some awful voodoo magic, which I wouldn't recommend)

@arcanis I think this is a tricky one because I don't think there is a great deal in the spec about SIGINT and what really to do with it (it's left up to the application). Really, in the spirit of SIGINT, docker-compose should not be interpreting two SIGINTs as a SIGKILL (because that's what SIGKILL is for!). My vote would still be that forwarding all signals makes sense because worse than an application doing something weird with two SIGINTs is having child processes mounting up. At least in the former the damage will be limited to those applications with the esoteric signal handling behaviour, in the latter it is a universal problem that all child processes who receive a non ^c SIGINT will continue to run without their parent (which in a run command does not seem desirable).

@adamdunkley
Copy link

Btw, docker-compose itself forwards signals and waits 10 seconds for children to exit before sending a SIGKILL … https://docs.docker.com/compose/faq/#why-do-my-services-take-10-seconds-to-recreate-or-stop. I get that the context is a little different when you have something running in viz, but still.

@bgotink
Copy link
Member

bgotink commented May 13, 2020

My understanding is that INTR is unwrapped in to SIGINT outside the process so there is no way to know where it originated from (without some awful voodoo magic, which I wouldn't recommend)

I was expecting this answer as I couldn't find anything on it.

Interpreting two consecutive SIGINTs as a SIGTERM is something we see more often than just docker-compose, e.g. in the node repl.
If we forward SIGINT, running yarn node would quit on the first ctrl-c instead of requiring another to confirm.

@adamdunkley
Copy link

adamdunkley commented May 13, 2020

@bgotink Weirdly node REPL is able to understand the difference between the two things … I assume because they are capturing keys (see: voodoo magic). I am not sure that it would react any differently, as a result, because it is not capturing ^c but all keyboard input. I haven't dug in to exactly how it does it. Try killing it with a SIGINT remotely and you'll see that it dies without prompt, likewise run it inside an npm run (which I know forwards signals) and you'll see it still behaves as expected. I also set up a test script for how npm does this (which, I know, yarn is not npm, but it's a good reference) and it has the behaviour I describe (two SIGINTs are sent on ^c):

// test.js
console.log('pid', process.pid);

process.on('SIGINT', () => {
  console.log('sigint');
  setTimeout(() => {
    console.log('exit');
    process.exit(0);
  }, 10)
});

(function loop() {
  setTimeout(loop, 0);
})();

The output:

$ npm run hi

> node test.js

pid 90933
^Csigint
sigint
exit

@bgotink
Copy link
Member

bgotink commented May 13, 2020

Ah I assume the node repl puts the stdin in raw mode, which disables the entire INTR behaviour.

When in raw mode, input is always available character-by-character, not including modifiers. Additionally, all special processing of characters by the terminal is disabled, including echoing input characters. CTRL+C will no longer cause a SIGINT when in this mode.
~ https://nodejs.org/api/tty.html#tty_readstream_setrawmode_mode

Still, given the default behaviour of the signals caused by user interaction (SIGINT with INTR, SIGQUIT with its analogous QUIT, SIGTSTP and its analogous SUSP) I would propose not forwarding those signals.
I fear having ctrl-c behave differently by triggering two consecutive SIGINTs is going to bite us harder than not supporting kill -SIGINT <pid of yarn>. For SIGINT it's also dangerous to assume the user wants to quit the process, e.g. if I hit ^C in BASH I want to quit the active child but not BASH itself, so even implementing a timeout + kill signal if not finished in time would potentially cause issues.

@adamdunkley
Copy link

adamdunkley commented May 13, 2020

Actually, on further reviewing the behaviour of yarn v2's run, the "problems" are slimmer than I thought. The behaviour of ^c should be sufficient for most people using the terminal (behaviour should be as expected, unless the child process is unruly … but actually yarn doesn't quit until it does so at least it's still foregrounded). For parents of yarn they should work similarly, just if they do things by sending a SIGINT themselves (just to the yarn process) they should understand the consequences (as @arcanis pointed out a while back).

@mprast I get the issue with k8s and docker in general, which is that they advocate there being just 1 process that controls all its children. For this particular issue I advocate using an init wrapper script like https://github.com/Yelp/dumb-init (which can be the ENTRYPOINT for your containerised app). This will fork off the yarn process and forward any signals to the process group (of which any children should be a member, if I understand execFileSync in Node correctly).

If people are still really struggling I could see one possibility being a simple flag to yarn run that would get it to forwards signals, but I would think there are better solutions.

@mprast
Copy link
Author

mprast commented May 13, 2020

yeah FWIW I ended up using a k8s preStop hook - seems like they had a workaround kind of built-in already. dumb-init looks like a good solution too. I'll go ahead and close this; thx for the help all!

@snwfdhmp
Copy link

This is important

@merceyz
Copy link
Member

merceyz commented Oct 28, 2020

See #1741

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

No branches or pull requests

6 participants