Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Handling of CTRL_CLOSE_EVENT/SIGHUP on Windows causes races and deadlocks #10165

Closed
mika-fischer opened this issue Dec 7, 2016 · 9 comments
Closed
Labels
libuv Issues and PRs related to the libuv dependency or the uv binding. process Issues and PRs related to the process subsystem. windows Issues and PRs related to the Windows platform.

Comments

@mika-fischer
Copy link
Contributor

  • Version: v7.2.1
  • Platform: Windows 8/10 64-bit
  • Subsystem: process

On Windows, when a console window is closed, a CTRL_CLOSE_EVENT is sent to all processes running in that console. Applications can register handlers for this event, and all registered handlers for all processes in the console are run serially. After the handlers for a process finish, the process is automatically terminated by windows. Further, there's a timeout of five seconds. If the handler is not finished after five seconds, Windows also terminates the process. It's also worth mentioning that Windows creates a new thread in the target process and runs the handler functions in this new thread.

For further details about this see the documentation for SetConsoleCtrlHandler and HandlerRoutine.

libuv maps this event to SIGHUP, so when registering an event listener for SIGHUP on Windows, libuv sets up a handler for CTRL_CLOSE_EVENT. This handler emits a SIGHUP event, so that the registered listener is called. And after that, it calls Sleep(INFINITE), which just causes the current thread to block forever.

While this approach is understandable - as not sleeping would just kill the process - the current implementation causes several serious issues:

I have prepared a repository with scripts demonstrating the issues here: https://github.com/mika-fischer/nodejs-sighup-windows

First of all, once a SIGHUP handler is registered and a CTRL_CLOSE_EVENT received, it's no longer possible to terminate node properly. Probably the running thread somehow keeps the process alive. See on-SIGHUP-keeps-node-alive.js.

This is not completely catastrophic due to the fact that Windows will kill the process after five seconds. However this means that the SIGHUP handlers for all other processes in the console get delayed by 5 seconds, since they are called serially. Using nodejs cluster mode with 12 workers, it would already take a minute to shut down the system...

OK, so we can work around this by terminating ourselves after receiving SIGHUP, by calling process.kill(process.pid), which works for killing us. But as a side effect, it will cause the next CTRL_CLOSE_EVENT handler in another process to be called. And if this process is already shutting down, this can lead to further race conditions and deadlocks:

Once a SIGHUP handler is registered, it's no longer safe to exit the application normally. Even after the event loop is drained and node is shutting down, a CTRL_CLOSE_EVENT could arrive. Since the SIGHUP handler is still registered, a SIGHUP event will be emitted (but never received by anyone, since node has already shut down) and the Sleep(INFINITY) will kick in, causing the process to hang.
See forktest.js.

OK, so we just need to remove our SIGHUP handler, right? Unfortunately, this leads to the next problem:

Once a SIGHUP handler is registered, it's no longer safe to remove it. For some reason, after receiving a CTRL_CLOSE_EVENT, trying to unregister the handler completely blocks execution of nodejs. This is the most serious issue here. See removeAllListeners-SIGHUP-after-SIGHUP-received-blocks-event-loop.js.

Taken together, this makes it completely impossible to properly shut down nodejs on Windows in response to SIGHUP/CTRL_CLOSE_EVENT. The only option is to always forcefully kill the node process once a handler for SIGHUP is set up and never trying to remove the handler.

I think a nicer behaviour would be:

  • Always allow removal of the SIGHUP event listener, even when a CTRL_CLOSE_EVENT was received.
  • If node terminates normally and the Sleep(INFINITY) is all that's keeping it running, kill the thread and terminate properly.
@italoacasas italoacasas added the process Issues and PRs related to the process subsystem. label Dec 7, 2016
@mscdex mscdex added the windows Issues and PRs related to the Windows platform. label Dec 7, 2016
@sam-github
Copy link
Contributor

/to @nodejs/platform-windows @saghul
/cc @piscisaureus

@sam-github sam-github added the libuv Issues and PRs related to the libuv dependency or the uv binding. label Dec 7, 2016
@piscisaureus
Copy link
Contributor

OK, so we can work around this by terminating ourselves after receiving SIGHUP, by calling process.kill(process.pid), which works for killing us. But as a side effect, it will cause the next CTRL_CLOSE_EVENT handler in another process to be called.

Does process.exit() not work to terminate the process?

For some reason, after receiving a CTRL_CLOSE_EVENT, trying to unregister the handler completely blocks execution of nodejs.

This is probably because SetConsoleCtrlHandler() (called here) blocks if there is a handler currently running. It could probably be worked around by just not calling that function after we've received a CTRL_CLOSE_EVENT.

If node terminates normally and the Sleep(INFINITY) is all that's keeping it running, kill the thread and terminate properly.

I think the easiest fix is, in the CTRL_CLOSE_EVENT handler, to wait for the main thread, rather than to sleep indefinitely. So it becomes something like:

HANDLE main_loop_thread_handle = ... ; // hot sure how to get it, but it's possible.
WaitForSingleObject(main_thread_handle, INFINITE);

@sam-github
Copy link
Contributor

@digitalinfinity Not sure if you saw this, but some Windows talent could go a long way here!

@mika-fischer
Copy link
Contributor Author

mika-fischer commented Dec 8, 2016

@piscisaureus No, process.exit() does not help at all. It goes through all the exit logic and while node is shutting down, a CTRL_CLOSE_EVENT can arrive, which causes the hang...

I did some debugging using the pdb file for node and can now provide stack traces for the two cases of the hang.

For the hang that happens after receiving CTRL_CLOSE_EVENT when node shuts down normally (i.e. on-SIGHUP-keeps-node-alive.js), it seems to hang in node, not libuv, specifically in SigintWatchdogHelper::Stop(), which indeed does call SetConsoleCtrlHandler to unregister node's own handler function. It's probable that this call does indeed block while a handler is running...

hang-after-ctrl_close_event

The hang caused by unregistering the SIGHUP listener after receiving a CTRL_CLOSE_EVENT (i.e. removeAllListeners-SIGHUP-after-SIGHUP-received-blocks-event-loop.js), hangs in libuv, specifically uv__signal_unregister, which also calls SetConsoleCtrlHandler to unregister libuv's handler function.

remove-listener

I also played around with SetConsoleCtrlHandler in C++ and it seems unlikely that the Sleep(INFINITY) does keep the process from closing, so I think that can be left as is. Just the calls to SetConsoleCtrlHandler to unregister the handler need to be either conditional on whether CTRL_CLOSE_EVENT was received, which might be difficult (since both node and libuv have their own handlers installed). Or maybe they could be omitted altogether in all cases, by having the node and libuv handler functions always return false, which is basically the same as not having the handler at all.

@bzoz
Copy link
Contributor

bzoz commented Dec 8, 2016

If calling SetConsoleCtrlHandler to unregister is the problem, we could keep it registered and use a flag to mark it as "unregistered".

In any case, using process.once('SIGHUP', ...) hangs node, and is used in lib/_debugger.js

@mika-fischer
Copy link
Contributor Author

@bzoz Yes, I think this would fix both issues.

@bzoz
Copy link
Contributor

bzoz commented Dec 9, 2016

I've added flag to libuv and node_watchdog to mark handlers as "removed" instead of removing them with SetConsoleCtrlHadler (see: master...janeasystems:bartek-node-fix-10165). It does seem to fix both issues - on-SIGHUP-keeps-node-alive.js exits and removeAllListeners-SIGHUP-after-SIGHUP-received-blocks-event-loop.js also works. If you want go ahead and test.

I still want to test this change, if everything is OK I'll probably open PRs on Monday.

@mika-fischer
Copy link
Contributor Author

@bzoz Great, thanks a lot! It looks good to me, but I'll try to compile node next week and test it as well.

@mika-fischer
Copy link
Contributor Author

I've tested it and the fix works for me, as well.

bzoz added a commit to JaneaSystems/libuv that referenced this issue Jan 4, 2017
Adds flags that marks uv__signal_control_handler as disabled instead
of removing it when uv__signal_control_handler_refs falls to zero.
Trying to remove the controller from the controller handle itself
leads to deadlock.

Ref: nodejs/node#10165
saghul pushed a commit to libuv/libuv that referenced this issue Jan 18, 2017
Trying to remove the controller from the controller handle itself
leads to deadlock. Fix it by always having the global signal handler
engaged.

Ref: nodejs/node#10165
PR-URL: #1168
Reviewed-By: Saúl Ibarra Corretgé <[email protected]>
@cjihrig cjihrig closed this as completed in 8514269 Feb 9, 2017
italoacasas pushed a commit that referenced this issue Feb 13, 2017
Fixes: #10165
Fixes: #9856
Fixes: #10607
Fixes: #11104
PR-URL: #11094
Reviewed-By: James M Snell <[email protected]>
Reviewed-By: Sakthipriyan Vairamani <[email protected]>
Reviewed-By: Santiago Gimeno <[email protected]>
Reviewed-By: Michael Dawson <[email protected]>
italoacasas pushed a commit to italoacasas/node that referenced this issue Feb 14, 2017
Fixes: nodejs#10165
Fixes: nodejs#9856
Fixes: nodejs#10607
Fixes: nodejs#11104
PR-URL: nodejs#11094
Reviewed-By: James M Snell <[email protected]>
Reviewed-By: Sakthipriyan Vairamani <[email protected]>
Reviewed-By: Santiago Gimeno <[email protected]>
Reviewed-By: Michael Dawson <[email protected]>
krydos pushed a commit to krydos/node that referenced this issue Feb 25, 2017
Fixes: nodejs#10165
Fixes: nodejs#9856
Fixes: nodejs#10607
Fixes: nodejs#11104
PR-URL: nodejs#11094
Reviewed-By: James M Snell <[email protected]>
Reviewed-By: Sakthipriyan Vairamani <[email protected]>
Reviewed-By: Santiago Gimeno <[email protected]>
Reviewed-By: Michael Dawson <[email protected]>
MylesBorins pushed a commit that referenced this issue May 18, 2017
Fixes: #10165
Fixes: #9856
Fixes: #10607
Fixes: #11104
PR-URL: #11094
Reviewed-By: James M Snell <[email protected]>
Reviewed-By: Sakthipriyan Vairamani <[email protected]>
Reviewed-By: Santiago Gimeno <[email protected]>
Reviewed-By: Michael Dawson <[email protected]>
MylesBorins pushed a commit that referenced this issue May 18, 2017
Fixes: #10165
Fixes: #9856
Fixes: #10607
Fixes: #11104
PR-URL: #11094
Reviewed-By: James M Snell <[email protected]>
Reviewed-By: Sakthipriyan Vairamani <[email protected]>
Reviewed-By: Santiago Gimeno <[email protected]>
Reviewed-By: Michael Dawson <[email protected]>
MylesBorins pushed a commit that referenced this issue May 18, 2017
Fixes: #10165
Fixes: #9856
Fixes: #10607
Fixes: #11104
PR-URL: #11094
Reviewed-By: James M Snell <[email protected]>
Reviewed-By: Sakthipriyan Vairamani <[email protected]>
Reviewed-By: Santiago Gimeno <[email protected]>
Reviewed-By: Michael Dawson <[email protected]>
andrew749 pushed a commit to michielbaird/node that referenced this issue Jul 19, 2017
Fixes: nodejs/node#10165
Fixes: nodejs/node#9856
Fixes: nodejs/node#10607
Fixes: nodejs/node#11104
PR-URL: nodejs/node#11094
Reviewed-By: James M Snell <[email protected]>
Reviewed-By: Sakthipriyan Vairamani <[email protected]>
Reviewed-By: Santiago Gimeno <[email protected]>
Reviewed-By: Michael Dawson <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
libuv Issues and PRs related to the libuv dependency or the uv binding. process Issues and PRs related to the process subsystem. windows Issues and PRs related to the Windows platform.
Projects
None yet
Development

Successfully merging a pull request may close this issue.

6 participants