-
Notifications
You must be signed in to change notification settings - Fork 29.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Handling of CTRL_CLOSE_EVENT/SIGHUP on Windows causes races and deadlocks #10165
Comments
/to @nodejs/platform-windows @saghul |
Does
This is probably because
I think the easiest fix is, in the CTRL_CLOSE_EVENT handler, to wait for the main thread, rather than to sleep indefinitely. So it becomes something like: HANDLE main_loop_thread_handle = ... ; // hot sure how to get it, but it's possible.
WaitForSingleObject(main_thread_handle, INFINITE); |
@digitalinfinity Not sure if you saw this, but some Windows talent could go a long way here! |
@piscisaureus No, I did some debugging using the pdb file for node and can now provide stack traces for the two cases of the hang. For the hang that happens after receiving CTRL_CLOSE_EVENT when node shuts down normally (i.e. on-SIGHUP-keeps-node-alive.js), it seems to hang in node, not libuv, specifically in The hang caused by unregistering the SIGHUP listener after receiving a CTRL_CLOSE_EVENT (i.e. removeAllListeners-SIGHUP-after-SIGHUP-received-blocks-event-loop.js), hangs in libuv, specifically uv__signal_unregister, which also calls I also played around with |
If calling In any case, using |
@bzoz Yes, I think this would fix both issues. |
I've added flag to I still want to test this change, if everything is OK I'll probably open PRs on Monday. |
@bzoz Great, thanks a lot! It looks good to me, but I'll try to compile node next week and test it as well. |
I've tested it and the fix works for me, as well. |
Adds flags that marks uv__signal_control_handler as disabled instead of removing it when uv__signal_control_handler_refs falls to zero. Trying to remove the controller from the controller handle itself leads to deadlock. Ref: nodejs/node#10165
Trying to remove the controller from the controller handle itself leads to deadlock. Fix it by always having the global signal handler engaged. Ref: nodejs/node#10165 PR-URL: #1168 Reviewed-By: Saúl Ibarra Corretgé <[email protected]>
Fixes: #10165 Fixes: #9856 Fixes: #10607 Fixes: #11104 PR-URL: #11094 Reviewed-By: James M Snell <[email protected]> Reviewed-By: Sakthipriyan Vairamani <[email protected]> Reviewed-By: Santiago Gimeno <[email protected]> Reviewed-By: Michael Dawson <[email protected]>
Fixes: nodejs#10165 Fixes: nodejs#9856 Fixes: nodejs#10607 Fixes: nodejs#11104 PR-URL: nodejs#11094 Reviewed-By: James M Snell <[email protected]> Reviewed-By: Sakthipriyan Vairamani <[email protected]> Reviewed-By: Santiago Gimeno <[email protected]> Reviewed-By: Michael Dawson <[email protected]>
Fixes: nodejs#10165 Fixes: nodejs#9856 Fixes: nodejs#10607 Fixes: nodejs#11104 PR-URL: nodejs#11094 Reviewed-By: James M Snell <[email protected]> Reviewed-By: Sakthipriyan Vairamani <[email protected]> Reviewed-By: Santiago Gimeno <[email protected]> Reviewed-By: Michael Dawson <[email protected]>
Fixes: #10165 Fixes: #9856 Fixes: #10607 Fixes: #11104 PR-URL: #11094 Reviewed-By: James M Snell <[email protected]> Reviewed-By: Sakthipriyan Vairamani <[email protected]> Reviewed-By: Santiago Gimeno <[email protected]> Reviewed-By: Michael Dawson <[email protected]>
Fixes: #10165 Fixes: #9856 Fixes: #10607 Fixes: #11104 PR-URL: #11094 Reviewed-By: James M Snell <[email protected]> Reviewed-By: Sakthipriyan Vairamani <[email protected]> Reviewed-By: Santiago Gimeno <[email protected]> Reviewed-By: Michael Dawson <[email protected]>
Fixes: #10165 Fixes: #9856 Fixes: #10607 Fixes: #11104 PR-URL: #11094 Reviewed-By: James M Snell <[email protected]> Reviewed-By: Sakthipriyan Vairamani <[email protected]> Reviewed-By: Santiago Gimeno <[email protected]> Reviewed-By: Michael Dawson <[email protected]>
Fixes: nodejs/node#10165 Fixes: nodejs/node#9856 Fixes: nodejs/node#10607 Fixes: nodejs/node#11104 PR-URL: nodejs/node#11094 Reviewed-By: James M Snell <[email protected]> Reviewed-By: Sakthipriyan Vairamani <[email protected]> Reviewed-By: Santiago Gimeno <[email protected]> Reviewed-By: Michael Dawson <[email protected]>
On Windows, when a console window is closed, a CTRL_CLOSE_EVENT is sent to all processes running in that console. Applications can register handlers for this event, and all registered handlers for all processes in the console are run serially. After the handlers for a process finish, the process is automatically terminated by windows. Further, there's a timeout of five seconds. If the handler is not finished after five seconds, Windows also terminates the process. It's also worth mentioning that Windows creates a new thread in the target process and runs the handler functions in this new thread.
For further details about this see the documentation for SetConsoleCtrlHandler and HandlerRoutine.
libuv maps this event to SIGHUP, so when registering an event listener for SIGHUP on Windows, libuv sets up a handler for CTRL_CLOSE_EVENT. This handler emits a SIGHUP event, so that the registered listener is called. And after that, it calls
Sleep(INFINITE)
, which just causes the current thread to block forever.While this approach is understandable - as not sleeping would just kill the process - the current implementation causes several serious issues:
I have prepared a repository with scripts demonstrating the issues here: https://github.com/mika-fischer/nodejs-sighup-windows
First of all, once a SIGHUP handler is registered and a CTRL_CLOSE_EVENT received, it's no longer possible to terminate node properly. Probably the running thread somehow keeps the process alive. See on-SIGHUP-keeps-node-alive.js.
This is not completely catastrophic due to the fact that Windows will kill the process after five seconds. However this means that the SIGHUP handlers for all other processes in the console get delayed by 5 seconds, since they are called serially. Using nodejs cluster mode with 12 workers, it would already take a minute to shut down the system...
OK, so we can work around this by terminating ourselves after receiving SIGHUP, by calling
process.kill(process.pid)
, which works for killing us. But as a side effect, it will cause the next CTRL_CLOSE_EVENT handler in another process to be called. And if this process is already shutting down, this can lead to further race conditions and deadlocks:Once a SIGHUP handler is registered, it's no longer safe to exit the application normally. Even after the event loop is drained and node is shutting down, a CTRL_CLOSE_EVENT could arrive. Since the SIGHUP handler is still registered, a SIGHUP event will be emitted (but never received by anyone, since node has already shut down) and the
Sleep(INFINITY)
will kick in, causing the process to hang.See forktest.js.
OK, so we just need to remove our SIGHUP handler, right? Unfortunately, this leads to the next problem:
Once a SIGHUP handler is registered, it's no longer safe to remove it. For some reason, after receiving a CTRL_CLOSE_EVENT, trying to unregister the handler completely blocks execution of nodejs. This is the most serious issue here. See removeAllListeners-SIGHUP-after-SIGHUP-received-blocks-event-loop.js.
Taken together, this makes it completely impossible to properly shut down nodejs on Windows in response to SIGHUP/CTRL_CLOSE_EVENT. The only option is to always forcefully kill the node process once a handler for SIGHUP is set up and never trying to remove the handler.
I think a nicer behaviour would be:
Sleep(INFINITY)
is all that's keeping it running, kill the thread and terminate properly.The text was updated successfully, but these errors were encountered: