Connections closed by node stay permanently in FIN_WAIT2 #3613
I've never seen that behavior but it sounds plausible. We sometimes shut down the socket before closing it. Maybe there's a code path somewhere that shuts down but fails to close the file descriptor. What do you see when you run strace?
Thank you for your feedback. I will provide this info next week, because my co-programmer went on holiday today.
If I understand the strace below correctly, node only does a shutdown when calling socket.end(). If the claim is true that "the connection will remain in the FIN_WAIT2 state indefinitely" when doing a shutdown instead of a close, then a simple DoS attack can exhaust a server: call a node server many times, let node disconnect, but block the returning FIN with iptables.

strace -e shutdown,close -p 19751
Process 19751 attached - interrupt to quit

A temporary workaround is to enable keep-alive when the connection is opened, because then the FIN_WAIT2 state will time out after 2 hours. Normally it should be 1 minute, but this is better than infinite.
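For reference, a minimal sketch of that keep-alive workaround in node. socket.setKeepAlive() is the standard net.Socket API; the server setup and the 60-second initial delay are illustrative assumptions, not values from this thread:

// Sketch: enable TCP keep-alive as soon as a connection arrives, so a
// peer that never answers our FIN is eventually reaped by keep-alive
// probes instead of sitting in FIN_WAIT2 forever.
var net = require('net');

var server = net.createServer(function (socket) {
  socket.setKeepAlive(true, 60000); // illustrative initial delay

  socket.on('data', function (data) {
    socket.end(); // sends a FIN; the kernel then waits in FIN_WAIT2
  });
});

server.listen(80);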
Do you see corresponding close() calls, i.e. a close(42) some time after shutdown(42)? I guess the answer must be yes or you'd run out of file descriptors eventually.
No shutdown after close or vice versa:
I see at least a few matching shutdown/close syscalls and, realistically speaking, it must happen or you'd run out of file descriptors before too long (unless you set the file descriptor limit very high). Do you perhaps have a test case I can try?
I have indeed set the open file descriptor limit very high (I am handling close to 100,000 M2M devices). This is the perfect test case for node, because with mobile devices everything can go wrong, like disappearing without closing the connection (resulting in FIN_WAIT1) or not replying to a FIN from the server (resulting in FIN_WAIT2). The last list I sent you is the strace from the start of the node application for 15 minutes, and the netstat was taken at the end of the 15-minute sample. Is there a way to check the number of open file descriptors in node, so I can check whether there is a "leak"? I guess three test cases would be:
Yes. Can you check with ...?
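On Linux, one way to check the open-descriptor count from inside node is to read /proc. This is a sketch of one possible approach, my assumption only; the exact command suggested above is not recoverable from the thread:

// Count this process's open file descriptors (Linux only): each entry
// in /proc/<pid>/fd is one open descriptor.
var fs = require('fs');

function countOpenFds() {
  return fs.readdirSync('/proc/' + process.pid + '/fd').length;
}

// Log the count every 10 seconds to watch for a leak.
setInterval(function () {
  console.log('open fds:', countOpenFds());
}, 10000);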
Today node crashed because it ran out of file descriptors. There is no descriptor leak; the number of fds is as expected.

I noticed that if keep-alive is set, the socket uses tcp_keepalive_time instead of tcp_fin_timeout when in FIN_WAIT2. So by default that is 7200 seconds (2 hours), even if tcp_fin_timeout is set to the default 60 seconds. If, however, keep-alive is NOT set, then the socket stays in FIN_WAIT2 indefinitely when the connection is not closed properly from the client side. This happens when node does first shutdown, then close. In my opinion this is a bug in node which can easily be exploited by a DoS attack.

The crash (100% CPU) happened when node ran out of fds. The reason is that a lot of the M2M devices I manage don't return a FIN when node closes the connection. I tuned some parameters in /etc/sysctl and turned on socket keep-alive. Now I have about 10,000 sockets in the FIN_WAIT2 state, waiting to time out. So this is a good workaround, but still sensitive to DoS attacks.
Okay, thanks. That means it's not specifically a node bug (by which I mean that any TCP server is vulnerable to this particular weakness of the protocol, not just node). I'm not sure how, or even if, we should fix this. "Well, that's a bug in the client application," you say (and you'd be right). Yes, but you don't have to sift through the myriad bug reports that changes like these generate. :-) I half-kid, obviously, but you see the dilemma: it's going to break stuff that worked before (even if we make it opt-in, because there's always someone).
Thanks for your feedback. I am not an expert in how to exactly 'end' a connection, but when I opened this issue I referred to one of the sources I found, saying you should always close a socket, not shut it down. As far as I know, the other side does not know whether you did a shutdown or a close, because in both situations a FIN is sent. And a FIN is expected back (and if not, no problem, because the connection will time out in FIN_WAIT2). Since allowHalfOpen: false is the default setting, maybe node could close the socket in that case, and otherwise shutdown? I love node.js and I just want to report a potential problem which can be exploited :)
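To make the close-versus-shutdown distinction concrete in node terms: socket.end() shuts down the write side (sending a FIN but keeping the descriptor open), while socket.destroy() closes the descriptor outright. A minimal sketch using the standard net API; the server setup is illustrative:

var net = require('net');

var server = net.createServer({ allowHalfOpen: false }, function (socket) {
  // end() half-closes: node calls shutdown() on the fd, a FIN is sent,
  // and the kernel parks the connection in FIN_WAIT2 until the peer
  // sends its own FIN back.
  socket.end('bye\n');

  // destroy() closes the descriptor instead; the kernel can then time
  // FIN_WAIT2 out via tcp_fin_timeout even if the peer never replies.
  // socket.destroy();
});

server.listen(80);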
I discussed it with other committers. The consensus is that removing the shutdown() when allowHalfOpen is not set is the best way forward. The problem is that it breaks a fair number of tests, and no doubt it'll affect real-world code as well. Ergo, don't expect it to land in a stable branch anytime soon: v0.10 at the soonest.
So I propose the following solution to this problem: joyent/libuv#498. And we may add ...
Is this a duplicate of, or a problem similar to, #664?
Hi @bnoordhuis, we are facing the same problem... I've studied the new ... This issue is very serious for us because lots of 3rd-party node.js modules use socket.end(). Please give it more attention, thanks a lot!
After a lot of study on ..., and with this change, I've passed nearly all tests except 3 related to ...
I don't quite understand what you mean by that. What difference does it make? If you don't allow half-open connections, you might as well close the connection outright. (Note: I'm quite familiar with libuv and the kernel source; you can just point me to the relevant bits if you want.)
@bnoordhuis thanks for the quick reply.
Errh, as I mentioned before, we use lots of 3rd-party libraries, such as socket.io and ws. These libraries all use socket.end(). If I force-close these libraries' managed sockets myself, I need to enumerate all defunct connections and destroy them. So, I want to fix this in node.js core.

The real problem is ... I tried placing a ... When studying ..., use ...

Yeah, this new behavior should only apply to TCP connections, and ... (Note: lots of libraries use socket.end().)

(Ps: According to http://lxr.free-electrons.com/source/net/ipv4/tcp.c#L486, when ...)

(Ps2: I'd tested on ...)

Thanks! And please take a look at my change-set: https://github.com/HoopCHINA/node/commit/2a70ccbbeb80475272d56d415edc9167cab42f4e .
@bnoordhuis After more discussion with my colleagues, we found a more compatible way to fix this problem. As @gitHob suggests, TCP keep-alive is better suited to this problem. The real problem is a terrible network link, not node/application code, so the keep-alive/timeout solution may be simpler and more suitable. Please take a look at the https://github.com/HoopCHINA/node/commit/01b982d4e7c9fae98d1b59be46a81e4b8a699e7e changeset. The only change on the node v0.8.16 release is ... Thanks a lot!
Thanks for the discussion. I see my suggestion of using the TCP keep-alive timeout as a workaround, although I tend to agree that technically @bnoordhuis is right that it is not a node issue. But the beauty of node is, for example, that you can write to a socket and when the buffers are full (not a node issue), it automagically buffers in user space (of course you need to be careful not to overflow that). I run node with 100,000+ mobile connections and FIN_WAIT2 is really an issue (not a node issue). The beauty of node could maybe apply here as well, so that it automagically destroys connections when appropriate. I wrote "could" because the discussion between @soplwang and @bnoordhuis is too technical for me, so I don't know the implications. Thanks.
Hi @gitHob. Actually, I consider it a node issue. Other node streams use ... Keep-alive is the simple solution for a bad network link. We could even use ...

@bnoordhuis We consider that after ... On streams2, the direction of "if allowHalfOpen is false, ... then destroy, instead of shutdown" ...

Ps: Without a fix in node core, we currently use a monkey-patch to force-enable keep-alive after shutdown to solve this problem.

Ps2: Please take a look at the https://github.com/HoopCHINA/node/commit/4df0c77c35e741ef0f3b06dd277d37491aad6315 and https://github.com/HoopCHINA/node/commit/c3e3a6707e9cb7c30ea144cae2bbfbabc2e5ab46 changesets. Thanks.
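I don't know what the linked changesets contain, but a monkey-patch along the lines described (force-enabling keep-alive when a socket is ended) might look roughly like this. A sketch of the idea only, not the actual node-ka-patch code; the 60-second delay is an arbitrary illustrative value:

var net = require('net');

var originalEnd = net.Socket.prototype.end;

net.Socket.prototype.end = function () {
  // Arm keep-alive before the FIN goes out, so an unresponsive peer is
  // eventually detected by keep-alive probes instead of the socket
  // sitting in FIN_WAIT2 indefinitely.
  this.setKeepAlive(true, 60000);
  return originalEnd.apply(this, arguments);
};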
@soplwang you can try something like: jameshartig/libuv@4f1098f and jameshartig@34562e7
Hello, we have the same issue here. I wanted to patch our node.js server with the HoopCHINA patch, but it has been deleted. Can someone tell me how to fix this issue until the official files are updated, please? At this time, I have to reboot our server every week to clean up the FIN_WAIT2 sockets, and I can only hope that nobody will figure out how to DDoS us with a single computer... I don't understand why this critical issue isn't fixed yet, tbh; I'm having a hard time convincing my boss not to switch to another tech like Java. Why isn't this issue affecting more people? Do people just set a stupidly high number of file descriptors? I don't want to do this; there is a limit for a reason. Thanks for your help!
Try the 'node-ka-patch' package from the HoopCHINA folks. Sent from my iPhone
This issue only affects bad networks like mobile. But the real reason is that many packages use socket#end() instead of socket#destroySoon() to close the socket stream, and do so without a timeout. Not node's issue, exactly. Sent from my iPhone
@JiDW I'm having this issue too! So frustrating. I have a beautiful framework built that cannot move forward until this is fixed. My symptom is 100% CPU while it tries to talk to the unclosed FDs. Is there any update on this? I suppose I'll look at node-ka-patch. Hopefully someone will reply to my detailed ticket in sockjs/sockjs-node#161? Update: the patch did not work for me. Perhaps it's not the exact same problem, but it sounds quite similar! FIXED: I humbly admit that I did not upgrade node.js sooner. I was naive to think v0.10.5 was not that far out of date. Wrong. By over a year. An upgrade to v0.10.29 fixed my issue of unclosed FDs.
I can confirm that this problem still occurs for the rest of us in v0.10.29. If these devices can accidentally leave open file descriptors by dropping off the net at the wrong moment, an attacker could do it on purpose in order to DoS any node application using socket.end(). That seems like a very high-priority problem, and it has been left open for more than two years now.
@yorickvP from what I remember, the only way to make the sockets really die is to set linger=0 on them. But this means a really non-graceful close. Here is the code from nginx that does it:

linger.l_onoff = 1;
linger.l_linger = 0;

if (setsockopt(r->connection->fd, SOL_SOCKET, SO_LINGER,
               (const void *) &linger, sizeof(struct linger)) == -1)
{
    ngx_log_error(NGX_LOG_ALERT, log, ngx_socket_errno,
                  "setsockopt(SO_LINGER) failed");
}

I think we could possibly call this on connections that have emitted ETIMEDOUT, or even do it in libuv? cc @saghul
This seems like a good compromise. |
I'll take a look tomorrow. |
I took a quick look at this and, well, it looks like setting SO_LINGER to zero would actually solve this particular problem. Funnily enough, Windows has SO_DONTLINGER, which does exactly what you think it does. I would only expose setting SO_LINGER to zero, however, and use some application-level timer, because apparently a non-blocking socket with SO_LINGER set to some value would block on close(). Maybe some ... Now, I guess that the following could be done for node's socket.end() (take it with a grain of salt, my node-fu is not so strong): ...

Of course, I might be missing something super-obvious here :-/
Here is some interesting reading on the matter by Bert Hubert, author of PowerDNS: http://blog.netherlabs.nl/articles/2009/01/18/the-ultimate-so_linger-page-or-why-is-my-tcp-not-reliable
@saghul I don't suggest linger; leave it to the kernel's TCP retries and timeouts. My experience with this issue in node is ... If we use ...

Ref1: FIN_WAIT2 [https://kb.iu.edu/d/ajmi]
@soplwang what I suggested is to use a proper shutdown() together with an application-level timeout, which would then result in close() being called with no lingering, thus (hopefully) guaranteeing that the socket is immediately closed and won't remain in the FIN_WAIT2 state.
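In node terms, that suggestion might be sketched like the helper below. The function name, timeout value, and structure are my assumptions for illustration, not a committed design:

// Sketch: graceful end() backed by an application-level timer that
// force-closes the socket if the peer never completes the handshake.
function endWithTimeout(socket, ms) {
  var timer = setTimeout(function () {
    // The peer never sent its FIN back; close the descriptor so the
    // kernel can reap the connection instead of leaving FIN_WAIT2 open.
    socket.destroy();
  }, ms);

  socket.on('close', function () {
    clearTimeout(timer);
  });

  socket.end();
}

// Usage: give the peer 30 seconds to acknowledge the close.
// endWithTimeout(socket, 30000);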
@saghul Yeah... actually, I suggested an application-level timeout as a sub-optimal solution in that comment. This issue is about a link-level problem (e.g., mobile networks), so I suggest solving it with a link-level solution, i.e., TCP keep-alive. Maybe we don't need to fix this in node, but rather mark TCP streams as unreliable and leave this to module/app developers. For old problematic modules/apps, a keep-alive monkey-patch may be sufficient. Ref: quick keep-alive patch [https://github.com/soplwang/node-ka-patch].
@trevnorris @saghul ... do either of you happen to recall the status of this one? |
Thanks, guys. I am currently writing a TCP client, and as I dove into TCP details I found this issue. I wonder if this is fixed in modern node.js versions, so I have moved this issue to nodejs/node#11572 for further discussion. I'd be very happy to hear from you.
Hi
I love using node :)
Normally when node closes a connection (using socket.end()), the following packets are exchanged:
node > client [AF]
client > node [A]
client > node [AF]
node > client [A]
However, sometimes the client does not respond well, and I get only these packets:
node > client [AF]
client > node [A]
With node, this results in connections permanently (not temporarily) in FIN_WAIT2 (netstat -nop):
Proto Recv-Q Send-Q Local Address Foreign Address State PID/Program name Timer
tcp 0 0 XX.XX.XX.XX:80 XX.XX.XX.XX:17863 FIN_WAIT2 26375/node off (0.00/0/0)
tcp 0 0 XX.XX.XX.XX:80 XX.XX.XX.XX:17832 FIN_WAIT2 26375/node off (0.00/0/0)
[very long list]
tcp 0 0 XX.XX.XX.XX:80 XX.XX.XX.XX:17992 FIN_WAIT2 26375/node off (0.00/0/0)
tcp 0 0 XX.XX.XX.XX:80 XX.XX.XX.XX:17851 FIN_WAIT2 26375/node off (0.00/0/0)
The problem here is that the timer is set to off, so the socket stays indefinitely in this state, eventually exhausting the server.
I wanted to know why I see this problem only with node (and not with nginx or Perl POE). I read the following interesting information (source: http://alldunne.org/2010/07/tcp-connection-is-permanently-in-fin_wait2-state/):
"When programming sockets in C, it’s useful to know an important difference between the shutdown (either READ or WRITE) and close of a socket.
Shutting down the socket will cause the connection to go into the FIN_WAIT2 state while the TCP/IP stack waits for the connection to be closed at the other end of the connection. But if the other side doesn't shutdown/close the connection, then the connection will remain in the FIN_WAIT2 state indefinitely (assuming of course that your process hasn't been terminated).
If your process were to close the connection, rather than call shutdown on it, then the connection will fall into FIN_WAIT2 as before; however, the TCP/IP stack will time the connection out after a certain period."
Having read this, maybe node is shutting down the socket instead of closing it, which causes the socket to exist forever when it was not closed properly on the client side.
Best regards