handshake propagation issue when using cluster #952
Comments
Manager.prototype.handleUpgrade = function (req, socket, head) {
  var data = this.checkRequest(req)
    , self = this;

  if (!data) {
    if (this.enabled('destroy upgrade')) {
      socket.end();
      this.log.debug('destroying non-socket.io upgrade');
    }
    return;
  }

  req.head = head;

  // HOT FIX
  setTimeout(function() {
    self.handleClient(data, req);
  }, 1000);
  // ORIGINAL: this.handleClient(data, req);
};

If we introduce an artificial delay during Upgrade as above, it gives the 'handshake' events enough time to propagate, and the "client not handshaken - should reconnect" errors go away. I'm not in any way suggesting this as a fix, just using it to illustrate the issue better. |
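For illustration only, here is a minimal sketch of the same idea without a fixed one-second delay: wait until the 'handshake' event for that client id has actually arrived, and give up after a timeout. HandshakeWaiter, markHandshaken and waitFor are hypothetical names invented for this sketch. They are not part of socket.io, and nothing here is a proposed patch.

```javascript
const { EventEmitter } = require('events');

// Hypothetical helper (not a socket.io API): tracks handshaken client ids
// and lets a caller wait for an id that has not been seen yet.
class HandshakeWaiter {
  constructor() {
    this.handshaken = new Set();
    this.events = new EventEmitter();
  }

  // Call this whenever a 'handshake' event arrives, locally or via pub/sub.
  markHandshaken(id) {
    this.handshaken.add(id);
    this.events.emit('handshake:' + id);
  }

  // Resolves true as soon as `id` is handshaken, or false after `timeoutMs`.
  waitFor(id, timeoutMs) {
    if (this.handshaken.has(id)) return Promise.resolve(true);
    return new Promise((resolve) => {
      const onHandshake = () => {
        clearTimeout(timer);
        resolve(true);
      };
      const timer = setTimeout(() => {
        this.events.removeListener('handshake:' + id, onHandshake);
        resolve(false);
      }, timeoutMs);
      this.events.once('handshake:' + id, onHandshake);
    });
  }
}
```

Under these assumptions, an upgrade handler would call waitFor with the session id parsed from the request and only reject the client when the promise resolves to false. The upgrade is then delayed only for as long as propagation actually takes.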
I'm using nodejs, socket.io 0.9.6, nginx patched with the tcp_proxy module, and redis for scaling socket processes. I'm now stuck in a situation similar to yours. The client cannot complete the handshake with the server (although it sometimes succeeds), and in the log file I see: debug: websocket writing 2:: The client's connect request does not succeed, so it keeps sending requests again and again. I'd very much appreciate any advice. Thanks. |
@guille @LearnBoost Are there any plans to address this issue? |
+1 blocker |
Does anyone know if the upcoming Socket.io 1.0 still has this issue? |
+1 |
I'm using node 0.8.1, socket.io 0.9.6, websockets and the cluster module. The Redis module uses pubsub to communicate handshaking events across processes, but looking at the code, it might not behave correctly under heavy load due to timing issues.
I've experienced the following problem trying to replace Redis with RabbitMQ, but as far as I can tell the problem is timing related and independent of what pubsub tool we use.
Scenario:
Let's say there are two worker processes in the cluster: W1 & W2
The initial request to allocate a client session/websocket (say /socket.io/1/?t=1341994956158) comes to W1, which updates its list of 'handshaken' clients. It also publishes this handshaking event so that other processes can update their lists.
Due to clustering, W2 receives the HTTP Upgrade request (say /socket.io/1/websocket/1860678371557773727) before it gets the 'handshake' event published by W1.
W2 doesn't find 1860678371557773727 in the list of 'handshaken' clients, and discards the transport with a "client not handshaken - should reconnect" error.
When the browser tries to reconnect, the same story repeats (with the workers interchanged), so the browser fails to establish a websocket connection with the server even after multiple retries.
If the 'handshake' event sent by W1 reaches W2 before the HTTP Upgrade request, everything seems to work fine.
Has anyone faced this or similar issues? Or, am I missing something?
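The scenario above can be reproduced in isolation with a small simulation. This is a sketch under stated assumptions, not socket.io code: Worker and simulate are hypothetical names, an in-process EventEmitter stands in for the pub/sub channel, and propagationDelayMs models the time a 'handshake' event takes to reach the other worker.

```javascript
const { EventEmitter } = require('events');

// Hypothetical stand-in for one cluster worker and its 'handshaken' list.
class Worker {
  constructor(name, bus, propagationDelayMs) {
    this.name = name;
    this.bus = bus;
    this.handshaken = new Set();
    // Events from other workers arrive only after the propagation delay.
    bus.on('handshake', (id, origin) => {
      if (origin === this.name) return;
      setTimeout(() => this.handshaken.add(id), propagationDelayMs);
    });
  }

  // Initial request: record the client locally and publish the event.
  handshake(id) {
    this.handshaken.add(id);
    this.bus.emit('handshake', id, this.name);
  }

  // Upgrade request: succeeds only if this worker already knows the id.
  upgrade(id) {
    return this.handshaken.has(id)
      ? 'connected'
      : 'client not handshaken - should reconnect';
  }
}

// W1 handles the handshake; W2 receives the upgrade `upgradeAfterMs` later.
function simulate(upgradeAfterMs, propagationDelayMs) {
  return new Promise((resolve) => {
    const bus = new EventEmitter();
    const w1 = new Worker('W1', bus, propagationDelayMs);
    const w2 = new Worker('W2', bus, propagationDelayMs);
    const id = '1860678371557773727';
    w1.handshake(id);
    setTimeout(() => resolve(w2.upgrade(id)), upgradeAfterMs);
  });
}
```

Calling simulate with an upgrade delay shorter than the propagation delay resolves to the "client not handshaken - should reconnect" message, while an upgrade delay longer than the propagation delay resolves to 'connected'. That matches the observation that the outcome depends only on which of the two messages reaches W2 first.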