Server-Side Event "connection" not fired/called on Client-Connection (using wild-card namespaces) #4677

Open
s3cc0 opened this issue Mar 30, 2023 · 2 comments

s3cc0 commented Mar 30, 2023

Describe the bug
We run socket.io on Google Cloud Run with the redis-adapter to exchange data across a node cluster and multiple containers. The challenge is that Google Cloud Run kills the connections every 60 min. We do not use sticky sessions because we use the WebSocket transport only and do not support polling. When the containers scale, in either direction, it happens from time to time that the client gets a socket connection but the "connection" event is not fired on the server. As a result, none of the other event handlers on the socket are registered, so they do not work.

Important: We use Namespaces and Channels

To Reproduce

  1. Create a system with multiple servers and node cluster.
  2. Use the redis-adapter for exchange between all nodes.
  3. Connect with the clients, then simulate disconnects or wait for the normal (TCP) disconnect from the server.
  4. Sometimes it works and sometimes it does not: the "connection" event is not fired.

Version for frontend and backend:
"@socket.io/admin-ui": "^0.5.1",
"@socket.io/redis-adapter": "^8.1.0",
"redis": "^4.6.5",
"socket.io": "^4.6.1",
"socket.io-client": "^4.6.1",

Redis Server 6.x+ (issue also with redis server 4.x+)

Server
Code Example

import { Server } from "socket.io";
import { createAdapter } from "@socket.io/redis-adapter";
/* ... */
// initialize socket io
const io = new Server(app.http, {
    noServer: true,
    cors: {
        origin: "*",
        methods: ["GET", "POST"],
        credentials: true
    }
});
const redisPub = activeRedisClient.duplicate();
await redisPub.connect();
await redisPub.ping();
const redisSub = activeRedisClient.duplicate();
await redisSub.connect();
await redisSub.ping();
io.adapter(createAdapter(redisPub, redisSub, {
    requestsTimeout: 3000
}));

const dynamicNamespace = io.of(async (name, auth, next) => {
    next(null, true);
});
dynamicNamespace.use(async (socket, next) => { next(null); });

dynamicNamespace.on('connection', async (socket) => {
    // this handler is sometimes never called
    socket.on('disconnect', async () => {
        // so this listener is sometimes never registered or called
    });

    socket.on('message', async () => {
        // so this listener is sometimes never registered or called
    });
});

Client

import { io } from "socket.io-client";

const socket = io("ws://localhost:3000/", {
    transports: ['websocket'],
    auth: {
        token: 'jwt-token',
    },
});

socket.on("connect", () => {
  console.log(`connect ${socket.id}`);
});

socket.on("disconnect", () => {
  console.log("disconnect");
});

Expected behavior
The "connection" event should always be fired on the server. That is not the case, so whatever the clients do, they cannot take part in the normal application flow. The socket connection itself stays open, however, so there is a socket connection for which the "connection" event was never fired on the server.

Platform:

  • Device: PC, Notebook, Mac
  • OS: Windows, Linux, OSX
  • Browser: Safari (newest), Chrome (newest), Firefox (newest)

Additional context
Important: the socket.io server runs on Google Cloud Run (Docker container) and scales up and down with the traffic. We had 250 connections on one container; scaling is triggered at 150 open requests.

@s3cc0 s3cc0 added the to triage Waiting to be triaged by a member of the team label Mar 30, 2023
@s3cc0 s3cc0 changed the title Server-Side Event "connected" not fired/called on Client-Connection (using wild-card namespaces) Server-Side Event "connection" not fired/called on Client-Connection (using wild-card namespaces) Mar 30, 2023
@darrachequesne
Member

Thanks for the detailed write-up 👍

I'm not sure how this could happen though. The multi-node setup with Redis does not seem related, as the adapter is not called during the connection.

How do you detect this kind of issues? From the client side?

Related: #4015

@darrachequesne darrachequesne added needs investigation and removed to triage Waiting to be triaged by a member of the team labels Mar 31, 2023
@s3cc0
Author

s3cc0 commented Mar 31, 2023

Hello, thanks for the quick feedback and the related bug. I didn't see it in my research, sorry!

We found the bug when scaling containers up and then down again. It is also noticeable after the forced removal of the TCP connection after 15-60 min on Google Cloud Run. After that, the socket.io client reconnects and a new websocket connection is established. However, although this connection is established, the "connection" event is not fired, so no further listeners such as "disconnect" or "message" are registered on the socket. Because of this, the socket cannot join rooms in the namespace. This became visible because the ws(s) connection was created successfully but no messages arrived (only the ping/pong heartbeat). We could confirm this with the socket.io admin UI: the socket was there, but without rooms. I was able to add a room using the admin tool, and then the socket received some messages. So the socket connection itself was fine; only the "connection" event on the server was missing.
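
The "socket without rooms" symptom described above could also be checked from code instead of the admin UI. The following is only a rough sketch under assumptions: the helper name `findSocketsWithoutRooms` and the namespace name in the usage comment are hypothetical. It expects the array returned by `fetchSockets()` on a namespace, where each entry has an `id` and a `rooms` Set, and it lists the sockets that are in no room other than the one named after their own id.

```javascript
// Hypothetical helper: returns the ids of sockets that joined no
// application room (a room equal to the socket's own id is ignored).
function findSocketsWithoutRooms(sockets) {
  return sockets
    .filter((s) => [...s.rooms].every((room) => room === s.id))
    .map((s) => s.id);
}

// Usage (server), assuming a namespace named "/tenant-1":
//   const sockets = await io.of("/tenant-1").fetchSockets();
//   console.log(findSocketsWithoutRooms(sockets));
```

Note that a healthy socket that simply never joined a room would be listed as well, so this is only meaningful when the "connection" handler always joins at least one room.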

Regarding the related bug: it could have something to do with the wildcard namespace.

Conjecture:
Could it be due to the speed of the reconnect? We also tried throttling it, without success.

How did we reproduce it? Here is a simple example:

  1. Create server nodes with node cluster and the redis-adapter.
  2. Put everything in a self-scaling Docker system (Google Cloud Run).
  3. Also deploy the socket.io admin UI to get more details.
  4. Start a node with three processes (1 master, 2 client nodes) in a container.
  5. Connect multiple clients to the nodes so that Google Cloud Run has to scale up.
  6. Wait for the forced disconnect after 15-60 min (configurable in the env; the TCP socket is killed) and/or wait for Google Cloud Run to scale down.
  7. Randomly, a client gets a socket connection, but the "connection" call never happens and no channel is joined anymore.
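
For step 6, waiting 15-60 min per attempt is slow. A possible shortcut, sketched here under assumptions, is to drop the low-level connection from the client on a timer so that the automatic reconnect kicks in more often. The helper name `churnConnection` and its options are hypothetical; `socket.io.engine.close()` is the socket.io-client call that closes the underlying transport. Whether this triggers the same server-side behavior as a Cloud Run TCP kill is not confirmed.

```javascript
// Hypothetical helper: periodically closes the low-level connection of a
// socket.io-client socket so that it reconnects on its own.
function churnConnection(socket, { intervalMs = 10000, maxDrops = 100 } = {}) {
  let drops = 0;
  const timer = setInterval(() => {
    if (drops >= maxDrops) {
      clearInterval(timer); // stop once the requested number of drops is reached
      return;
    }
    if (socket.connected) {
      drops += 1;
      socket.io.engine.close(); // low-level close; the client will reconnect
    }
  }, intervalMs);
  return { stop: () => clearInterval(timer), getDrops: () => drops };
}

// Usage (client):
//   const churn = churnConnection(socket, { intervalMs: 5000 });
//   // later: churn.stop();
```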

For everyone with the same issue, here is our current workaround:

  1. Implement a client message that is sent as soon as possible after the websocket connection is established.
  2. Use socket.io's callback/acknowledgement for that message.
  3. If the acknowledgement does not arrive within a defined duration, e.g. 5 seconds, try to reconnect.
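
The three workaround steps above could be sketched roughly as follows. This is an illustration under assumptions, not the code used in the affected project: the helper name `confirmServerReady`, the event name `"client:ready"` and the option names are hypothetical. It relies on `socket.timeout(ms).emit(event, callback)` from socket.io-client 4.x, where the callback receives an error as its first argument when no acknowledgement arrives in time. The server has to acknowledge the event from inside its "connection" handler, so a missing acknowledgement suggests that the handler never ran.

```javascript
// Sketch of the workaround. `confirmServerReady` and the "client:ready"
// event name are hypothetical; `socket` is a socket.io-client Socket.
function confirmServerReady(socket, { event = "client:ready", timeoutMs = 5000 } = {}) {
  return new Promise((resolve) => {
    // Steps 1 and 2: emit right after "connect" and ask for an acknowledgement.
    socket.timeout(timeoutMs).emit(event, (err) => {
      if (!err) {
        resolve(true); // the server-side "connection" handler ran and answered
        return;
      }
      // Step 3: no acknowledgement in time, so force a fresh connection.
      socket.disconnect();
      socket.connect();
      resolve(false);
    });
  });
}

// Usage (client):
//   socket.on("connect", () => confirmServerReady(socket));
// Server, inside dynamicNamespace.on("connection", (socket) => { ... }):
//   socket.on("client:ready", (ack) => ack());
```

Because the check runs on every "connect" event, a forced reconnect is verified again in the same way.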
