WebSocket connection failures / flakes #122

Closed
ekzhang opened this issue Feb 12, 2025 · 3 comments · Fixed by #123

Comments

ekzhang commented Feb 12, 2025

Since today, users have been reporting issues where WebSocket connections fail repeatedly until eventually succeeding. You run sshx, get a link to your remote terminal, and then it loads. But then the WebSocket connection that starts on the page takes anywhere from 5 to 20 attempts before finally succeeding. A screenshot is below.

Image

I can’t reproduce this on local hardware, even running the exact same software. I also went into the nodes and checked that their network connectivity all looks good. That makes me think this is an issue at the Fly Edge level, in the ALPN / HTTP/1.1 / HTTP/2 handlers that I’m using.

Submitted an issue to the Fly.io forums for now, will see if I get a response. https://community.fly.io/t/issue-with-flaky-websocket-connections-in-sshx-io/23864

ekzhang commented Feb 12, 2025

Okay yeah it seems to have started in sshx 0.4.0, after a bunch of changes. However, I can't use 0.3.1 long-term because Fly.io is deprecating their remote builders, and the build fails on their new Depot system. So I'll need to investigate some more. Can't reproduce locally, still…

ekzhang commented Feb 12, 2025

oh geez, you know what the issue might be? tokio-rs/axum#2894

ekzhang commented Feb 12, 2025

added: Add support for WebSockets over HTTP/2. They can be enabled by changing get(ws_endpoint) handlers to any(ws_endpoint) (tokio-rs/axum#2894)

Lol you're kidding me, I can't believe that's actually the error. Axum added support for WebSockets over HTTP/2 in version 0.8.0, but the issue is that their server started advertising that support for all endpoints, even though you had to opt in per endpoint by changing get() into any(). So this broke sshx. It was only detectable in prod since all tests and local dev use HTTP/1.1 for web requests.

This actually kind of took me on a crazy wild goose chase. I was about to start busting out Wireshark if I couldn't find the solution in a couple more hours. …anyway obviously very happy that axum added support for this protocol (exciting!) but geez that was a tricky issue.
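
For anyone landing here later, here is a rough, std-only Rust sketch of the mismatch described above. It is a toy model, not axum's or sshx's actual code: `Route`, `Handshake`, `choose_handshake`, `route_accepts` and `websocket_connects` are all made-up names for illustration. It assumes the RFC 8441 mechanism, where a server advertises `SETTINGS_ENABLE_CONNECT_PROTOCOL` and a client may then open a WebSocket with an HTTP/2 `CONNECT` request instead of an HTTP/1.1 `GET` upgrade. It does not try to model why some retries eventually succeeded.

```rust
/// How the WebSocket route was registered (hypothetical stand-in for
/// axum's `get(ws_endpoint)` vs `any(ws_endpoint)`).
#[derive(Clone, Copy, Debug, PartialEq)]
enum Route {
    GetOnly,
    AnyMethod,
}

/// The request a client sends to open a WebSocket.
#[derive(Clone, Copy, Debug, PartialEq)]
enum Handshake {
    /// HTTP/1.1: `GET` with `Upgrade: websocket`.
    Http1Get,
    /// HTTP/2 (RFC 8441): `CONNECT` with `:protocol = websocket`.
    Http2Connect,
}

/// Client side (assumed behavior): use the HTTP/2 handshake only when the
/// connection is HTTP/2 *and* the server advertised extended CONNECT.
/// Otherwise fall back to the classic HTTP/1.1 upgrade.
fn choose_handshake(negotiated_h2: bool, server_advertises_connect: bool) -> Handshake {
    if negotiated_h2 && server_advertises_connect {
        Handshake::Http2Connect
    } else {
        Handshake::Http1Get
    }
}

/// Server side: a GET-only route never matches a CONNECT request.
fn route_accepts(route: Route, handshake: Handshake) -> bool {
    match (route, handshake) {
        (Route::AnyMethod, _) => true,
        (Route::GetOnly, Handshake::Http1Get) => true,
        (Route::GetOnly, Handshake::Http2Connect) => false,
    }
}

/// End to end: does the WebSocket handshake reach the handler?
fn websocket_connects(route: Route, negotiated_h2: bool, server_advertises_connect: bool) -> bool {
    route_accepts(route, choose_handshake(negotiated_h2, server_advertises_connect))
}
```

In this model, `websocket_connects(Route::GetOnly, false, true)` is true (local dev over HTTP/1.1), `websocket_connects(Route::GetOnly, true, true)` is false (HTTP/2 in prod with the server-wide advertisement), and `websocket_connects(Route::AnyMethod, true, true)` is true again, which corresponds to the get() to any() change.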

ekzhang added a commit that referenced this issue Feb 12, 2025
This resolves #122 after a change in Axum 0.8.0. See the issue for an explanation of what happened.