Refactor communication layer to use socket.io #10514
@paul-marechal Your changes over at https://github.com/eclipse-theia/theia/tree/mp/refact-connections look interesting. Are you interested in me taking some of these ideas to encapsulate the |
I want to look at how much we should wrap the broken APIs bits with abstract interfaces instead of either the old edit: I couldn't come up with something yet. |
I would like to point to #9514 (comment) and the lessons learned there. I think we should make sure that we get a real good separation of concerns: there are multiple concerns that we're addressing, like
I think we should take really good care to separate the different concerns into reusable, layered components that can be consumed separately. |
@tsmaeder In this case only the transport layer is touched, so we should try to keep at that in this PR and only minimally touch the other parts if required. |
@paul-marechal I agree as far as this PR is concerned. It's more a comment on future directions we should take. |
A propos the discussion in today's developer meeting, I think my inclination would be to merge this PR before we undertake further work on the messaging system. This PR cleans up the code a fair bit, and in some smoke tests, I didn't notice any regressions. I agree with @paul-marechal that the abstractions linking our front and back ends could use a fair bit of work - both to clean up the code and make things more navigable and to improve performance - but this PR is a step forward. |
It's just that Socket.io isn't directly related to WebSockets but it is now retrofitted into our various WebSocket-centric abstractions. Feel free to merge before we come around to break and refactor the communication APIs. |
That may overstate the case a little bit. As their website says:
So most of the time we will in fact be dealing with WebSockets, and they've also handled the cases when we aren't, using a pretty slick interface that conceals the details. |
My main issue being that we used to rely on those details to some extent: We have validation logic for when WebSockets upgrade, but this will not always happen now. Socket.io also supports multiplexing out of the box, but right now we don't make use of that, and I believe we should. From the dependents of the communication API, using websockets or not shouldn't really matter. What the frontend wants are connections and proxies. How this is achieved should be an implementation detail, and if we better abstracted those use-cases I believe we could make a better use of Socket.io, or whatever other library could be used instead. |
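To make the multiplexing idea concrete, here is a minimal sketch of several logical channels sharing one underlying transport. Every name in it (`Transport`, `Channel`, `Multiplexer`, `createLoopbackPair`) is hypothetical and is not a Theia or Socket.io API; in practice Socket.io's own namespaces would likely play this role.

```typescript
// Hypothetical sketch, not Theia's or Socket.io's actual API.
// A transport only knows how to move strings in both directions.
interface Transport {
    send(data: string): void;
    onMessage(handler: (data: string) => void): void;
}

// One logical channel, identified by an id on the shared transport.
class Channel {
    readonly id: string;
    private readonly transport: Transport;
    private readonly handlers: Array<(payload: unknown) => void> = [];

    constructor(id: string, transport: Transport) {
        this.id = id;
        this.transport = transport;
    }
    send(payload: unknown): void {
        // Tag every frame with the channel id so the peer can route it.
        this.transport.send(JSON.stringify({ channel: this.id, payload }));
    }
    onMessage(handler: (payload: unknown) => void): void {
        this.handlers.push(handler);
    }
    deliver(payload: unknown): void {
        this.handlers.forEach(handler => handler(payload));
    }
}

// Routes incoming frames to the channel named in the frame.
class Multiplexer {
    private readonly channels = new Map<string, Channel>();
    private readonly transport: Transport;

    constructor(transport: Transport) {
        this.transport = transport;
        transport.onMessage(data => {
            const frame = JSON.parse(data) as { channel: string; payload: unknown };
            const channel = this.channels.get(frame.channel);
            if (channel) {
                channel.deliver(frame.payload);
            }
        });
    }
    open(id: string): Channel {
        let channel = this.channels.get(id);
        if (!channel) {
            channel = new Channel(id, this.transport);
            this.channels.set(id, channel);
        }
        return channel;
    }
}

// In-memory loopback pair, used only to exercise the sketch.
function createLoopbackPair(): [Transport, Transport] {
    const handlersA: Array<(data: string) => void> = [];
    const handlersB: Array<(data: string) => void> = [];
    const a: Transport = {
        send: data => handlersB.forEach(handler => handler(data)),
        onMessage: handler => { handlersA.push(handler); }
    };
    const b: Transport = {
        send: data => handlersA.forEach(handler => handler(data)),
        onMessage: handler => { handlersB.push(handler); }
    };
    return [a, b];
}
```

Dependents only ever see `Channel`; whether the frames travel over a WebSocket or long polling stays inside the `Transport` implementation.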
But these two goals are at loggerheads: you can't rely on implementation details and abstract from them at the same time. If we want connections and proxies at a higher level, we need to have an abstraction at that level that captures the relevant properties. I could imagine that we conceptualize this as a connection being multiplexed over a main and a fallback connection. Maybe we need some "quality-of-service" attribute on the connections to express events like: "this connection has switched from being slow to being fast". @paul-marechal I find it hard to know what exactly would be worse for the user if we lose the validation logic? Do you have a good handle on that? Might it be worth the sacrifice if we can improve the layering and simplify the code? |
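One possible shape for such an abstraction, sketched under the assumption that a connection starts on the fallback link and may later be moved to the main link. `RawLink`, `ManagedConnection` and `ConnectionQuality` are invented for illustration; nothing like this exists in Theia as far as this thread shows.

```typescript
// Hypothetical names throughout; this is not an existing Theia interface.
type ConnectionQuality = 'slow' | 'fast';

// Lowest common denominator of the main (e.g. WebSocket) and fallback (e.g. polling) links.
interface RawLink {
    send(data: string): void;
}

class ManagedConnection {
    private active: RawLink;
    private currentQuality: ConnectionQuality = 'slow';
    private readonly fallback: RawLink;
    private readonly listeners: Array<(quality: ConnectionQuality) => void> = [];

    constructor(fallback: RawLink) {
        this.fallback = fallback;
        this.active = fallback;
    }

    get quality(): ConnectionQuality {
        return this.currentQuality;
    }

    // Clients call send() without knowing which link is in use.
    send(data: string): void {
        this.active.send(data);
    }

    // Optional hook for the few clients that care about quality changes.
    onQualityChanged(listener: (quality: ConnectionQuality) => void): void {
        this.listeners.push(listener);
    }

    // Called by the lower layer once the main link is available.
    useMain(main: RawLink): void {
        this.active = main;
        this.setQuality('fast');
    }

    // Called by the lower layer when the main link is lost.
    useFallback(): void {
        this.active = this.fallback;
        this.setQuality('slow');
    }

    private setQuality(quality: ConnectionQuality): void {
        if (quality !== this.currentQuality) {
            this.currentQuality = quality;
            this.listeners.forEach(listener => listener(quality));
        }
    }
}
```

The design choice here is that switching links is driven from below (`useMain`, `useFallback`) while consumers only see `send` and, if they opt in, the quality event.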
I think clients to our messaging API shouldn't care about whether they are running on a main or fallback connection, that's kind of the point of using Socket.io. Lower layers of course will need to deal with the details, but that's the goal of the abstraction: To hide the nitty-gritty details and expose something easily usable.
Validation logic would be pulled down into the lower layers. Validation is required in the absence of a solution to control the HTTP origin of WebSocket connections outside of Theia (with a proxy or whatever else), as WebSockets aren't affected by CORS.
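A minimal sketch of what such an origin check could look like in a lower layer. `isOriginAllowed` and the allow-list are hypothetical, and rejecting requests without an `Origin` header is an assumed policy for illustration, not necessarily what Theia does.

```typescript
// Hypothetical helper, not Theia's actual validation code.
// Normalizes an origin for comparison: trims, lowercases, drops trailing slashes.
function normalizeOrigin(origin: string): string {
    return origin.trim().toLowerCase().replace(/\/+$/, '');
}

// Returns true only if the request's Origin header matches an allowed origin.
// Assumption: requests without an Origin header are rejected.
function isOriginAllowed(originHeader: string | undefined, allowedOrigins: readonly string[]): boolean {
    if (originHeader === undefined) {
        return false;
    }
    const candidate = normalizeOrigin(originHeader);
    return allowedOrigins.some(allowed => normalizeOrigin(allowed) === candidate);
}
```

Because the browser does not apply CORS to the WebSocket handshake, a check of this kind has to run on the server before the connection is accepted.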
Agreed. To illustrate some of the abstractions we would be missing: Right now if you want a proxy from the FE to the BE you have to go through |
I'd make a distinction here between missing abstractions and bad naming. The name |
I'm not sure what to think of https://github.com/eclipse-theia/theia/blob/master/packages/core/src/node/messaging/messaging-service.ts#L22-L47 in that regard, I find most of this infrastructure confusing hence why I would be looking into a way to simplify those things. |
From a user perspective, for what it's worth: I'm running a "Che Theia" cluster with people connecting from around the planet. Individuals on the opposite side of the planet (200-350ms) experience almost unusable environments. The solution is a complete fail for them and probably reduces their working efficiency by a third or more. Reminds me of the TCP/BDP issues I had to solve trans-pacific 20 years ago (that have mostly been resolved by OS TCP stack improvements).
I may end up regionalizing their working environment, but it's worth pointing out that their around-the-world video and voice connections are very reliable (rarely any disruptions during very long calls). They can also download directly from our http servers at better than 10mbit (despite the high BDP) no problem. Thanks for any multiplexing improvements that may make their life better, and please do tag me if I can help with any testing. It's possible you may be able to simulate at least some of the problem space by artificially adding 300ms to your RTT in the lab. I'm currently running Theia 1.22.1 from August 23 2021. |
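For readers unfamiliar with the bandwidth-delay product (BDP) mentioned here, a small worked calculation may help. The 10 Mbit/s and 300 ms figures are taken from the comment above; the 65,535-byte window is an assumed illustration (a classic TCP window without window scaling), not a measurement from this cluster.

```typescript
// Bandwidth-delay product: bytes that must be "in flight" to keep a link fully used.
function bandwidthDelayProductBytes(bitsPerSecond: number, rttMs: number): number {
    return (bitsPerSecond * rttMs) / 1000 / 8;
}

// Upper bound on throughput when at most `windowBytes` may be unacknowledged at once.
function maxThroughputBitsPerSecond(windowBytes: number, rttMs: number): number {
    return (windowBytes * 8 * 1000) / rttMs;
}

// 10 Mbit/s at 300 ms RTT needs 375000 bytes in flight.
const bdpBytes = bandwidthDelayProductBytes(10000000, 300);

// With only 65535 bytes in flight, throughput is capped at 1747600 bit/s (about 1.75 Mbit/s).
const cappedBitsPerSecond = maxThroughputBitsPerSecond(65535, 300);
```

The same reasoning applies to any protocol that limits the amount of unacknowledged data: the higher the RTT, the more data has to be outstanding to reach a given throughput.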
@matthewfisch 250-300ms of what? Ping time? Round trip latency? I regularly would run on a che-theia cluster (Red Hat) from the US and Theia being slow was never a problem. Not saying it ain't so, but I'm trying to understand what scenario we're looking at. |
@tsmaeder Not all remote developers have reported the issue, and I'm still in the diagnosis phase. I'm not sure if it is universal or affecting only some individuals. I have traced the issue to websocket/rpc comms, but I don't know much about how it works in Theia (so investigation has been slow). I'm reviewing #9731 now, looks like http fallback is available in a Theia that is one-day newer and I may try to get this working. I presumed the issue was just that the BDP was a little too much and the RPC events were falling behind and eventually timing out. A multiplexed channel or multiple channels would resolve this. HTTP fallback may also help. |
Round trip time should be less of a problem than throughput as message handling is async anyway. Theia already uses a single Websocket for all front-end/back-end service communications. Looking forward to seeing your analysis. |
Yes and this is why it has been so challenging to diagnose, it shouldn't be a problem, but it somehow is. Multiple different network providers, workstations etc. Network tests show clean most of the time (despite clear evidence of poor application level performance).
I will be upgrading Che-Theia today which should bring forward the http fallback code into our environment. I will probably have user feedback by the end of the week.
|
The changes are minimal which is nice! Let's merge this now and look at improving our messaging abstractions later.
Since 1.23.0 ( eclipse-theia/theia#10514 ), Theia uses socket.io and it needs /socket.io to support websocket, otherwise it falls back to HTTP polling, which is significantly slower.
What it does
Closes #10403
Replaces most of our communication layer (previously using simple websockets + http-fallback) with the socket.io framework.
This removes the need for a custom long polling implementation, which is done by socket.io automatically. It works by first initiating a long-polling connection and after a successful handshake tries to create a websocket connection. If that fails, it will simply continue using the long-polling connection. This greatly increases the performance of the http-fallback, which now seems to be on par with the websocket solution (performance wise). For more info on the actual protocol used here, see the socket.io-protocol repo.
There is some weird behavior in socket.io that strips origin headers for some reason. This is undocumented and seems to be used by the cors feature. I've worked around this for now using a fix-origin header that gets transmitted to the backend and can then be used for validation. Hopefully the comments around these parts explain it well enough.
I've tried to keep the APIs as backwards compatible as possible. Hopefully this change shouldn't break too much.
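The polling-first, upgrade-if-possible behaviour described above can be modelled roughly as follows. This is only a sketch of the decision logic, not socket.io's implementation; `HandshakeEnvironment`, `openPolling`, `probeWebSocket` and `connectWithUpgrade` are hypothetical names standing in for the real network operations.

```typescript
// Hypothetical model of the decision logic only, not socket.io's code.
type TransportName = 'polling' | 'websocket';

interface HandshakeEnvironment {
    // Resolves to true when the long-polling handshake succeeds.
    openPolling(): Promise<boolean>;
    // Resolves to true when a WebSocket connection can be established.
    probeWebSocket(): Promise<boolean>;
}

async function connectWithUpgrade(env: HandshakeEnvironment): Promise<TransportName> {
    // Step 1: always start with long polling.
    const pollingOk = await env.openPolling();
    if (!pollingOk) {
        throw new Error('handshake failed: long-polling connection could not be established');
    }
    // Step 2: after the handshake, try to upgrade to a WebSocket.
    try {
        const upgraded = await env.probeWebSocket();
        return upgraded ? 'websocket' : 'polling';
    } catch {
        // Step 3: a failed upgrade is not fatal; keep using long polling.
        return 'polling';
    }
}
```

The key property is that a failed upgrade never tears down the session: the caller ends up with a working connection either way, which is why a blocked WebSocket path shows up as reduced performance rather than as an error.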
How to test