[Potential new transport] Streamed xSSE Meek (HTTP/1.1) #3334

Closed
PoneyClairDeLune opened this issue May 5, 2024 · 16 comments

Comments

@PoneyClairDeLune
Contributor

https://github.com/ltgcgo/eclipsed/tree/main/src

Experimental bidirectional streaming HTTP/1.1 tunnels, built on top of extended Server-Sent Events. This implementation may be faster than all existing Meek implementations. However, the example tunnels currently crash frequently. I hate Deno.

Would like to have the opinions of the gurus here ;)

Original discussion: #3333

@PoneyClairDeLune
Contributor Author

Here are my own tests with stdout redirected to /dev/null.

When run as a pure TCP relay of a web server on a mediocre server, Eclipsed Meek can reach a download speed of around 32 Mbps. When run as a TCP relay proxying a VLESS over WebSocket connection, Eclipsed Meek can reach around 22 Mbps in download, 10 Mbps in upload.

@PoneyClairDeLune
Contributor Author

PoneyClairDeLune commented May 5, 2024

As a proof of concept, I believe Eclipsed Meek is enough to demonstrate the power of Meek. However, the current implementation can be made more efficient.

In terms of console logs, the current Meek still emits too many, which can sometimes overwhelm SSH sessions. JavaScript has its own limits as a non-compiled language, and SSE by itself isn't designed for transmitting large data chunks, so in the Eclipsed tunnel I had to transmit data with the OVM43 codec (a faster but incompatible version of Base64, inspired by the KORG 7 over 8 codec). Come to think of it, the requirement of a codec might be a blessing rather than a curse.
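To illustrate why a text-safe codec is needed at all, here is a minimal Python sketch of carrying a binary chunk inside one SSE event. It uses standard Base64 as a stand-in, since the OVM43 codec is not described in this thread. The event name `chunk` and the function names are hypothetical and not taken from Eclipsed.

```python
import base64

def encode_sse_event(chunk: bytes, event: str = "chunk") -> str:
    """Wrap a binary chunk in a single SSE event.

    SSE is line-oriented text, so raw bytes (which may contain newlines)
    are Base64-encoded first. Base64 is a stand-in here, not OVM43.
    """
    payload = base64.b64encode(chunk).decode("ascii")
    return f"event: {event}\ndata: {payload}\n\n"

def decode_sse_event(frame: str) -> bytes:
    """Recover the binary chunk from a frame built by encode_sse_event."""
    for line in frame.splitlines():
        if line.startswith("data: "):
            return base64.b64decode(line[len("data: "):])
    raise ValueError("no data field in frame")

# 1024 arbitrary bytes, including newline and carriage-return bytes.
chunk = bytes(range(256)) * 4
frame = encode_sse_event(chunk)
assert decode_sse_event(frame) == chunk

# Ratio of bytes on the wire to bytes of payload; Base64 alone adds about a third.
overhead = len(frame) / len(chunk)
```

Any text encoding of this kind pays a size penalty, which is part of why SSE is a poor fit for bulk data.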

I'm working on another streamed Meek implementation in the meantime called Ditzy, built on top of binary messages rather than human-readable messages, but that may not come to fruition any time soon. I hope the gurus in this community can provide insights for the current Eclipsed Meek. And who knows, maybe Eclipsed might become a transport in Xray!

@PoneyClairDeLune
Contributor Author

The current Eclipsed implementation cannot penetrate CDNs buffering request bodies, where the client and server would simply time out. Eclipsed itself only works in a streamed manner.

Maybe Ditzy (the next one with binary messages) will fare better with a streaming and buffering combo.

@mmmray
Collaborator

mmmray commented May 5, 2024

The current Eclipsed implementation cannot penetrate CDNs buffering request bodies

The selling point of existing Meek implementations so far has been that it can pass arbitrary HTTP "middleboxes", not only CDNs but also CGI servers. If it can't be deployed there, then there is nothing the existing httpupgrade and websocket transports cannot do already.

Do you have a specific CDN in mind that does not buffer request bodies, and also does not support websockets? For this kind of constraint, your Meek implementation could be used.

@RPRX
Member

RPRX commented May 6, 2024

@PoneyClairDeLune #2162 (comment)

Streaming for download, split into packets for upload (multiple requests)

@PoneyClairDeLune
Contributor Author

PoneyClairDeLune commented May 6, 2024

Do you have a specific CDN in mind that does not buffer request bodies, and also does not support websockets?

@mmmray No, sadly. It just came to my mind when I was implementing SSE from scratch, wondering what would happen if SSE occurred in reverse. It worked so well even in its current form that I entirely forgot CDNs don't allow streamed request bodies. :(

The problems with Eclipsed don't stop here. In Eclipsed's current form, connections may sometimes get overwhelmed due to bad design, and at least two unidirectional connections have to be established for each duplex connection, resulting in an absurd connection count between the client and the server.

But Eclipsed definitely shattered the myth of Meek being doomed to be slow!

Streaming for download, split into packets for upload (multiple requests)

@RPRX Actually I had (subjectively) better ideas. The current Eclipsed experiment certainly served as a rewarding experience regarding unidirectional streaming, and while it's my first attempt at implementing Meek, it isn't my first design.

Since Meek has everything regarding the states of connections, I suspected that a full state decouple from underlying transports would be possible, so I designed the binary message format for a draft called Ditzy. If implemented well, Ditzy would work both in a bundled manner and a streamed manner whenever it finds itself capable, and the state of reconstructed connections would be fully decoupled from the underlying HTTP requests altogether, resulting in a much more efficient use of HTTP requests; multiplexing, if you will. You might be able to use it on DNS too.

But right now Ditzy is just a draft, and all those advantages are currently just pipe dreams. After I get the binary messages working for request streams, I'll work on bundled requests (one request/response bundling multiple messages at once), possibly also biased stream multiplexing.

@mmmray
Collaborator

mmmray commented May 19, 2024

I am increasingly convinced that HTTP SSE/streaming for download, and individual HTTP requests for upload, just like RPRX said, is the way to go. With as little extra complexity as possible. No extra mux layer, no binary protocol of any sort. No duplication of what vless or trojan already do.

In my mind the purpose of such a transport is not to compete with websocket in terms of performance, it just has to unlock as many IP ranges for proxy use as possible. This is a last-resort protocol. And there are many questions unanswered to how this protocol will perform in practice that are beyond the reach of protocol design:

  1. people are interested in using Meek behind PHP webhosts and lesser-known CDNs. do either of those allow you to send this many requests, or do they immediately become the bottleneck?
  2. will this trigger abuse detection? some PHP webhosts have DDoS protection that cannot be disabled.
  3. does HTTP keepalive work well enough on those CDNs to offset connection overhead?

I'm asking those questions because I have some experience with tunnelling over PHP webhosts, and in my experience it's more difficult to achieve long-term stability than it is to achieve throughput. I ran into all of the above mentioned issues at least once.

If this turns out to be a popular protocol, then one can consider optimizing its performance, and until then, it should be optimized for simplicity and ease of studying. If you decide to open a PR with this proposed transport, you front-load a lot of complexity on the core developers to review, with no proof of the protocol being actually useful in the field.

(Maybe add something like Nagle's algorithm to RPRX's suggestion, so that uploaded data is batched together into larger HTTP requests, but other than that...)

@PoneyClairDeLune
Contributor Author

PoneyClairDeLune commented May 20, 2024

... just like RPRX said

@mmmray I think it would be better to allow the clients to configure how to initiate connections, and let the server handle all the rest (enabled by the byte stream protocol). And split requests for upload, streamed responses for download should be the default option for a balance between compatibility and performance.

No extra mux layer, no binary protocol of any sort.

The extra multiplexing is in place to reduce the number of actual connections, as opening twice as many connections as a non-multiplexed transport would might look extra suspicious. There also exist implementations like Sing Box that refuse to integrate Mux.Cool. However, I agree that it is too complex, which is why it's been on the back of the draft sheet for ages. Maybe someone can integrate a tried-and-true multiplexing algorithm into Ditzy Meek.

A binary protocol, however, is necessary. And it is designed to be as simple as possible, to offer enough performance and flexibility while avoiding possible attacks on the server.

such a transport is not to compete with websocket in terms of performance

I agree on not competing with existing transports for performance; however, the performance itself must be acceptable for general use. I'm hoping to achieve at least 1 Mbps on upload, and at least 16 Mbps on download in the first public Ditzy proposal.

And there are many questions unanswered to how this protocol will perform in practice that are beyond the reach of protocol design:

I currently have no answer to any of the PHP-related questions raised, as I have absolutely no experience with it. However, I believe an open response stream is good enough to offset the connection overhead introduced by request buffering on CDNs.

If you decide to open a PR with this proposed transport, you front-load a lot of complexity on the core developers to review, with no proof of the protocol being actually useful in the field.

I'm not sure if I can implement everything in Go as I wanted just yet, so a PR might not come soon. But just like this one, I will always provide PoC implementations for everyone to test for themselves and come to their own judgements.

Will it be useful in the field? If it could be made into a transport with an acceptable performance while able to penetrate CDNs, I believe it will be.

Maybe add something like Nagle's algorithm to RPRX's suggestion

Message batching is absolutely required for any Meek implementation, or it would be no better than you connecting to the Internet via a dial-up...

Looks like I found out about Nagle's algorithm before I knew its name. However, some modifications are required, like a maximum allowed body size for certain middleboxes (relaying serverless functions, certain CDNs, etc).
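The batching idea with a body-size cap could be sketched roughly as follows in Python. This is only an illustration under assumptions: the class name `UploadBatcher`, the `max_body` parameter, and the rule "flush when the previous request finishes or a short timer fires" are hypothetical and not taken from any implementation discussed here.

```python
from typing import List, Optional

class UploadBatcher:
    """Nagle-like batching of small writes into larger HTTP request bodies."""

    def __init__(self, max_body: int):
        if max_body <= 0:
            raise ValueError("max_body must be positive")
        self.max_body = max_body  # largest body a middlebox is assumed to accept
        self._buf = bytearray()

    def write(self, data: bytes) -> List[bytes]:
        """Queue data and return any full-size bodies that can be sent now."""
        self._buf += data
        ready = []
        while len(self._buf) >= self.max_body:
            ready.append(bytes(self._buf[:self.max_body]))
            del self._buf[:self.max_body]
        return ready

    def flush(self) -> Optional[bytes]:
        """Return whatever is pending, or None if nothing is queued.

        The caller would invoke this when the previous upload request
        completes or when a short timer expires.
        """
        if not self._buf:
            return None
        body = bytes(self._buf)
        self._buf.clear()
        return body

batcher = UploadBatcher(max_body=8)
first = batcher.write(b"abc")        # too small, stays buffered
second = batcher.write(b"defghijk")  # crosses the cap, one full body is ready
rest = batcher.flush()               # the remainder goes out on flush
```

Small writes are held back until enough data accumulates, while the cap keeps any single request under the limit a middlebox would accept.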

@uuonda

uuonda commented May 21, 2024

Don't know much about xSSE but there is also GOST PHT that uses a pair of pull/push endpoints via HTTP long polling I think. Still, even that is incompatible with many CDNs.

@mmmray
Collaborator

mmmray commented May 21, 2024

I have now written an HTTP tunnel in Rust here. The setup is like this:

  1. for each new TCP session, client rolls a UUID. client opens streaming HTTP response at /<uuid>/down for downloading data, this initiates the session. if the connection is closed, the tunneled connection dies as well. this also means that it's hard to exhaust memory by sending many UUIDs, because the lifetime of a "session" is directly tied to a TCP session.
  2. client sends individual HTTP requests to /<uuid>/up, with up to 120kB of data
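The two steps above can be sketched from the client side in Python. This is not the Rust tunnel itself. The helper names, the base URL, and the exact byte value used for "120kB" are assumptions, and how the server orders concurrent uploads is not specified in the description, so it is left out.

```python
import uuid
from typing import List, Tuple

# "up to 120kB" per upload request; the exact byte count is an assumption.
MAX_UPLOAD = 120_000

def new_session(base_url: str) -> Tuple[str, str, str]:
    """Roll a UUID for a new TCP session and derive its two endpoints."""
    session_id = str(uuid.uuid4())
    base = base_url.rstrip("/")
    down_url = f"{base}/{session_id}/down"  # streaming response, opened first
    up_url = f"{base}/{session_id}/up"      # target of individual upload requests
    return session_id, down_url, up_url

def split_upload(data: bytes, limit: int = MAX_UPLOAD) -> List[bytes]:
    """Split outgoing data into bodies of at most `limit` bytes each."""
    return [data[i:i + limit] for i in range(0, len(data), limit)]

session_id, down_url, up_url = new_session("https://example.com/")
bodies = split_upload(b"x" * 250_000)
sizes = [len(b) for b in bodies]
```

Each entry in `bodies` would be sent as its own HTTP request to `up_url` while the response at `down_url` stays open.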

it passes some PHP webhosts, but i have not tried all of them. it does 300 Mbit download (exhausting my bandwidth), and 30 Mbit upload (not exhausting my bandwidth), through that PHP webhost. I use this htaccess to proxy:

RewriteRule ^(.*)$ http://upstream.com/$1 [P]

of course, the anti-censorship properties are very weak, but since up/down traffic is split, i think the detection would have to be separate from existing detection methods.

@mmmray
Collaborator

mmmray commented May 21, 2024

Still, even that is incompatible with many CDNs.

I found that many CDNs and apache installations buffer response bodies, but it can be fixed with the response header X-Accel-Buffering: no

EDIT: when you say "long polling", do you mean streaming request or response? i find that streaming responses are widely implemented (but have to be explicitly enabled with special headers or htaccess config) but streaming requests are extremely rare

@PoneyClairDeLune
Contributor Author

Still, even that is incompatible with many CDNs.

I'd be surprised if that passes any CDN at all. CDNs expect clients to have data fully ready upon request initiation, so long polling simply will not work.

@PoneyClairDeLune
Contributor Author

PoneyClairDeLune commented May 21, 2024

I found that many CDNs and apache installations buffer response bodies

@mmmray Of the CDNs I have seen Fediverse instances use, all seem to have SSE implemented properly. I guess that, apart from X-Accel-Buffering, setting Content-Type to text/event-stream can also enable streaming.
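As a rough illustration of the two headers mentioned so far, here is a minimal Python handler that sends them before streaming a few events. The class name `StreamHandler` and the `Cache-Control: no-cache` header are additions for the sketch (the latter is commonly sent with SSE but was not mentioned in this thread). Whether a given CDN or proxy honours these headers has to be tested per deployment.

```python
import time
from http.server import BaseHTTPRequestHandler

STREAMING_HEADERS = {
    "Content-Type": "text/event-stream",  # marks the response as an SSE stream
    "Cache-Control": "no-cache",          # assumption: not mentioned in the thread
    "X-Accel-Buffering": "no",            # asks a buffering proxy to pass data through
}

class StreamHandler(BaseHTTPRequestHandler):
    """Streams three SSE events, flushing after each one."""

    def do_GET(self):
        self.send_response(200)
        for name, value in STREAMING_HEADERS.items():
            self.send_header(name, value)
        self.end_headers()
        for i in range(3):
            self.wfile.write(f"data: tick {i}\n\n".encode("ascii"))
            self.wfile.flush()  # push each event out immediately
            time.sleep(1)
```

The handler can be served with `http.server.HTTPServer(("127.0.0.1", 8080), StreamHandler).serve_forever()`. If the events arrive one per second through the middlebox rather than all at once, the response is not being buffered.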

@uuonda

uuonda commented May 21, 2024

when you say "long polling" do you mean streaming request or response?

Just responses. I'm definitely going to try X-Accel-Buffering trick and see if it helps. Although a well known orange CDN doesn't require any additional headers for that.

I'd be surprised if that passes any CDN at all

It's just for pulling data. Aside from buffering issues I'd be really surprised if there are middleboxes not supporting it. I'm pretty sure long polling existed even before XHR was adopted. It still works even over "incompatible" CDNs, just very slowly.

@mmmray
Collaborator

mmmray commented Jun 29, 2024

I think this issue and #2162 #3333 are "done". The "splithttp" transport implements some of these ideas and ultimately serves the same purpose as meek. Since there are a lot of developers in this thread who wanted to prototype and design, here's some major things that can be improved:

  • The significant difference from OP's proposal is that upload is chunked, which noticeably harms upload performance. I think this can be subject to future research. Maybe there can be a setting to trade some compatibility for better upload, but so far I have not had any luck finding a useful design for CDNs.

  • The server implementation itself should eventually be improved for performance, independently of the protocol. If somebody wants to spend time on it, upload_queue.go should be reimplemented and benchmarked. I have a suspicion that golang's "heap" is completely unnecessary here since there are only maxUploadSize=10 items, and that a simple array + insertion sort would be more efficient for the average user.
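To make the "simple array + insertion sort" idea concrete, here is a hedged Python sketch of a small reorder buffer. It is not a port of upload_queue.go. It assumes uploads carry consecutive sequence numbers starting at 0, and the names `ReorderQueue`, `push`, and `pop_ready` are hypothetical. Whether this beats a heap in Go at this size would need the benchmark suggested above.

```python
class ReorderQueue:
    """Keeps out-of-order packets in a small sorted list instead of a heap."""

    def __init__(self, max_items: int = 10):
        self.max_items = max_items
        self._next_seq = 0   # sequence number the reader is waiting for
        self._pending = []   # (seq, payload) tuples, kept sorted by seq

    def push(self, seq: int, payload: bytes) -> bool:
        """Insert a packet; return False if it is stale, duplicate, or the queue is full."""
        if seq < self._next_seq or len(self._pending) >= self.max_items:
            return False
        # Insertion sort step: walk back from the end to find the slot.
        i = len(self._pending)
        while i > 0 and self._pending[i - 1][0] > seq:
            i -= 1
        if i > 0 and self._pending[i - 1][0] == seq:
            return False  # already queued
        self._pending.insert(i, (seq, payload))
        return True

    def pop_ready(self) -> bytes:
        """Return all payloads that are now contiguous with the read position."""
        out = bytearray()
        while self._pending and self._pending[0][0] == self._next_seq:
            out += self._pending.pop(0)[1]
            self._next_seq += 1
        return bytes(out)

queue = ReorderQueue()
queue.push(1, b"b")            # arrives early, must wait for seq 0
waiting = queue.pop_ready()    # nothing contiguous yet
queue.push(0, b"a")
in_order = queue.pop_ready()   # both packets are released in order
```

With at most ten entries, the linear scan in `push` touches only a handful of elements, which is the intuition behind preferring it to a heap here.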

  • Splithttp supports h1, h2, and with Add browser dialer for splithttp #3484 eventually h3 (in a convoluted way through the browser, but whatever). This could be improved further by offering direct h3 support, and I left some thoughts on h3 and ALPN in [Feature-Request] splithttp using http3 #3456

@mmmray mmmray closed this as completed Jun 29, 2024
@PoneyClairDeLune
Contributor Author

PoneyClairDeLune commented Dec 2, 2024

Should've never demonstrated xSSE as a potential Meek-like solution before it's ready...

Edit: typo lol

5 participants
@RPRX @PoneyClairDeLune @mmmray @uuonda and others