Performance rework #22
Conversation
Interesting: if I change the req and resp sizes in the tests to small numbers, all tests pass. So something about "large" payloads (even just ~270 kb) is somehow broken.
Maybe some of my buffered receive code is broken?
Ah, that'd be a good place to start. I hope to dig in some today.
Narrowing down the exact size that triggers it might be helpful.
32753 bytes, which is roughly 2x the configured max frame size.
I tried increasing the max frame size and that didn't do anything, so it's likely unrelated.
Interesting/weird thing I've noticed so far when tracing what's going on: when the request is large, it is sending it twice (headers and all, not just the body).
Meh, no, that isn't the case. I need more sleep.
Heh, modified the echo test to send a large body and added a print of the frames the client gets back:
Meh, I didn't change the window size in the test. With the window size updated it's just:
That too was a misconfiguration. Setting the frame size on the client so it's not over the size on the server, I see the server handling the body and then at least attempting (I see the calls to
@Vagabond ok, maybe getting somewhere. I got it to work by basically forcing it to send (apply the stream actions) here:
Oh, hmm. That's interesting. Aren't we violating the spec there? This is saying, due to a stream race, that we can't send as many bytes as we wanted to. How many streams were active on this session? Does the client agree on the window sizes, or do we have an accounting bug?
There is only 1 active stream. Do you mean violating the spec with my change? Because, yes, it is just to debug and find where things might be going wrong. My guess would be an accounting bug. It is attempting to send everything (I see the
Is there a way to see what the client thinks the window size is, or why the server decided it oversent? Maybe you can print BytesSent and NewSWS in the clause there to see how much they disagree? Can you add a unit test to the test suite for this? Also, do we know if this bug happens with a different HTTP/2 client?
Yea, I can do all that in a bit.
And yea, I first encountered the issue when doing interop tests with Go's grpc client.
But it also occurs with chatterbox's own client?
Oh, yeah, I recreated it with a chatterbox test.
NewSWS is 32707 and BytesSent is 32828 (the size of the whole body).
Seems like it's updating the send window and then just never doing anything again.
So it decides it can send everything, but then finds that if it did send everything (it hasn't actually sent, just decided to) it would be more than allowed?
Wait, 32707 is the send window size 65535 minus the body 32828. So is it comparing the updated window (as though the data had already been sent) against what is to be sent?
Seems like double counting, right? Should it instead be checking the current socket send window against BytesSent? Or is it meant to check that if the new send window would be less than 0 then it should not send? If either of those is the case, I'm still not sure how to test the case where it does actually end up in a state where it should go into the
Yeah, this code does look wrong. I wonder if it should be
The intent here, because each stream can be handled independently, is to check that there's not a race condition where we'd send more than we were allowed to by the windowing system. Given there's only one stream active, this code is clearly wrong somewhere, as there's no other stream to race against.
Or possibly we can simply check that NewSWS is not < 0?
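To make the two checks being discussed concrete, here is a minimal Erlang sketch. It is not chatterbox's actual flow-control code; it just reuses the names from the thread (SWS for the socket send window, BytesSent and NewSWS as printed above) together with the numbers reported earlier.

```erlang
-module(window_check_sketch).
-export([can_send/2]).

%% Minimal illustration only -- not chatterbox's real flow-control code.
%% SWS is the socket send window; BytesSent is the size of the body to send.
can_send(SWS, BytesSent) ->
    NewSWS = SWS - BytesSent,             %% e.g. 65535 - 32828 = 32707
    %% Suspected double count: NewSWS already has BytesSent subtracted,
    %% so comparing BytesSent against it again rejects any large body.
    DoubleCounted = BytesSent =< NewSWS,  %% 32828 =< 32707 -> false
    %% Alternative discussed above: only require the updated window >= 0.
    Proposed = NewSWS >= 0,               %% 32707 >= 0 -> true
    {DoubleCounted, Proposed}.
```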
Right, my thought was
Ah crap, turns out with
Never mind, I just didn't reset everything. It works.
So that fixes it, then?
This PR is a collection of changes to improve the reliability and performance of chatterbox so that it can be used in a high performance context. There are a lot of changes here (and a lot of commits), which reflects the nature of how the branch was developed. Depending on the preference of the reviewer we could break this branch up into smaller PRs (although for many changes this may prove difficult). Certainly the commit history should be squashed, although I am currently leaving it for reference.
Now, I'll provide some information about the issues we encountered and how I resolved them. We are running a high-throughput grpc (using grpcbox) service that uses a lot of persistent streams and unary requests. We would see the h2_connection back up under load with large mailboxes, and we'd also see http/2 connections get closed with no explanation under load.
Inspecting how chatterbox works, I learned that the h2_connection process handles almost all of the http/2 message processing and routing, as well as the socket reading. This results in an incast condition: a connection with many streams puts far too much load on the h2_connection process, effectively stalling the connection.
To fix this, I made the h2_stream_set (the structure that tracks all the streams on the connection) into a record holding an ETS table, a set of atomic counters, the http/2 socket, the pid of the connection, etc. I then provided this stream_set record to all the streams and allowed the streams to make changes to the stream set themselves.
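As a rough illustration of that shape (the module, field names, and counter layout here are assumptions for the sketch, not the exact ones in the branch):

```erlang
%% Illustrative shape only; the real field names in the branch may differ.
-module(stream_set_sketch).
-export([new/2, register_stream/3]).

-record(stream_set, {
          streams    :: ets:tid(),               %% stream id -> stream pid
          counters   :: atomics:atomics_ref(),   %% e.g. active-stream count
          socket     :: {gen_tcp | ssl, term()}, %% the shared http/2 socket
          connection :: pid()                    %% the owning h2_connection
         }).

%% Build a stream set that can be handed to every stream process.
new(Socket, ConnPid) ->
    #stream_set{streams    = ets:new(streams, [public, set]),
                counters   = atomics:new(1, [{signed, true}]),
                socket     = Socket,
                connection = ConnPid}.

%% A stream registers itself directly, without messaging the connection.
register_stream(#stream_set{streams = Tab, counters = Ctrs}, StreamId, Pid) ->
    true = ets:insert(Tab, {StreamId, Pid}),
    ok = atomics:add(Ctrs, 1, 1),   %% bump the active-stream count in slot 1
    ok.
```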
I also moved the socket recv out of the connection process into a dedicated sub-process that sits in a tight receive loop, reading and dispatching incoming http/2 frames. Almost all frame handling is done in this process, and incoming DATA frames are routed directly to the corresponding stream pid, bypassing the h2_connection process entirely.
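Conceptually, the reader process looks something like the sketch below (a hand-wavy sketch assuming a binary-mode, passive gen_tcp socket and a stream-id-to-pid ETS table; the module name, message shapes, and the forwarding of non-DATA frames are all assumptions, not the branch's actual code):

```erlang
-module(h2_reader_sketch).
-export([reader_loop/3]).

%% Rough sketch of a dedicated socket reader: read the 9-octet http/2 frame
%% header, then the payload, and route DATA frames straight to the owning
%% stream process, bypassing the connection process.
reader_loop(Socket, StreamTab, ConnPid) ->
    {ok, <<Len:24, Type:8, Flags:8, _R:1, StreamId:31>>} =
        gen_tcp:recv(Socket, 9),
    Payload = case Len of
                  0 -> <<>>;
                  _ -> {ok, Bin} = gen_tcp:recv(Socket, Len), Bin
              end,
    case Type of
        16#0 ->
            %% DATA frame: deliver directly to the stream pid registered in
            %% the shared ETS table (table layout is an assumption here).
            [{StreamId, StreamPid}] = ets:lookup(StreamTab, StreamId),
            StreamPid ! {http2_data, StreamId, Flags, Payload};
        _ ->
            %% Other frame types are forwarded in this sketch; the branch
            %% handles most of them inside the reader process itself.
            ConnPid ! {http2_frame, Type, Flags, StreamId, Payload}
    end,
    reader_loop(Socket, StreamTab, ConnPid).
```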
Similarly, all socket sends are done by the process that wants to send, rather than being serialized through the connection process (this relies on the assumption that a tcp/ssl send() is written atomically and not interleaved with other sends).
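In other words, a sender writes its already-serialized frame to the socket itself, roughly like this (a sketch under the atomicity assumption above; the module and transport-tuple shape are assumptions):

```erlang
-module(h2_send_sketch).
-export([send_frame/2]).

%% Sketch: the process that wants to send writes to the socket itself
%% instead of funnelling frames through the connection process. This is
%% only safe under the assumption stated above, i.e. that a single
%% gen_tcp/ssl send is written out atomically and never interleaved
%% with concurrent sends on the same socket.
send_frame({gen_tcp, Socket}, FrameBin) -> gen_tcp:send(Socket, FrameBin);
send_frame({ssl, Socket}, FrameBin)     -> ssl:send(Socket, FrameBin).
```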
Some http/2 protocol design decisions also impose concurrency limits that had to be addressed.
Another issue, which we believe was causing our connection drops under load, was that chatterbox would allow streams to be created independently of headers being sent on that stream. Grpcbox used this, and under load I believe messages to create streams and to send headers would be interleaved in the connection's mailbox, leading to headers being sent on a "closed" stream, which is treated as a fatal connection error. I've removed the APIs that allow a stream to be created without headers being sent, so stream creation and sending the initial headers now happen atomically.
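The shape of that change looks roughly like the following (a hypothetical before/after sketch; the module name and gen_server message shapes are invented for illustration and are not the actual chatterbox API):

```erlang
%% Hypothetical names, for illustration only -- not the exact chatterbox API.
-module(stream_open_sketch).
-export([old_style/2, new_style/2]).

%% Before: creating the stream and sending its first HEADERS were separate
%% operations, which under load could interleave in the connection's
%% mailbox with activity on other streams.
old_style(Conn, Headers) ->
    StreamId = gen_server:call(Conn, new_stream),
    ok = gen_server:call(Conn, {send_headers, StreamId, Headers}),
    StreamId.

%% After: one operation that creates the stream and sends the HEADERS
%% together, so a stream can never exist without headers having been sent.
new_style(Conn, Headers) ->
    gen_server:call(Conn, {new_stream, Headers}).
```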
Stream GC also appeared to be an issue under high stream counts. I was unable to see why the GC was done the way it was and so, in an effort to reduce memory usage, I simply delete streams marked garbage immediately. Everything still seems to work with that change, so I left it in (interestingly there appears to be no way to set garbage_on_end on the server side of a connection).
There are a few unresolved issues:
- A spec suite test fails (http2_spec_8_1_SUITE ==> sends_second_headers_with_no_end_stream). I haven't figured out why.
- There is some system memory being used that I don't fully understand.

That said, this changeset allows us to run our grpcbox service at much higher scale, more reliably, and with less CPU usage, so we are running it in production currently. I will update this PR if we find other issues under load, or if we determine the source of the memory usage.