Performance rework #22

Merged
merged 122 commits into from
Jun 6, 2023

Vagabond

This PR is a collection of changes to improve the reliability and performance of chatterbox so that it can be used in a high performance context. There are a lot of changes here (and a lot of commits), which reflects the nature of how the branch was developed. Depending on the preference of the reviewer we could break this branch up into smaller PRs (although for many changes this may prove difficult). Certainly the commit history should be squashed, although I am currently leaving it for reference.

Now, I'll provide some information about the issues we encountered and how I resolved them. We are running a high-throughput gRPC service (using grpcbox) that uses a lot of persistent streams and unary requests. Under load, we would see the h2_connection process back up with a large mailbox, and we'd also see HTTP/2 connections get closed with no explanation.

Inspecting how chatterbox works, I learned that the h2_connection handled almost all of the http/2 message processing and routing, as well as the socket reading. This results in an incast condition where a connection with many streams will put far too much load on the h2_connection process, resulting in an effective stall of the connection.

To fix this, I made the h2_stream_set (the structure that tracks all the streams on the connection) into a record holding an ETS table, a set of atomic counters, the HTTP/2 socket, the pid of the connection, etc. I then provided this stream_set record to all the streams and allowed the streams to make changes to the stream set themselves.
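As a rough illustration of that shape (not the actual chatterbox code), here is a minimal sketch of a stream set backed by a public ETS table and an `atomics` counter. The module name, record fields, and function names are all hypothetical; only the `ets` and `atomics` calls are real OTP APIs.

```erlang
-module(stream_set_sketch).
-export([new/2, add_stream/3, stream_pid/2, remove_stream/2, active_count/1]).

%% Hypothetical record; the field names are illustrative, not chatterbox's.
-record(stream_set, {table, counters, socket, connection}).

%% Index of the "active streams" slot in the atomics array.
-define(ACTIVE, 1).

%% Create the shared structure. The ETS table is public so that any
%% stream process holding the record can read and write it directly.
%% Note that the table is owned by the process that calls new/2.
new(Socket, ConnPid) ->
    Table = ets:new(streams, [set, public,
                              {read_concurrency, true},
                              {write_concurrency, true}]),
    Counters = atomics:new(1, [{signed, false}]),
    #stream_set{table = Table, counters = Counters,
                socket = Socket, connection = ConnPid}.

%% Register a stream without going through the connection process.
add_stream(#stream_set{table = T, counters = C}, StreamId, Pid) ->
    case ets:insert_new(T, {StreamId, Pid}) of
        true  -> atomics:add(C, ?ACTIVE, 1), ok;
        false -> {error, already_exists}
    end.

%% Look up the pid that owns a stream id.
stream_pid(#stream_set{table = T}, StreamId) ->
    case ets:lookup(T, StreamId) of
        [{_, Pid}] -> {ok, Pid};
        []         -> error
    end.

%% Delete a stream entry immediately and decrement the counter.
remove_stream(#stream_set{table = T, counters = C}, StreamId) ->
    case ets:take(T, StreamId) of
        [_] -> atomics:sub(C, ?ACTIVE, 1), ok;
        []  -> ok
    end.

%% Read the active stream count without touching the table.
active_count(#stream_set{counters = C}) ->
    atomics:get(C, ?ACTIVE).
```

Because the record only holds references (a table id and an atomics ref), copying it into every stream process is cheap, and each stream can update shared state without sending a message to the connection.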

I also moved the socket recv out of the connection process into a dedicated sub process that sits in a tight receive loop reading and dispatching incoming http/2 frames. Almost all frame handling is done in this process, and incoming DATA frames are routed directly to the corresponding stream pid, bypassing the h2_connection process entirely.
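To make the buffering step of such a receive loop concrete, here is a small sketch of splitting an accumulated receive buffer into complete HTTP/2 frames. It relies only on the 9-octet frame header layout from RFC 7540 (24-bit length, 8-bit type, 8-bit flags, one reserved bit, 31-bit stream id). The module and function names are hypothetical and this is not the code in this branch.

```erlang
-module(frame_split_sketch).
-export([split/1]).

%% Split a receive buffer into {CompleteFrames, LeftoverBytes}.
%% Each frame is returned as {{Length, Type, Flags, StreamId}, Payload}.
split(Buffer) ->
    split(Buffer, []).

%% A full frame is available: 9-byte header followed by Len payload bytes.
split(<<Len:24, Type:8, Flags:8, _Reserved:1, StreamId:31,
        Payload:Len/binary, Rest/binary>>, Acc) ->
    split(Rest, [{{Len, Type, Flags, StreamId}, Payload} | Acc]);
%% Header or payload is still incomplete: keep the bytes for the next recv.
split(Incomplete, Acc) ->
    {lists:reverse(Acc), Incomplete}.
```

A receive loop would append each new chunk from the socket to the leftover bytes, call `split/1` again, and dispatch the resulting frames, for example sending DATA frames (type 0) straight to the owning stream pid.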

Similarly, all the socket sends are done by the process wanting to send, rather than serializing them via the connection process (this relies on the assumption that a tcp/ssl send() is atomically sent and not interleaved with other sends).

Some aspects of the HTTP/2 protocol design limit how much can be done concurrently. I've handled them as follows:

  • HPACK compression state is shared by all streams on a connection, so I've patched hpack to let me detect when a proposed header/trailer block contains new headers. When it does, the sending process takes an exclusive lock on the hpack structure, encodes the new headers, and sends them on the socket before releasing the hpack structure back to the connection.
  • Outbound streams are still opened by the connection process, because HTTP/2 requires streams to be opened in sequential order (if HEADERS are seen for stream 6 before stream 4, stream 4 is considered closed and unusable).
  • SETTINGS updates need to be applied with some care, so these are also locked while a SETTINGS update is being processed.

Another issue, which we believe was causing our connection drops under load, was that chatterbox would allow streams to be created independently of headers being sent on that stream. grpcbox used this, and under load I believe that messages to create streams and send headers would be interleaved in the connection's mailbox, leading to headers being sent on a "closed" stream, which is treated as a fatal connection error. I've removed the APIs that allow a stream to be created without headers being sent, so stream creation and sending headers are now done atomically.
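The idea of allocating the stream id and sending its HEADERS in a single step can be sketched with a small `gen_server`. This is an illustration under assumptions, not the chatterbox API: the module name, `new_stream/2`, and the `SendFun` callback (a stand-in for writing a HEADERS frame to the socket) are all hypothetical.

```erlang
-module(open_stream_sketch).
-behaviour(gen_server).

-export([start_link/1, new_stream/2]).
-export([init/1, handle_call/3, handle_cast/2]).

%% SendFun(StreamId, Headers) -> ok is a hypothetical stand-in for
%% encoding and writing a HEADERS frame on the connection's socket.
start_link(SendFun) ->
    gen_server:start_link(?MODULE, SendFun, []).

%% A single call both allocates the stream id and sends its HEADERS,
%% so no other message can be handled in between the two steps.
new_stream(Conn, Headers) ->
    gen_server:call(Conn, {new_stream, Headers}).

%% Client-initiated stream ids are odd and must increase.
init(SendFun) ->
    {ok, #{next_id => 1, send => SendFun}}.

handle_call({new_stream, Headers}, _From,
            #{next_id := Id, send := Send} = State) ->
    ok = Send(Id, Headers),
    {reply, {ok, Id}, State#{next_id := Id + 2}}.

handle_cast(_Msg, State) ->
    {noreply, State}.
```

Since the id is handed out and the headers are sent inside one `handle_call/3` clause, HEADERS frames reach the socket in the same order the ids were allocated, regardless of how many callers are racing.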

Stream GC also appeared to be an issue under high stream counts. I was unable to see why the GC was done the way it was and so, in an effort to reduce memory usage, I simply delete streams marked garbage immediately. Everything still seems to work with that change, so I left it in (interestingly there appears to be no way to set garbage_on_end on the server side of a connection).

There are a few unresolved issues:

  • A test fails (http2_spec_8_1_SUITE ==> sends_second_headers_with_no_end_stream). I haven't figured out why.
  • There's a significant amount of system memory being used that I don't fully understand.

That said, this changeset allows us to run our grpcbox service at much higher scales, more reliably, with less CPU usage and so we are running this in production currently. I will update this PR if we find other issues under load, or determine the source of the memory usage.

andymck and others added 30 commits March 31, 2022 10:32
add rst_stream api and make it a cast
…e-condition

Andymck/fix trailers close race condition
@tsloughter
Owner

Interesting, if I change the req and resp sizes in the tests to small numbers, all tests pass.

So something about "large" payloads (just like 270 kb) is somehow broken.

@Vagabond
Author

maybe some of my buffered receive code is broken?

@tsloughter
Owner

Ah, that'd be a good place to start. I hope to dig in some today.

@Vagabond
Author

Narrowing down the exact size that triggers it might be helpful

@tsloughter
Owner

tsloughter commented Nov 13, 2023

32753 bytes which is roughly 2x the configured max frame size.

@tsloughter
Owner

I tried increasing the max frame size and that didn't do anything, so likely unrelated.

@tsloughter
Owner

An interesting/weird thing I've noticed so far when tracing what's going on: when the request is large, it is being sent twice (headers and all, not just the body).

@tsloughter
Owner

Meh, no that isn't the case.. I need more sleep.

@tsloughter
Owner

Heh, modified the echo test to send a large body and added a print of the frames the client gets back:

Frames = http2c:get_frames(Client, 3),
ct:pal("Frames ~p", [Frames]),
Frames [{{frame_header,4,8,0,3},{window_update,64}},
        {{frame_header,4,8,0,3},{window_update,64}},
        {{frame_header,4,8,0,3},{window_update,64}},
        {{frame_header,4,8,0,3},{window_update,64}},
        {{frame_header,4,8,0,3},{window_update,64}},
        {{frame_header,4,8,0,3},{window_update,64}},
        {{frame_header,4,8,0,3},{window_update,64}},
        {{frame_header,4,8,0,3},{window_update,64}},
        {{frame_header,4,8,0,3},{window_update,64}},
        {{frame_header,4,8,0,3},{window_update,64}},
        {{frame_header,4,8,0,3},{window_update,64}},
        {{frame_header,4,8,0,3},{window_update,64}},
        {{frame_header,4,8,0,3},{window_update,64}},
        {{frame_header,4,8,0,3},{window_update,64}},
        {{frame_header,4,8,0,3},{window_update,64}},
        {{frame_header,4,8,0,3},{window_update,64}},
        {{frame_header,4,8,0,3},{window_update,64}},
        {{frame_header,4,8,0,3},{window_update,64}},
        {{frame_header,4,8,0,3},{window_update,64}},
        {{frame_header,4,8,0,3},{window_update,64}},
        {{frame_header,4,8,0,3},{window_update,64}},
        {{frame_header,4,8,0,3},{window_update,64}},
        {{frame_header,4,8,0,3},{window_update,64}},
        {{frame_header,4,8,0,3},{window_update,64}},
        {{frame_header,4,8,0,3},{window_update,64}},
        {{frame_header,4,8,0,3},{window_update,64}},
        {{frame_header,4,8,0,3},{window_update,64}},
        {{frame_header,4,8,0,3},{window_update,64}},
        {{frame_header,4,8,0,3},{window_update,64}},
        {{frame_header,4,8,0,3},{window_update,64}},
        {{frame_header,4,8,0,3},{window_update,64}},
        {{frame_header,4,8,0,3},{window_update,64}},
        {{frame_header,4,8,0,3},{window_update,64}},
        {{frame_header,4,8,0,3},{window_update,64}},
        {{frame_header,4,8,0,3},{window_update,64}},
        {{frame_header,4,8,0,3},{window_update,64}},
        {{frame_header,4,8,0,3},{window_update,64}},
        {{frame_header,4,8,0,3},{window_update,64}},
        {{frame_header,4,8,0,3},{window_update,64}},
        {{frame_header,4,8,0,3},{window_update,64}},
        {{frame_header,4,8,0,3},{window_update,64}},
        {{frame_header,4,8,0,3},{window_update,64}},
        {{frame_header,4,8,0,3},{window_update,64}},
        {{frame_header,4,8,0,3},{window_update,64}},
        {{frame_header,4,8,0,3},{window_update,64}},
        {{frame_header,4,8,0,3},{window_update,64}},
        {{frame_header,4,8,0,3},{window_update,64}},
        {{frame_header,4,8,0,3},{window_update,64}},
        {{frame_header,4,8,0,3},{window_update,64}},
        {{frame_header,4,8,0,3},{window_update,64}},
        {{frame_header,4,8,0,3},{window_update,64}},
        {{frame_header,4,8,0,3},{window_update,64}},
        {{frame_header,4,8,0,3},{window_update,64}},
        {{frame_header,4,8,0,3},{window_update,64}},
        {{frame_header,4,8,0,3},{window_update,64}},
        {{frame_header,4,8,0,3},{window_update,64}},
        {{frame_header,4,8,0,3},{window_update,64}},
        {{frame_header,4,8,0,3},{window_update,64}},
        {{frame_header,4,8,0,3},{window_update,64}},
        {{frame_header,4,8,0,3},{window_update,64}},
        {{frame_header,4,8,0,3},{window_update,64}},
        {{frame_header,4,8,0,3},{window_update,64}},
        {{frame_header,4,8,0,3},{window_update,64}},
        {{frame_header,4,8,0,3},{window_update,64}},
        {{frame_header,4,8,0,3},{window_update,64}},
        {{frame_header,4,8,0,3},{window_update,64}},
        {{frame_header,4,8,0,3},{window_update,64}},
        {{frame_header,4,8,0,3},{window_update,64}},
        {{frame_header,4,8,0,3},{window_update,64}},
        {{frame_header,4,8,0,3},{window_update,64}},
        {{frame_header,4,8,0,3},{window_update,64}},
        {{frame_header,4,8,0,3},{window_update,64}},
        {{frame_header,4,8,0,3},{window_update,64}},
        {{frame_header,4,8,0,3},{window_update,64}},
        {{frame_header,4,8,0,3},{window_update,64}},
        {{frame_header,4,8,0,3},{window_update,64}},
        {{frame_header,4,8,0,3},{window_update,64}},
        {{frame_header,4,8,0,3},{window_update,64}},
        {{frame_header,4,8,0,3},{window_update,64}},
        {{frame_header,4,8,0,3},{window_update,64}},
        {{frame_header,4,8,0,3},{window_update,64}},
        {{frame_header,4,8,0,3},{window_update,64}},
        {{frame_header,4,8,0,3},{window_update,64}},
        {{frame_header,4,8,0,3},{window_update,64}},
        {{frame_header,4,8,0,3},{window_update,64}},
        {{frame_header,4,8,0,3},{window_update,64}},
        {{frame_header,4,8,0,3},{window_update,64}},
        {{frame_header,4,8,0,3},{window_update,64}},
        {{frame_header,4,8,0,3},{window_update,64}},
        {{frame_header,4,8,0,3},{window_update,64}},
        {{frame_header,4,8,0,3},{window_update,64}},
        {{frame_header,4,8,0,3},{window_update,64}},
        {{frame_header,4,8,0,3},{window_update,64}},
        {{frame_header,4,8,0,3},{window_update,64}},
        {{frame_header,4,8,0,3},{window_update,64}},
        {{frame_header,4,8,0,3},{window_update,64}},
        {{frame_header,4,8,0,3},{window_update,64}},
        {{frame_header,4,8,0,3},{window_update,64}},
        {{frame_header,4,8,0,3},{window_update,64}},
        {{frame_header,4,8,0,3},{window_update,64}},
        {{frame_header,4,8,0,3},{window_update,64}},
        {{frame_header,4,8,0,3},{window_update,64}},
        {{frame_header,4,8,0,3},{window_update,64}},
        {{frame_header,4,8,0,3},{window_update,64}},
        {{frame_header,4,8,0,3},{window_update,64}},
        {{frame_header,4,8,0,3},{window_update,64}},
        {{frame_header,4,8,0,3},{window_update,64}},
        {{frame_header,4,8,0,3},{window_update,64}},
        {{frame_header,4,8,0,3},{window_update,64}},
        {{frame_header,4,8,0,3},{window_update,64}},
        {{frame_header,4,8,0,3},{window_update,64}},
        {{frame_header,4,8,0,3},{window_update,64}},
        {{frame_header,4,8,0,3},{window_update,64}},
        {{frame_header,4,8,0,3},{window_update,64}},
        {{frame_header,4,8,0,3},{window_update,64}},
        {{frame_header,4,8,0,3},{window_update,64}},
        {{frame_header,4,8,0,3},{window_update,64}},
        {{frame_header,4,8,0,3},{window_update,64}},
        {{frame_header,4,8,0,3},{window_update,64}},
        {{frame_header,4,8,0,3},{window_update,64}},
        {{frame_header,4,8,0,3},{window_update,64}},
        {{frame_header,4,8,0,3},{window_update,64}},
        {{frame_header,4,8,0,3},{window_update,64}},
        {{frame_header,4,8,0,3},{window_update,64}},
        {{frame_header,4,8,0,3},{window_update,64}},
        {{frame_header,4,8,0,3},{window_update,64}},
        {{frame_header,4,8,0,3},{window_update,64}},
        {{frame_header,4,8,0,3},{window_update,64}},
        {{frame_header,4,8,0,3},{window_update,64}},
        {{frame_header,4,8,0,3},{window_update,64}},
        {{frame_header,4,8,0,3},{window_update,64}},
        {{frame_header,4,8,0,3},{window_update,64}},
        {{frame_header,4,8,0,3},{window_update,64}},
        {{frame_header,4,8,0,3},{window_update,64}},
        {{frame_header,4,8,0,3},{window_update,64}},
        {{frame_header,4,8,0,3},{window_update,64}},
        {{frame_header,4,8,0,3},{window_update,64}},
        {{frame_header,4,8,0,3},{window_update,64}},
        {{frame_header,4,8,0,3},{window_update,64}},
        {{frame_header,4,8,0,3},{window_update,64}},
        {{frame_header,4,8,0,3},{window_update,64}},
        {{frame_header,4,8,0,3},{window_update,64}},
        {{frame_header,4,8,0,3},{window_update,64}},
        {{frame_header,4,8,0,3},{window_update,64}},
        {{frame_header,4,8,0,3},{window_update,64}},
        {{frame_header,4,8,0,3},{window_update,64}},
        {{frame_header,4,8,0,3},{window_update,64}},
        {{frame_header,4,8,0,3},{window_update,64}},
        {{frame_header,4,8,0,3},{window_update,64}},
        {{frame_header,4,8,0,3},{window_update,64}},
        {{frame_header,4,8,0,3},{window_update,64}},
        {{frame_header,4,8,0,3},{window_update,64}},
        {{frame_header,4,8,0,3},{window_update,64}},
        {{frame_header,4,8,0,3},{window_update,64}},
        {{frame_header,4,8,0,3},{window_update,64}},
        {{frame_header,4,8,0,3},{window_update,64}},
        {{frame_header,4,8,0,3},{window_update,64}},
        {{frame_header,4,8,0,3},{window_update,64}},
        {{frame_header,4,8,0,3},{window_update,64}},
        {{frame_header,4,8,0,3},{window_update,64}},
        {{frame_header,4,8,0,3},{window_update,64}},
        {{frame_header,4,8,0,3},{window_update,64}},
        {{frame_header,4,8,0,3},{window_update,64}},
        {{frame_header,4,8,0,3},{window_update,64}},
        {{frame_header,4,8,0,3},{window_update,64}},
        {{frame_header,4,8,0,3},{window_update,64}},
        {{frame_header,4,8,0,3},{window_update,64}},
        {{frame_header,4,8,0,3},{window_update,64}},
        {{frame_header,4,8,0,3},{window_update,64}},
        {{frame_header,4,8,0,3},{window_update,64}},
        {{frame_header,4,8,0,3},{window_update,64}},
        {{frame_header,4,8,0,3},{window_update,64}},
        {{frame_header,4,8,0,3},{window_update,64}},
        {{frame_header,4,8,0,3},{window_update,64}},
        {{frame_header,4,8,0,3},{window_update,64}},
        {{frame_header,4,8,0,3},{window_update,64}},
        {{frame_header,4,8,0,3},{window_update,64}},
        {{frame_header,4,8,0,3},{window_update,64}},
        {{frame_header,4,8,0,3},{window_update,64}},
        {{frame_header,4,8,0,3},{window_update,64}},
        {{frame_header,4,8,0,3},{window_update,64}},
        {{frame_header,4,8,0,3},{window_update,64}},
        {{frame_header,4,8,0,3},{window_update,64}},
        {{frame_header,4,8,0,3},{window_update,64}},
        {{frame_header,4,8,0,3},{window_update,64}},
        {{frame_header,4,8,0,3},{window_update,64}},
        {{frame_header,4,8,0,3},{window_update,64}},
        {{frame_header,4,8,0,3},{window_update,64}},
        {{frame_header,4,8,0,3},{window_update,64}},
        {{frame_header,4,8,0,3},{window_update,64}},
        {{frame_header,4,8,0,3},{window_update,64}},
        {{frame_header,4,8,0,3},{window_update,64}},
        {{frame_header,4,8,0,3},{window_update,64}},
        {{frame_header,4,8,0,3},{window_update,64}},
        {{frame_header,4,8,0,3},{window_update,64}},
        {{frame_header,4,8,0,3},{window_update,64}},
        {{frame_header,4,8,0,3},{window_update,64}},
        {{frame_header,4,8,0,3},{window_update,64}},
        {{frame_header,4,8,0,3},{window_update,64}},
        {{frame_header,4,8,0,3},{window_update,64}},
        {{frame_header,4,8,0,3},{window_update,64}},
        {{frame_header,4,8,0,3},{window_update,64}},
        {{frame_header,4,8,0,3},{window_update,64}},
        {{frame_header,4,8,0,3},{window_update,64}},
        {{frame_header,4,8,0,3},{window_update,64}},
        {{frame_header,4,8,0,3},{window_update,64}},
        {{frame_header,4,8,0,3},{window_update,64}},
        {{frame_header,4,8,0,3},{window_update,64}},
        {{frame_header,4,8,0,3},{window_update,64}},
        {{frame_header,4,8,0,3},{window_update,64}},
        {{frame_header,4,8,0,3},{window_update,64}},
        {{frame_header,4,8,0,3},{window_update,64}},
        {{frame_header,4,8,0,3},{window_update,64}},
        {{frame_header,4,8,0,3},{window_update,64}},
        {{frame_header,4,8,0,3},{window_update,64}},
        {{frame_header,4,8,0,3},{window_update,64}},
        {{frame_header,4,8,0,3},{window_update,64}},
        {{frame_header,4,8,0,3},{window_update,64}},
        {{frame_header,4,8,0,3},{window_update,64}},
        {{frame_header,4,8,0,3},{window_update,64}},
        {{frame_header,4,8,0,3},{window_update,64}},
        {{frame_header,4,8,0,3},{window_update,64}},
        {{frame_header,4,8,0,3},{window_update,64}},
        {{frame_header,4,8,0,3},{window_update,64}},
        {{frame_header,4,8,0,3},{window_update,64}},
        {{frame_header,4,8,0,3},{window_update,64}},
        {{frame_header,4,8,0,3},{window_update,64}},
        {{frame_header,4,8,0,3},{window_update,64}},
        {{frame_header,4,8,0,3},{window_update,64}},
        {{frame_header,4,8,0,3},{window_update,64}},
        {{frame_header,4,8,0,3},{window_update,64}},
        {{frame_header,4,8,0,3},{window_update,64}},
        {{frame_header,4,8,0,3},{window_update,64}},
        {{frame_header,4,8,0,3},{window_update,64}},
        {{frame_header,4,8,0,3},{window_update,64}},
        {{frame_header,4,8,0,3},{window_update,64}},
        {{frame_header,4,8,0,3},{window_update,64}},
        {{frame_header,4,8,0,3},{window_update,64}},
        {{frame_header,4,8,0,3},{window_update,64}},
        {{frame_header,4,8,0,3},{window_update,64}},
        {{frame_header,4,8,0,3},{window_update,64}},
        {{frame_header,4,8,0,3},{window_update,64}},
        {{frame_header,4,8,0,3},{window_update,64}},
        {{frame_header,4,8,0,3},{window_update,64}},
        {{frame_header,4,8,0,3},{window_update,64}},
        {{frame_header,4,8,0,3},{window_update,64}},
        {{frame_header,4,8,0,3},{window_update,64}},
        {{frame_header,4,8,0,3},{window_update,64}},
        {{frame_header,4,8,0,3},{window_update,64}},
        {{frame_header,4,8,0,3},{window_update,64}},
        {{frame_header,4,8,0,3},{window_update,64}},
        {{frame_header,4,8,0,3},{window_update,64}},
        {{frame_header,4,8,0,3},{window_update,64}},
        {{frame_header,4,8,0,3},{window_update,64}},
        {{frame_header,4,8,0,3},{window_update,64}},
        {{frame_header,4,8,0,3},{window_update,64}},
        {{frame_header,4,8,0,3},{window_update,64}},
        {{frame_header,4,8,0,3},{window_update,64}},
        {{frame_header,4,8,0,3},{window_update,64}},
        {{frame_header,4,8,0,3},{window_update,64}},
        {{frame_header,4,8,0,3},{window_update,64}},
        {{frame_header,4,8,0,3},{window_update,64}},
        {{frame_header,4,8,0,3},{window_update,64}},
        {{frame_header,4,8,0,3},{window_update,64}},
        {{frame_header,4,8,0,3},{window_update,64}},
        {{frame_header,4,8,0,3},{window_update,64}},
        {{frame_header,4,8,0,3},{window_update,64}},
        {{frame_header,4,8,0,3},{window_update,64}},
        {{frame_header,4,8,0,3},{window_update,64}},
        {{frame_header,4,8,0,3},{window_update,64}},
        {{frame_header,4,8,0,3},{window_update,64}},
        {{frame_header,4,8,0,3},{window_update,64}},
        {{frame_header,4,8,0,3},{window_update,64}},
        {{frame_header,4,8,0,3},{window_update,64}},
        {{frame_header,4,8,0,3},{window_update,64}},
        {{frame_header,4,8,0,3},{window_update,64}},
        {{frame_header,4,8,0,3},{window_update,64}},
        {{frame_header,4,8,0,3},{window_update,64}},
        {{frame_header,4,8,0,3},{window_update,64}},
        {{frame_header,4,8,0,3},{window_update,64}},
        {{frame_header,4,8,0,3},{window_update,64}},
        {{frame_header,4,8,0,3},{window_update,64}},
        {{frame_header,4,8,0,3},{window_update,64}},
        {{frame_header,4,8,0,3},{window_update,64}},
        {{frame_header,4,8,0,3},{window_update,64}},
        {{frame_header,4,8,0,3},{window_update,64}},
        {{frame_header,4,8,0,3},{window_update,64}},
        {{frame_header,4,8,0,3},{window_update,64}},
        {{frame_header,4,8,0,3},{window_update,64}},
        {{frame_header,4,8,0,3},{window_update,64}},
        {{frame_header,4,8,0,3},{window_update,64}},
        {{frame_header,4,8,0,3},{window_update,64}},
        {{frame_header,4,8,0,3},{window_update,64}},
        {{frame_header,4,8,0,3},{window_update,64}},
        {{frame_header,4,8,0,3},{window_update,64}},
        {{frame_header,4,8,0,3},{window_update,64}},
        {{frame_header,4,8,0,3},{window_update,64}},
        {{frame_header,4,8,0,3},{window_update,64}},
        {{frame_header,4,8,0,3},{window_update,64}},
        {{frame_header,4,8,0,3},{window_update,64}},
        {{frame_header,4,8,0,3},{window_update,64}},
        {{frame_header,4,8,0,3},{window_update,64}},
        {{frame_header,4,8,0,3},{window_update,64}},
        {{frame_header,4,8,0,3},{window_update,64}},
        {{frame_header,4,8,0,3},{window_update,64}},
        {{frame_header,4,8,0,3},{window_update,64}},
        {{frame_header,4,8,0,3},{window_update,64}},
        {{frame_header,4,8,0,3},{window_update,64}},
        {{frame_header,4,8,0,3},{window_update,64}},
        {{frame_header,4,8,0,3},{window_update,64}},
        {{frame_header,4,8,0,3},{window_update,64}},
        {{frame_header,4,8,0,3},{window_update,64}},
        {{frame_header,4,8,0,3},{window_update,64}},
        {{frame_header,4,8,0,3},{window_update,64}},
        {{frame_header,4,8,0,3},{window_update,64}},
        {{frame_header,4,8,0,3},{window_update,64}},
        {{frame_header,4,8,0,3},{window_update,64}},
        {{frame_header,4,8,0,3},{window_update,64}},
        {{frame_header,4,8,0,3},{window_update,64}},
        {{frame_header,4,8,0,3},{window_update,64}},
        {{frame_header,4,8,0,3},{window_update,64}},
        {{frame_header,4,8,0,3},{window_update,64}},
        {{frame_header,4,8,0,3},{window_update,64}},
        {{frame_header,4,8,0,3},{window_update,64}},
        {{frame_header,4,8,0,3},{window_update,64}},
        {{frame_header,4,8,0,3},{window_update,64}},
        {{frame_header,4,8,0,3},{window_update,64}},
        {{frame_header,4,8,0,3},{window_update,64}},
        {{frame_header,4,8,0,3},{window_update,64}},
        {{frame_header,4,8,0,3},{window_update,64}},
        {{frame_header,4,8,0,3},{window_update,64}},
        {{frame_header,4,8,0,3},{window_update,64}},
        {{frame_header,4,8,0,3},{window_update,64}},
        {{frame_header,4,8,0,3},{window_update,64}},
        {{frame_header,4,8,0,3},{window_update,64}},
        {{frame_header,4,8,0,3},{window_update,64}},
        {{frame_header,4,8,0,3},{window_update,64}},
        {{frame_header,4,8,0,3},{window_update,64}},
        {{frame_header,4,8,0,3},{window_update,64}},
        {{frame_header,4,8,0,3},{window_update,64}},
        {{frame_header,4,8,0,3},{window_update,64}},
        {{frame_header,4,8,0,3},{window_update,64}},
        {{frame_header,4,8,0,3},{window_update,64}},
        {{frame_header,4,8,0,3},{window_update,64}},
        {{frame_header,4,8,0,3},{window_update,64}},
        {{frame_header,4,8,0,3},{window_update,64}},
        {{frame_header,4,8,0,3},{window_update,64}},
        {{frame_header,4,8,0,3},{window_update,64}},
        {{frame_header,4,8,0,3},{window_update,64}},
        {{frame_header,4,8,0,3},{window_update,64}},
        {{frame_header,4,8,0,3},{window_update,64}},
        {{frame_header,4,8,0,3},{window_update,64}},
        {{frame_header,4,8,0,3},{window_update,64}},
        {{frame_header,4,8,0,3},{window_update,64}},
        {{frame_header,4,8,0,3},{window_update,64}},
        {{frame_header,4,8,0,3},{window_update,64}},
        {{frame_header,4,8,0,3},{window_update,64}},
        {{frame_header,4,8,0,3},{window_update,64}},
        {{frame_header,4,8,0,3},{window_update,64}},
        {{frame_header,4,8,0,3},{window_update,64}},
        {{frame_header,4,8,0,3},{window_update,64}},
        {{frame_header,4,8,0,3},{window_update,64}},
        {{frame_header,4,8,0,3},{window_update,64}},
        {{frame_header,4,8,0,3},{window_update,64}},
        {{frame_header,4,8,0,3},{window_update,64}},
        {{frame_header,4,8,0,3},{window_update,64}},
        {{frame_header,4,8,0,3},{window_update,64}},
        {{frame_header,4,8,0,3},{window_update,64}},
        {{frame_header,4,8,0,3},{window_update,64}},
        {{frame_header,4,8,0,3},{window_update,64}},
        {{frame_header,4,8,0,3},{window_update,64}},
        {{frame_header,4,8,0,3},{window_update,64}},
        {{frame_header,4,8,0,3},{window_update,64}},
        {{frame_header,4,8,0,3},{window_update,64}},
        {{frame_header,4,8,0,3},{window_update,64}},
        {{frame_header,4,8,0,3},{window_update,64}},
        {{frame_header,4,8,0,3},{window_update,64}},
        {{frame_header,4,8,0,3},{window_update,64}},
        {{frame_header,4,8,0,3},{window_update,64}},
        {{frame_header,4,8,0,3},{window_update,64}},
        {{frame_header,4,8,0,3},{window_update,64}},
        {{frame_header,4,8,0,3},{window_update,64}},
        {{frame_header,4,8,0,3},{window_update,64}},
        {{frame_header,4,8,0,3},{window_update,64}},
        {{frame_header,4,8,0,3},{window_update,64}},
        {{frame_header,4,8,0,3},{window_update,64}},
        {{frame_header,4,8,0,3},{window_update,64}},
        {{frame_header,4,8,0,3},{window_update,64}},
        {{frame_header,4,8,0,3},{window_update,64}},
        {{frame_header,4,8,0,3},{window_update,64}},
        {{frame_header,4,8,0,3},{window_update,64}},
        {{frame_header,4,8,0,3},{window_update,64}},
        {{frame_header,4,8,0,3},{window_update,64}},
        {{frame_header,4,8,0,3},{window_update,64}},
        {{frame_header,4,8,0,3},{window_update,64}},
        {{frame_header,4,8,0,3},{window_update,64}},
        {{frame_header,4,8,0,3},{window_update,64}},
        {{frame_header,4,8,0,3},{window_update,64}},
        {{frame_header,4,8,0,3},{window_update,64}},
        {{frame_header,4,8,0,3},{window_update,64}},
        {{frame_header,4,8,0,3},{window_update,64}},
        {{frame_header,4,8,0,3},{window_update,64}},
        {{frame_header,4,8,0,3},{window_update,64}},
        {{frame_header,4,8,0,3},{window_update,64}},
        {{frame_header,4,8,0,3},{window_update,64}},
        {{frame_header,4,8,0,3},{window_update,64}},
        {{frame_header,4,8,0,3},{window_update,64}},
        {{frame_header,4,8,0,3},{window_update,64}},
        {{frame_header,4,8,0,3},{window_update,64}},
        {{frame_header,4,8,0,3},{window_update,64}},
        {{frame_header,4,8,0,3},{window_update,64}},
        {{frame_header,4,8,0,3},{window_update,64}},
        {{frame_header,4,8,0,3},{window_update,64}},
        {{frame_header,4,8,0,3},{window_update,64}},
        {{frame_header,4,8,0,3},{window_update,64}},
        {{frame_header,4,8,0,3},{window_update,64}},
        {{frame_header,4,8,0,3},{window_update,64}},
        {{frame_header,4,8,0,3},{window_update,64}},
        {{frame_header,4,8,0,3},{window_update,64}},
        {{frame_header,4,8,0,3},{window_update,64}},
        {{frame_header,4,8,0,3},{window_update,64}},
        {{frame_header,4,8,0,3},{window_update,64}},
        {{frame_header,4,8,0,3},{window_update,64}},
        {{frame_header,4,8,0,3},{window_update,64}},
        {{frame_header,4,8,0,3},{window_update,64}},
        {{frame_header,4,8,0,3},{window_update,64}},
        {{frame_header,4,8,0,3},{window_update,64}},
        {{frame_header,4,8,0,3},{window_update,64}},
        {{frame_header,4,8,0,3},{window_update,64}},
        {{frame_header,4,8,0,3},{window_update,64}},
        {{frame_header,4,8,0,3},{window_update,64}},
        {{frame_header,4,8,0,3},{window_update,64}},
        {{frame_header,4,8,0,3},{window_update,64}},
        {{frame_header,4,8,0,3},{window_update,64}},
        {{frame_header,4,8,0,3},{window_update,64}},
        {{frame_header,4,8,0,3},{window_update,64}},
        {{frame_header,4,8,0,3},{window_update,64}},
        {{frame_header,4,8,0,3},{window_update,64}},
        {{frame_header,4,8,0,3},{window_update,64}},
        {{frame_header,4,8,0,3},{window_update,64}},
        {{frame_header,4,8,0,3},{window_update,64}},
        {{frame_header,4,8,0,3},{window_update,64}},
        {{frame_header,4,8,0,3},{window_update,64}},
        {{frame_header,4,8,0,3},{window_update,64}},
        {{frame_header,4,8,0,3},{window_update,64}},
        {{frame_header,4,8,0,3},{window_update,64}},
        {{frame_header,4,8,0,3},{window_update,64}},
        {{frame_header,4,8,0,3},{window_update,64}},
        {{frame_header,4,8,0,3},{window_update,64}},
        {{frame_header,4,8,0,3},{window_update,64}},
        {{frame_header,4,8,0,3},{window_update,64}},
        {{frame_header,4,8,0,3},{window_update,64}},
        {{frame_header,4,8,0,3},{window_update,64}},
        {{frame_header,4,8,0,3},{window_update,64}},
        {{frame_header,4,8,0,3},{window_update,64}},
        {{frame_header,4,8,0,3},{window_update,64}},
        {{frame_header,4,8,0,3},{window_update,64}},
        {{frame_header,4,8,0,3},{window_update,64}},
        {{frame_header,4,8,0,3},{window_update,64}},
        {{frame_header,4,8,0,3},{window_update,64}},
        {{frame_header,4,8,0,3},{window_update,64}},
        {{frame_header,4,8,0,3},{window_update,64}},
        {{frame_header,4,8,0,3},{window_update,64}},
        {{frame_header,4,8,0,3},{window_update,64}},
        {{frame_header,4,8,0,3},{window_update,64}},
        {{frame_header,4,8,0,3},{window_update,64}},
        {{frame_header,4,8,0,3},{window_update,64}},
        {{frame_header,4,8,0,3},{window_update,64}},
        {{frame_header,4,8,0,3},{window_update,64}},
        {{frame_header,4,8,0,3},{window_update,64}},
        {{frame_header,4,8,0,3},{window_update,64}},
        {{frame_header,4,8,0,3},{window_update,64}},
        {{frame_header,4,8,0,3},{window_update,64}},
        {{frame_header,4,8,0,3},{window_update,64}},
        {{frame_header,4,8,0,3},{window_update,64}},
        {{frame_header,4,8,0,3},{window_update,64}},
        {{frame_header,4,8,0,3},{window_update,64}},
        {{frame_header,4,8,0,3},{window_update,64}},
        {{frame_header,4,8,0,3},{window_update,64}},
        {{frame_header,4,8,0,3},{window_update,64}},
        {{frame_header,4,8,0,3},{window_update,64}},
        {{frame_header,4,8,0,3},{window_update,64}},
        {{frame_header,4,8,0,3},{window_update,64}},
        {{frame_header,4,8,0,3},{window_update,64}},
        {{frame_header,4,8,0,3},{window_update,64}},
        {{frame_header,4,8,0,3},{window_update,64}},
        {{frame_header,4,8,0,3},{window_update,64}},
        {{frame_header,4,8,0,3},{window_update,64}},
        {{frame_header,4,8,0,3},{window_update,64}},
        {{frame_header,4,8,0,3},{window_update,64}},
        {{frame_header,4,8,0,3},{window_update,64}},
        {{frame_header,4,8,0,3},{window_update,64}},
        {{frame_header,4,8,0,3},{window_update,64}},
        {{frame_header,4,8,0,3},{window_update,64}},
        {{frame_header,4,8,0,3},{window_update,64}},
        {{frame_header,4,8,0,3},{window_update,64}},
        {{frame_header,4,8,0,3},{window_update,64}},
        {{frame_header,4,8,0,3},{window_update,64}},
        {{frame_header,4,8,0,3},{window_update,64}},
        {{frame_header,4,8,0,3},{window_update,64}},
        {{frame_header,4,8,0,3},{window_update,64}},
        {{frame_header,4,8,0,3},{window_update,64}},
        {{frame_header,4,8,0,3},{window_update,64}},
        {{frame_header,4,8,0,3},{window_update,64}},
        {{frame_header,4,8,0,3},{window_update,64}},
        {{frame_header,4,8,0,3},{window_update,64}},
        {{frame_header,4,8,0,3},{window_update,64}},
        {{frame_header,4,8,0,3},{window_update,64}},
        {{frame_header,4,8,0,3},{window_update,64}},
        {{frame_header,4,8,0,3},{window_update,64}},
        {{frame_header,4,8,0,3},{window_update,64}},
        {{frame_header,4,8,0,3},{window_update,64}},
        {{frame_header,4,8,0,3},{window_update,60}},
        {{frame_header,1,1,4,3},{headers,undefined,<<136>>}}]

@tsloughter
Owner

Meh, didn't change the window size in the test.

With the window size updated it's just:

Frames []

@tsloughter
Owner

That too was a misconfiguration. With the frame size on the client set so that it doesn't exceed the size on the server, I see the server handling the body and then at least attempting (I see the calls to h2_connection:send_body) to send a response, but the client only gets:

Frames [{{frame_header,4,8,0,3},{window_update,16384}},
        {{frame_header,4,8,0,3},{window_update,16384}},
        {{frame_header,4,8,0,3},{window_update,60}},
        {{frame_header,1,1,4,3},{headers,undefined,<<136>>}}]

@tsloughter
Owner

@Vagabond ok, maybe getting somewhere. I got it to work by basically forcing it to send (apply the stream actions) here:

case update(StreamId, fun(Stream0) -> StreamFun(StreamFun0(Stream0)) end, Streams) of
        ok ->
            NewSWS = socket_send_window_size(Streams),
            NewSWS;
        {ok, {BytesSent, _OldStream, Actions}} ->
            NewSWS = decrement_socket_send_window(BytesSent, Streams),
            case BytesSent > NewSWS of
                true ->
                    %% we delved too deep, and too greedily
                    %% try to roll things back

                    %% ets:insert(Streams#stream_set.table, StreamFun0(OldStream)),
                    %% SWS = increment_socket_send_window(BytesSent, Streams),
                    %% SWS;

                    %% debug only: force the send even though the check failed
                    apply_stream_actions(Actions),
                    NewSWS;
                false ->
                    %% ok, its now safe to apply these actions
                    apply_stream_actions(Actions),
                    NewSWS
            end
    end.

@Vagabond
Author

Oh, hmm. That's interesting. Aren't we violating the spec there? This is saying, due to a stream race, that we can't send as many bytes as we wanted to. How many streams were active on this session? Does the client agree on the window sizes, or do we have an accounting bug?

@tsloughter
Owner

There is only 1 active stream.

Do you mean violating the spec with my change? Because, yes, it is just to debug and find where things might be going wrong.

My guess would be an accounting bug. It is attempting to send everything (I see the h2_connection calls), it's just not getting sent out on the socket.

@Vagabond
Author

Is there a way to see what the client thinks the window size is, or why the server decided it oversent? Maybe you can print BytesSent and NewSWS in the clause there to see how much they disagree?

Can you add a unit test to the test suite for this?

Also, do we know if this bug happens with a different HTTP/2 client?

@tsloughter
Owner

tsloughter commented Nov 16, 2023 via email

@Vagabond
Author

But it also occurs with chatterbox's own client?

@tsloughter
Owner

Oh, yea, I recreated with a chatterbox test.

@tsloughter
Owner

NewSWS is 32707 and BytesSent is 32828 (the size of the whole body).

@tsloughter
Owner

Seems like it's updating the send window and then just never doing anything again.

@tsloughter
Owner

So it decides it can send everything but then finds if it did send everything (it hasn't actually sent, just decided to) it would be more than allowed?

@tsloughter
Owner

Wait, 32707 is the send window size 65535 minus the body 32828. So is it comparing the updated window as though it had been sent against what is to be sent?
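
The arithmetic above can be checked with a small standalone sketch. The module and function names here are hypothetical, not part of chatterbox; only the numbers (65535, 32828, 32707) and the `BytesSent > NewSWS` comparison come from this thread.

```erlang
-module(sws_check).
-export([demo/0, oversent_check/2]).

%% Hypothetical stand-in for the comparison in the snippet above:
%% it compares the bytes to send against the window *after* those
%% bytes have already been subtracted from it.
oversent_check(BytesSent, NewSWS) ->
    BytesSent > NewSWS.

demo() ->
    InitialSWS = 65535,               %% send window before the body
    BytesSent = 32828,                %% size of the whole body
    NewSWS = InitialSWS - BytesSent,  %% window after decrementing
    {NewSWS, oversent_check(BytesSent, NewSWS)}.
```

With these numbers `sws_check:demo()` returns `{32707, true}`: the check reports an oversend even though the body fits in the original window, which is consistent with the bytes being counted twice.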

@tsloughter
Owner

Seems like double counting, right?

Should it instead be checking the current socket send window against BytesSent? Or is it meant to check whether the new send window would be less than 0, in which case it should not send?

If either of those is the case, I'm still not sure how to test the case where it actually does end up in a state that should go into the "we delved too deep, and too greedily" branch, and possibly fix why it isn't eventually sending everything.

@Vagabond
Author

Yeah, this code does look wrong. I wonder if it should be case BytesSent > SWS of instead?

@Vagabond
Author

The intent here, because each stream can be handled independently, is to check that there's no race condition where we'd send more than the windowing system allows. Given there's only one stream active, this code is clearly wrong somewhere, as there's no other stream to race against.

@Vagabond
Author

Or possibly we can simply check NewSWS is not < 0?

@tsloughter
Owner

Right, my thought was NewSWS < 0, and that is what I have right now and it works. I would like a test to verify that when this branch is triggered it does end up sending the data, but I'm not sure how to actually trigger it :).
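
One way the rollback branch could be reached is two streams reserving from the same shared window. The following is a minimal sketch under stated assumptions, not chatterbox's implementation: `reserve/2` is a hypothetical helper, the window lives in a signed `atomics` counter, and the two "streams" are simulated sequentially rather than as racing processes.

```erlang
-module(sws_race).
-export([demo/0, reserve/2]).

%% Hypothetical helper (not chatterbox's API): subtract Bytes from the
%% shared window; if the result is negative, give the bytes back and
%% report a rollback instead of sending.
reserve(Ref, Bytes) ->
    case atomics:sub_get(Ref, 1, Bytes) of
        NewSWS when NewSWS < 0 ->
            ok = atomics:add(Ref, 1, Bytes),
            {rollback, NewSWS};
        NewSWS ->
            {send, NewSWS}
    end.

demo() ->
    %% The counter must be signed so the window can dip below zero.
    Ref = atomics:new(1, [{signed, true}]),
    ok = atomics:put(Ref, 1, 65535),
    First = reserve(Ref, 40000),   %% fits in the window
    Second = reserve(Ref, 40000),  %% would overdraw the window
    {First, Second, atomics:get(Ref, 1)}.
```

Here `sws_race:demo()` returns `{{send,25535},{rollback,-14465},25535}`: the first reservation succeeds, the second drives the window negative and is rolled back, and the counter ends at the value left by the first. A test along these lines would still need to confirm that the rolled-back data is sent once the window reopens.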

@tsloughter
Owner

Ah crap, turns out NewSWS < 0 was only passing while my other fiddling was still in place, where I had turned up the frame sizes by a factor of 10, ugh.

@tsloughter
Owner

Never mind, I just didn't reset everything. It works.

@Vagabond
Author

so that fixes it, or?
