Concurrency issue with server-side header encoding #45
Comments
Pretty sure I understand the issue, but not yet the solution. I need to look over the HTTP/2 spec. To make sure I understand, the issue seems to be: we encode headers for stream N, which updates the hpack context, and then use that updated context when sending headers on stream N+2, so those headers are encoded as table positions. I can't imagine there is any requirement about sending or handling frames based on their stream IDs. Maybe there is, and that is what is missing, but with all the dynamics of streams and frame windows I would expect it not to fall apart if stream N+2's frames are handled before stream N's. That would mean the hpack context has to be per-stream, but that isn't correct either... so I'm a bit stumped. Hopefully digging through the spec will make it clear.
Also, based on the fact that the pcap shows the headers as
Hm, reading the spec, one spot I could see being buggy is the rule that all header frames must be sent without interleaving any other frames, including frames for other streams. I don't know that this is what is happening, and I don't think the pcap data points to it, but it's a guess.
Meh, I am not seeing how this would happen on the server side. It sends all headers without any possible interleaving in Have you come up with any ideas?
This is a server-side issue in this case. The trailers are sent by the server. The order is as in the sample unary request here: https://github.com/grpc/grpc/blob/master/doc/PROTOCOL-HTTP2.md#example. I took some traces from the code just before hpack:encode() was called in both h2_connection:send_headers_() and h2_connection:handle_event(_, {send_trailers, ...}). I needed to store the trace in an ets table and print it afterwards, since printing when hitting the functions changes the timing enough that the issue no longer reproduces.
But in the resulting test.pcap.gz, the order was:
So to me it looks like trailers might be encoded before the headers of another stream, but still sent out after them. The hpack encoding is per HTTP/2 connection and not per stream, as you said. I think that in order to fulfil the protocol, HEADERS (regardless of stream) need to be sent out in the order in which the hpack encoding was done, otherwise the dynamic header tables cannot stay in sync between client and server. I am not very familiar yet with the execution flow inside the server, so I'm not currently sure how the trailers get overtaken by headers from other streams in the code.
I must have misunderstood, you think the issue is related to the trailers and headers sharing some values? As in, the trailers of stream N are encoded, updating the table, and before being sent the headers of stream N+2 are encoded with the updated table, and they share some values so indices are used in the encoded header frames instead of their full values?
I do see trailers are sent in
After trying to trace the related chatterbox modules I just added some log messages and I see it happening (I'm running your test), but it seems so odd... Trailers are sent with the second set using the updated dynamic table:
Then later the headers for streams 1 and 3 are decoded and 2 are undefined. But they are decoded in the h2_connection process, meaning the stream 3 decode should be using the table updated when decoding stream 1:
I'm logging the dynamic table too now and it is clearly updated. The table when failing to decode the trailers is:
Oh, you were saying it was getting values 65 and 66, which don't exist. And now I see:
Is the table after the trailers have been decoded. I was mixing up headers and trailers. So the table has the proper entries for the headers sent as indices, it's that their indices differ between the server and client. Probably what you already tried to explain, hehe, but now I see it :)
Yea, ok, so it encodes the trailers, updating the encoding context in the connection process. But it only enqueues the trailers for sending, it doesn't send them immediately. This must be why the encoding context for the next set of headers has been updated as if the trailers had been sent, even though they hadn't necessarily been. The headers are sent inline and not by enqueuing them. I guess one or the other has to change. Either trailers need to be sent immediately, inline when encoded, or headers need to be enqueued to the stream set. Not sure which is the right way to go. I would have assumed moving it all to being enqueued, but headers have some special rule about sending them all with nothing else in between, if I read the spec correctly... though maybe that is still achievable with the stream_set enqueuing method.
Worse, the trailers are sent by casting a message to the stream process, so even enqueuing likely still means it is a race. So the only option (with the current design at least) is to send them in the connection process.
Not so easy since there could still be some body left to send when send_trailers is called... seems this will require more refactoring than I'd hoped.
Considering just removing index updates for trailers for now... It is the simplest solution and the optimization can return when I do a refactoring soon.
@psalin let me know if tsloughter/chatterbox#4 is acceptable and I'll merge/release it and grpcbox.
I don't think it can be fixed like that, it breaks HTTP/2 interoperability. The client will still decode the trailers and add them to its hpack context, and that changes the indexes. Since the newest entry is always put in the first position and the existing ones are moved back, the indexes sent by the server will then refer to the wrong headers. Even if grpcbox were changed so that this does not happen, it would still break interoperability with any other gRPC/HTTP2 client.
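A minimal Python sketch of the index shift described above. This is a toy model of the hpack dynamic table, not chatterbox or grpcbox code: the header values and the helper names `insert` and `index_of` are made up for illustration, and it assumes the server skips the table update for trailers while the client still performs it.

```python
STATIC_TABLE_SIZE = 61  # the hpack static table (RFC 7541, Appendix A) has 61 entries

# Illustrative values only; the real headers sent by grpcbox may differ.
RESPONSE_HEADERS = [
    ("content-type", "application/grpc+proto"),
    ("grpc-encoding", "identity"),
    ("grpc-accept-encoding", "identity"),
]
TRAILERS = [("grpc-status", "0"), ("grpc-message", "")]


def insert(table, header):
    """Add a header as the newest entry; older entries move back by one."""
    table.insert(0, header)


def index_of(table, header):
    """Return the hpack index of a header held in the dynamic table."""
    return STATIC_TABLE_SIZE + 1 + table.index(header)


server_table, client_table = [], []
for header in RESPONSE_HEADERS:
    insert(server_table, header)
    insert(client_table, header)

# Assumed scenario: only the client records the trailers.
for trailer in TRAILERS:
    insert(client_table, trailer)

# Print each header name with its index on the server and on the client.
for header in RESPONSE_HEADERS:
    print(header[0], index_of(server_table, header), index_of(client_table, header))
```

In this model every header sits two positions further back on the client than on the server, so an index the server sends later would resolve to a different entry on the client side.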
Both send_headers and send_trailers are part of the h2_connection gen_statem, so at least at that point the order is guaranteed. For headers, sock:send() is called directly, so they should go out in order. For trailers, h2_stream_set:send_what_we_can() is called, which I guess means they might not be sent yet, since we see headers pass them?
Ugh, so I figured it'd work since chatterbox will always send the whole header, so they aren't looked up in the table by any implementation. But yea, if the index of a header is changed by the insertion of the trailer then it'll break when chatterbox later sends a header by index... grrr. I really don't think there is a way to fix this in the current design without some big hacks. I guess I can try to figure out just how bad those hacks would be.
Yes, it seems this might be an issue that cannot be fixed with a few lines. Basically, the sending of the trailers and the headers, which are both HTTP/2 HEADERS frames, needs to be handled in the same place, or at least synced somehow so that the order is preserved. I will try to look at the code too when I can find time. The good thing from the grpcbox point of view is that it seems to always send the same trailers (at least if there are no errors), so once the trailers have been successfully received by the client once, all subsequent messages will just reuse the indexes, and it seems to work well for the remainder of the HTTP connection. It seems the issue would mostly happen if there are multiple concurrent requests right at the start of the HTTP connection.
Yea, I think it may be fixable without too much work, but it will come at the cost of performance, I just don't know how much. Basically I'm trying to change it to call the connection process to encode and send the trailers at the point where it would normally send the trailers on the socket directly. So instead of encoding the trailers and then adding them to the frames to be sent, it adds them as they are (not as frames, so no encoding) and waits to encode until it is sending them, which has to be done in the connection process to update the encode context.
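To make the difference between encoding at enqueue time and encoding at send time concrete, here is a rough Python sketch. It is a toy model and not the chatterbox implementation: `ToyEncoder`, `ToyDecoder`, the two `*_wire_order` functions and the header values are all hypothetical, and the "wire format" is just a list of tagged tuples rather than real hpack bytes.

```python
STATIC_TABLE_SIZE = 61  # size of the hpack static table (RFC 7541, Appendix A)

# Illustrative values only; not taken from the traces in this issue.
HEADERS = [("content-type", "application/grpc+proto"), ("grpc-encoding", "identity")]
TRAILERS = [("grpc-status", "0"), ("grpc-message", "")]


class ToyEncoder:
    """Connection-wide encoder: known headers become indices, new ones literals."""

    def __init__(self):
        self.table = []  # newest entry first

    def encode(self, headers):
        block = []
        for header in headers:
            if header in self.table:
                block.append(("index", STATIC_TABLE_SIZE + 1 + self.table.index(header)))
            else:
                self.table.insert(0, header)
                block.append(("literal", header))
        return block


class ToyDecoder:
    """Mirror of ToyEncoder; raises LookupError for an index it does not hold."""

    def __init__(self):
        self.table = []  # newest entry first

    def decode(self, block):
        headers = []
        for kind, payload in block:
            if kind == "literal":
                self.table.insert(0, payload)
                headers.append(payload)
            else:
                position = payload - STATIC_TABLE_SIZE - 1
                if position < 0 or position >= len(self.table):
                    raise LookupError(f"index {payload} not in dynamic table")
                headers.append(self.table[position])
        return headers


def eager_wire_order():
    """Encode trailers when they are queued, but let later headers overtake them."""
    encoder = ToyEncoder()
    first_headers = encoder.encode(HEADERS)     # stream 1 headers, sent inline
    queued_trailers = encoder.encode(TRAILERS)  # stream 1 trailers, only enqueued
    next_headers = encoder.encode(HEADERS)      # stream 3 headers, sent inline
    return [first_headers, next_headers, queued_trailers]


def deferred_wire_order():
    """Queue raw header lists and encode each one only when it is written."""
    encoder = ToyEncoder()
    send_queue = [HEADERS, HEADERS, TRAILERS]   # same wire order as above
    return [encoder.encode(headers) for headers in send_queue]


def decode_all(blocks):
    """Decode blocks in wire order with a single connection-wide decoder."""
    decoder = ToyDecoder()
    return [decoder.decode(block) for block in blocks]
```

In this model `decode_all(deferred_wire_order())` succeeds, while `decode_all(eager_wire_order())` raises `LookupError` on the second block, because that block refers to table positions that only exist once the queued trailers have been decoded.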
Seems I got it. All grpcbox and chatterbox tests passing. It is a little ridiculous -- I added an I haven't traced out all the stream state possibilities but I think it is fairly simple since trailers means the stream is closing (I hope!). Will open a PR this morning.
Newest attempt: tsloughter/chatterbox#5
This looks nice, just deferring header encoding until it's about to send. So I guess performance-wise the main difference is that there is an extra step and that trailers are now stored in the stream set unencoded, and I guess that shouldn't have any big impact on performance?
Yea, I don't think it should have much of an impact on performance.
Sounds good and looks all good to me! As an unrelated side note, could you add the tag v0.6.0 for ctx? Seems the version 0.6.0 is available in hex but not tagged in the repo, and I think we'd need that tag in order to clear licensing on it.
When handling concurrent requests, the server may send out an HTTP/2 HEADERS frame that refers to headers in the hpack dynamic table that are not yet allocated on the client side. This causes the client to fail to decode the headers, and in the case of the grpcbox client it causes a crash. This issue can be reproduced by running the unary concurrency test of PR #41.
The issue can best be seen from a PCAP dump of the traffic generated by the test. In the attached dump, the first HEADERS sent by the server occurs in packet 324 (stream ID 101). Here the HTTP/2 Header Block Fragment contains the headers in long format, as it's the first time they are sent. The client decodes 3 dynamic headers, and its hpack dynamic table now contains values at positions 62, 63 and 64.
Next, in packet 325, the server sends a HEADERS for stream 127. This message has a Header Block Fragment of 88c2c1c0, of which c2, c1 and c0 refer to the dynamic table positions 66, 65 and 64. Since the client has not yet received any headers for positions 66 and 65, it fails to decode the headers. The message sent by the server is invalid.
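For reference, the mapping from the fragment bytes to table positions can be checked with a few lines of Python. This is a minimal sketch that only handles single-byte indexed header fields (high bit set, 7-bit index below 127), which is all this fragment contains; the function names are made up for illustration.

```python
STATIC_TABLE_SIZE = 61  # entries 1..61 are the hpack static table (RFC 7541, Appendix A)


def decode_indexed_fields(fragment_hex):
    """Return the table index carried by each byte of an all-indexed header block."""
    indices = []
    for byte in bytes.fromhex(fragment_hex):
        if byte & 0x80 == 0:
            raise ValueError(f"0x{byte:02x} is not an indexed header field")
        index = byte & 0x7F
        if index in (0, 0x7F):
            # 0 is invalid and 0x7f starts a multi-byte integer; neither is handled here.
            raise ValueError(f"0x{byte:02x} is outside the scope of this sketch")
        indices.append(index)
    return indices


def table_of(index):
    """Say whether an index points into the static or the dynamic table."""
    return "static" if index <= STATIC_TABLE_SIZE else "dynamic"


indices = decode_indexed_fields("88c2c1c0")
print(indices)                         # [8, 66, 65, 64]
print([table_of(i) for i in indices])  # ['static', 'dynamic', 'dynamic', 'dynamic']
```

The leading 88 is static table entry 8 (`:status: 200`), so only the last three bytes depend on the state of the dynamic table.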
The same error occurs in packet 327 for the next stream 117.
Finally, in packet 328, the server sends the trailers for the original stream 101. These 2 headers are also in the long format, and once the client has decoded them it will have values at positions 62-66. At this point the invalid packets sent earlier (325 and 327) correctly match the dynamic headers they refer to. This likely means that the order in which the server encodes the messages is not guaranteed to be the order in which they are sent out. If it were, packet 328 with the trailers would have been sent out before 325 and 327.
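The reasoning above can be replayed with a small Python sketch that only tracks how many dynamic table entries the client holds after each packet. The entry counts and indices for packets 324, 325 and 328 come from the description above; the indices for packet 327 are an assumption (the dump is only described as showing "the same error"), and `first_undecodable` is a made-up helper name.

```python
STATIC_TABLE_SIZE = 61  # the dynamic table starts at index 62

# packet number -> (entries the block adds to the dynamic table, indices it references)
PACKETS = {
    324: (3, []),            # headers for stream 101, long format
    325: (0, [66, 65, 64]),  # headers for stream 127, fragment 88c2c1c0
    327: (0, [66, 65, 64]),  # headers for stream 117 (assumed to match packet 325)
    328: (2, []),            # trailers for stream 101, long format
}


def first_undecodable(order):
    """Return the first packet that references a position the client lacks, or None."""
    dynamic_entries = 0
    for packet in order:
        added, referenced = PACKETS[packet]
        highest_valid = STATIC_TABLE_SIZE + dynamic_entries
        if any(index > highest_valid for index in referenced):
            return packet
        dynamic_entries += added
    return None


print(first_undecodable([324, 325, 327, 328]))  # order seen in the pcap: 325
print(first_undecodable([324, 328, 325, 327]))  # suspected encoding order: None
```

Under these assumptions the order seen on the wire fails at packet 325, while the suspected encoding order decodes cleanly.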
fail.pcap.gz