Request for read interest when channel is unwritable while HttpClient.send(Mono) is used #2864
I don't think the Windows test failure is related to this PR.
When you are available (this is not urgent), can you take a look?
I have updated the HttpClientTest in order to also test the plain HTTP/1.1 protocol.
The checks are failing on Windows; let me verify that before reviewing ...
…porarily unwritable.
…ch is also checking HTTP/1.1 protocol
Rebased on top of 1.0.x, in order to pick up #2892.
After some research, it appears that the tests from this PR that were unstable so far were using a reactor netty http server. But this is a different use case than the reproducer project provided in #2825, which was based on Tomcat. When a request is aborted, Tomcat continues to read request body bytes, up to two megabytes, once a final response (400) is sent, and this gives the reactor netty client more time to see the 400 bad request (using the patch from this PR).

Now, when using the reactor netty http server, the issue is that the connection is closed right after the 400 bad request is sent, and on localhost the tests may be unstable because a TCP/RST may be sent to the client, which then may miss the 400 bad request.

I have removed all reactor netty server based tests and have only left the Tomcat test that uses plain HTTP/1.1, and the tests are now stable. Turning this PR to ready for review (in the latest checks, I don't think that the errors from the Windows matrix are related).
reactor-netty-core/src/main/java/reactor/netty/channel/ChannelOperations.java
reactor-netty-core/src/main/java/reactor/netty/channel/ChannelOperationsHandler.java
reactor-netty-http/src/main/java/reactor/netty/http/HttpOperations.java
reactor-netty-http/src/main/java/reactor/netty/http/client/HttpClientOperations.java
reactor-netty-http/src/main/java/reactor/netty/http/client/HttpClientOperations.java
reactor-netty-http/src/test/java/reactor/netty/http/client/HttpClientWithTomcatTest.java
reactor-netty-http/src/test/java/reactor/netty/http/client/HttpClientWithTomcatTest.java
reactor-netty-http/src/test/java/reactor/netty/http/client/HttpClientWithTomcatTest.java
Please fix the
|
Fixed checkstyle warning in 69c0b14.
I have applied your feedback (thanks); can you check?
@violetagg, thanks for the review.
Fixed flaky test in HttpClientWithTomcatTest.testIssue2825. To reproduce the issue, we need Tomcat to close the connection while the client is still writing. The flakiness occurs because Tomcat closes the connection without reading all remaining data. Depending on the size of the unread data, TCP may send a TCP/RST instead of a FIN. When the client receives a TCP/RST, some or all unread data may be dropped. The socket send buffer size in HttpClient has therefore been reduced, which eliminated the flakiness of the test and most of the TCP/RST. Additionally, returning a 400 bad request without chunked encoding reduces the chance of losing data, as it sends only one TCP segment (compared to two segments with chunked encoding). These workarounds seem to fix the instability of the test, and if the patch is disabled, the PrematureCloseException reliably reoccurs with the test. I also removed the retries; the tests now run in around 1.5-2 seconds. The test for the case when HttpClient sends the request using Flux has been removed because it seems unstable; it may be a different problem, which must be addressed in a separate issue. Related to #2864 #2825
Motivation:
When using HttpClient.send with a Mono publisher and plain POST HTTP/1.1, a problem may occur if the channel becomes unwritable during request flushing. If the remote server sends an early response (e.g., 400 bad request) and closes the connection without reading the request post data, the client might fail with a "Connection prematurely closed BEFORE response" error instead of reporting the actual early server response (e.g., 400 bad request).

This issue arises because, in the case of a Mono publisher and plain HTTP/1.1, the channel read interest is not enabled when the channel becomes unwritable, preventing the client from reading the response and leading to "Connection prematurely closed BEFORE response" errors.
For example, in GH #2825, the user has provided a reproducer example, where the server (port 8000) sends a 400 bad request to the client, which is still writing large POST HTTP/1.1 request data (the client is often blocked while writing because of TCP flow control). In the Wireshark capture of that reproducer, frame 602 is the 400 bad request returned to the user. Notice that the 400 bad request has been sent in two frames: in frame 598, the client first gets the 400 bad request headers, followed by the chunked body, and in frame 602, it gets the last zero-length chunk header that delimits the end of the message.
Then in frame 603, the server receives the TCP/ACK highlighted with a [TCP Window Full] info, meaning that the client is using the full capacity of the TCP flow, limited by the server's receive window, which is full.

Finally, a TCP/RST is sent to the client in frame 767, but the client has not requested read() interest, so it misses the 400 bad request and then aborts with Connection prematurely closed BEFORE response errors.

By requesting read interest while writing the big POST request data, we can alleviate the issue and report the 400 bad request to the user instead of the misleading Connection prematurely closed BEFORE response error.
Caution: This PR may alleviate the issue, but it may not always avoid the problem, because a TCP/RST is not graceful and some packets may not be handled by the client. For example, if the 400 bad request is sent in chunked encoding and is segmented in 3 frames (the 1st with the headers, the 2nd with the body, and the 3rd with the last zero-length chunk), then the 3rd frame with the last chunk EOF may be missed, even if we see it in Wireshark on the client machine. In this case, we may get a Connection prematurely closed DURING response error on the client. We sometimes won't be able to avoid this (because of the nature of the TCP/RST).
For example, with curl, we usually get the 400 bad request, but if we manage to let the server send the 400 bad request in three frames (the headers, the body, and the last chunk), then curl sometimes fails.
But most of the time, curl is able to get the 400 bad request response.
In the context of Tomcat (used by the reproducer), the 400 bad request is sent in two frames: the first one for the headers and body, the second one for the last chunk header.
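To make the framing difference concrete, here is a minimal, dependency-free sketch of what a 400 response looks like on the wire with a Content-Length header versus chunked encoding. The class and method names are hypothetical and exist only for illustration, and the header set is an assumption rather than what Tomcat actually emits. Whether the two chunked parts end up in separate TCP segments depends on how the server flushes; the sketch only shows that chunked encoding needs a trailing zero-length chunk that can be lost on its own.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical illustration only: not Tomcat or Reactor Netty code.
final class BadRequestFramingSketch {

    private static final String STATUS_LINE = "HTTP/1.1 400 Bad Request\r\n";

    // A Content-Length response is complete once headers and body are received.
    static String withContentLength(String body) {
        int length = body.getBytes(StandardCharsets.UTF_8).length;
        return STATUS_LINE
                + "Content-Length: " + length + "\r\n"
                + "Connection: close\r\n"
                + "\r\n"
                + body;
    }

    // A chunked response is only complete after the final zero-length chunk,
    // which a server may flush separately from the headers and body.
    static String[] chunked(String body) {
        int length = body.getBytes(StandardCharsets.UTF_8).length;
        String headersAndBody = STATUS_LINE
                + "Transfer-Encoding: chunked\r\n"
                + "Connection: close\r\n"
                + "\r\n"
                + Integer.toHexString(length) + "\r\n"  // chunk size in hex
                + body + "\r\n";
        String lastChunk = "0\r\n\r\n";                  // end-of-message marker
        return new String[] {headersAndBody, lastChunk};
    }

    public static void main(String[] args) {
        System.out.println(withContentLength("bad request"));
        for (String part : chunked("bad request")) {
            System.out.println("--- part ---");
            System.out.println(part);
        }
    }
}
```

If the last part is dropped after a TCP/RST, a client has the status and body but no end-of-message marker, which matches the "DURING response" failure mode described in the caution above.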
Modifications:
Requesting read interest when the HttpClient.send method is used with a Mono publisher seems to fix the issue, and we can now get the early 400 bad request server response instead of the unexpected Connection prematurely closed BEFORE response error.

Read interest could be enabled systematically in the HttpOperations.send(Publisher) method.
Instead of doing a read() systematically, this PR uses a different approach and only requests a channel read() if the channel becomes unwritable. The ChannelOperationsHandler now overrides the channelWritabilityChanged callback, which delegates to any registered ChannelOperations (an empty default onWritabilityChanged method has also been added to ChannelOperations). onWritabilityChanged is only implemented by the HttpClientOperations class: when the channel becomes unwritable, it invokes channel().read(), but only if the request has been sent using HttpClient.send(Mono<ByteBuf>) and plain HTTP/1.1 is used. I do not think it is needed to do the same when HTTPS, H2C, HTTP2, or WebSocket are used.

In order to detect if HttpClient.send(Mono<ByteBuf>) has been used, the PR relies on the new HttpOperations.hasSentBody() method. Please check the important javadoc on top of HttpClientOperations.onWritabilityChanged().
There is a last issue in AbstractHttpClientMetricsHandler: since it is now possible to receive an early response before the corresponding request has been fully flushed, care must be taken. Once channelRead() receives a full response, it calls recordRead() and then reset(). That is a problem if the promise listener of the write method completes later, because it will then call recordWrite(), but at that point all class fields will already have been cleared by the reset() previously called by channelRead().

To handle this problem, sequence numbers are now used to let channelRead() detect that the full write has not yet completed when a full response is received. In this case, channelRead() itself calls recordWrite() on behalf of the write method, then recordRead(), and then reset().
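One way to picture the sequence-number idea is the following dependency-free sketch. It is not the code from AbstractHttpClientMetricsHandler: the field names, the Runnable returned by write(), and the events list are all assumptions made for illustration. The point is that the counters survive reset(), so whichever of the write listener or channelRead() runs first records the write, and the other one skips it.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model; names and structure are assumptions, not the PR's actual code.
final class MetricsSequenceSketch {

    final List<String> events = new ArrayList<>();

    private String path;          // per-exchange state, cleared by reset()
    private long startedWrites;   // sequence number of the latest request whose write began
    private long recordedWrites;  // sequence number of the latest request whose write was recorded

    // Starts a request write and returns the listener to run once the flush completes.
    Runnable write(String requestPath) {
        path = requestPath;
        long seq = ++startedWrites;
        return () -> {
            if (recordedWrites < seq) { // channelRead has not recorded this write yet
                recordedWrites = seq;
                recordWrite();
            }
            // otherwise the early response already recorded it; path may be cleared by now
        };
    }

    // Called when a full response has been received.
    void channelReadFullResponse() {
        if (recordedWrites < startedWrites) { // write listener has not completed yet
            recordedWrites = startedWrites;
            recordWrite();                    // record on behalf of the write listener
        }
        recordRead();
        reset();
    }

    private void recordWrite() {
        events.add("write " + path);
    }

    private void recordRead() {
        events.add("read " + path);
    }

    private void reset() {
        path = null; // the sequence counters are intentionally kept
    }

    public static void main(String[] args) {
        MetricsSequenceSketch metrics = new MetricsSequenceSketch();
        Runnable writeListener = metrics.write("/upload");
        metrics.channelReadFullResponse(); // early response arrives before the flush completes
        writeListener.run();               // late listener must not record a second time
        System.out.println(metrics.events);
    }
}
```

In both orderings the write is recorded exactly once and before the read, and the late listener never touches state that reset() has already cleared.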
Added two tests: one for HttpClient.send(Flux<ByteBuf>) and one for HttpClient.send(Mono<ByteBuf>).
Fixes #2825