Request for read interest when channel is unwritable while `HttpClient.send(Mono)` is used #2864

pderop · 2023-07-25T19:28:44Z

Motivation:

When using HttpClient.send with a Mono publisher and plain POST HTTP/1.1, a problem may occur if the channel becomes unwritable during request flushing. If the remote server sends an early response (e.g., 400 bad request) and closes the connection without reading the request post data, the client might fail with a "Connection prematurely closed BEFORE response" error instead of reporting the actual early server response (e.g., 400 bad request).

This issue arises because, in the case of a Mono publisher and HTTP/1.1 plain, the channel read interest is not enabled when the channel becomes unwritable, preventing the client from reading the response and leading to "Connection prematurely closed BEFORE response" errors.

For example, in GH #2825, the user provided a reproducer in which the server (port 8000) sends a 400 bad request to a client that is still writing a large HTTP/1.1 POST body (the client is often blocked while writing because of TCP flow control). In the Wireshark capture of the reproducer, frame 602 is the 400 bad request returned to the user. Notice that the 400 bad request is sent in two frames: in frame 598, the client first gets the 400 bad request headers, followed by the body in a chunk, and in frame 602, it gets the last zero-length chunk, which delimits the end of the message.

Then in frame 603, the server receives the TCP/ACK highlighted with a [TCP Window Full] info, meaning that the client is using the full capacity of the TCP flow, limited by the server's receive window which is full.

Finally, a TCP/RST is sent to the client in frame 767, but the client has not requested read interest, so it misses the 400 bad request and aborts with a Connection prematurely closed BEFORE response error.

By requesting read interest while writing the large POST body, we can alleviate the issue and report the 400 bad request to the user instead of the misleading Connection prematurely closed BEFORE response.

Caution: This PR may alleviate the issue, but it cannot always avoid the problem, because a TCP/RST is not graceful and some packets may never be handled by the client. For example, if the 400 bad request is sent with chunked encoding and is segmented into three frames (the first with the headers, the second with the body, and the third with the last zero-length chunk), then the third frame may be missed, even though Wireshark shows it arriving on the client machine. In this case, we may get a Connection prematurely closed DURING response error on the client. We sometimes won't be able to avoid this, because of the nature of TCP/RST.

For example, curl usually gets the 400 bad request, but if we manage to make the server send the 400 bad request in three frames (the headers, the body, and the last chunk), then curl will sometimes fail like this:

curl -v --header "Expect:"  -H "Content-Type: text/plain" -d @sample_request.json http://xxxx.xxxx.xxxx.xxxx:8000/payload-size
> POST /large-payload HTTP/1.1
> Host: xxxx.xxxx.xxxx.xxxx:8000
> User-Agent: curl/7.88.1
> Accept: */*
> Content-Type: text/plain
> Content-Length: 1561039
> 
* Recv failure: Connection reset by peer
* Closing connection 0
curl: (56) Recv failure: Connection reset by peer

But most of the time, curl is able to get the 400 bad request response:

In the context of Tomcat (used by the reproducer), the 400 bad request is sent in two frames: the first for the headers and body, the second for the last chunk header.
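To make the framing discussed above concrete, here is a minimal sketch of how a chunked 400 response is laid out and why losing the final zero-length chunk leaves the message incomplete. The class and method names are hypothetical, the response headers are reduced to the minimum, and the completeness check is a simplification that ignores trailers and chunk data that happens to end with the terminator bytes; it only illustrates the wire format, not Tomcat's or Reactor Netty's code.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Illustrative only: models how a chunked 400 response can be split into
// separate segments. Not taken from Tomcat or Reactor Netty.
class ChunkedResponseSketch {

	// Last chunk (size 0) followed by the empty line that ends the message.
	static final String TERMINATOR = "0\r\n\r\n";

	/** Minimal headers of a chunked 400 response. */
	static String headers() {
		return "HTTP/1.1 400 Bad Request\r\n"
				+ "Transfer-Encoding: chunked\r\n"
				+ "\r\n";
	}

	/** One chunk: size in hex, CRLF, data, CRLF. */
	static String chunk(String body) {
		int size = body.getBytes(StandardCharsets.UTF_8).length;
		return Integer.toHexString(size) + "\r\n" + body + "\r\n";
	}

	/** Two segments as described for Tomcat: headers+body first, last chunk second. */
	static List<String> twoSegments(String body) {
		List<String> segments = new ArrayList<>();
		segments.add(headers() + chunk(body));
		segments.add(TERMINATOR);
		return segments;
	}

	/** Simplified check: the message is complete once the zero-length chunk arrived. */
	static boolean isComplete(String received) {
		return received.endsWith(TERMINATOR);
	}

	public static void main(String[] args) {
		List<String> segments = twoSegments("payload too large");
		// Only the first segment was read before the reset.
		System.out.println(isComplete(segments.get(0)));
		// Both segments were read.
		System.out.println(isComplete(segments.get(0) + segments.get(1)));
	}
}
```

If the client only sees the first segment before the TCP/RST, it already has the status line and body but cannot tell that the message ended, which is the situation that surfaces as Connection prematurely closed DURING response.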

Modifications:

Requesting read interest when the HttpClient.send method is used with a Mono publisher seems to fix the issue: we can now get the early 400 bad request server response instead of the unexpected Connection prematurely closed BEFORE response error.

Now, read interest could be systematically enabled, like this in the HttpOperations.send(Publisher) method:

	public NettyOutbound send(Publisher<? extends ByteBuf> source) {
		if (!channel().isActive()) {
			return then(Mono.error(AbortedException.beforeSend()));
		}
		if (source instanceof Mono) {
			return new PostHeadersNettyOutbound(((Mono<ByteBuf>) source)
					.flatMap(b -> {
						if (markSentHeaderAndBody(b)) {
							HttpMessage msg = prepareHttpMessage(b);

							try {
								afterMarkSentHeaders();
							}
							catch (RuntimeException e) {
								ReferenceCountUtil.release(b);
								return Mono.error(e);
							}
->							channel().read();
							return FutureMono.from(channel().writeAndFlush(msg));
						}
						return FutureMono.from(channel().writeAndFlush(b));
					})
					.doOnDiscard(ByteBuf.class, ByteBuf::release), this, null);
		}
		return super.send(source);
	}

Instead of doing a read() systematically, this PR takes a different approach and only requests a channel read() when the channel becomes unwritable. The ChannelOperationsHandler now overrides the channelWritabilityChanged callback, which delegates to any registered ChannelOperations (an empty default onWritabilityChanged method has also been added to ChannelOperations). onWritabilityChanged is only implemented by the HttpClientOperations class: when the channel becomes unwritable, it invokes channel().read(), but only if the request has been sent using HttpClient.send(Mono<ByteBuf>) and plain HTTP/1.1 is used. I do not think the same is needed when HTTPS, H2C, HTTP2, or WebSocket is used.

To detect whether HttpClient.send(Mono<ByteBuf>) has been used, the PR relies on the new HttpOperations.hasSentBody() method. Please see the important Javadoc on HttpClientOperations.onWritabilityChanged().
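The conditions described above can be sketched as a small, dependency-free model. Everything here is hypothetical and for illustration only: ChannelView, Protocol, and CountingChannel are stand-ins invented for this sketch (they are not Netty or Reactor Netty types), and the actual onWritabilityChanged implementation in HttpClientOperations may check additional state.

```java
// Hypothetical model of the decision described in the PR text. The real logic
// lives in HttpClientOperations.onWritabilityChanged() and may differ in detail.
class ReadOnUnwritableSketch {

	/** Stand-in for the few channel operations this sketch needs (not Netty's Channel). */
	interface ChannelView {
		boolean isWritable();

		void read();
	}

	/** Assumed protocol labels, used only by this sketch. */
	enum Protocol { HTTP11_PLAIN, HTTP11_TLS, H2C, H2, WEBSOCKET }

	/** Fake channel that counts how many times read() was requested. */
	static final class CountingChannel implements ChannelView {
		boolean writable = true;
		int reads;

		@Override
		public boolean isWritable() {
			return writable;
		}

		@Override
		public void read() {
			reads++;
		}
	}

	/** Read only when unwritable, the body went out via send(Mono), and HTTP/1.1 is plain. */
	static boolean shouldRequestRead(boolean writable, boolean hasSentBody, Protocol protocol) {
		return !writable && hasSentBody && protocol == Protocol.HTTP11_PLAIN;
	}

	/** Models the callback the handler would delegate to on a writability change. */
	static void onWritabilityChanged(ChannelView channel, boolean hasSentBody, Protocol protocol) {
		if (shouldRequestRead(channel.isWritable(), hasSentBody, protocol)) {
			channel.read();
		}
	}

	public static void main(String[] args) {
		CountingChannel channel = new CountingChannel();
		channel.writable = false; // simulate the channel becoming unwritable
		onWritabilityChanged(channel, true, Protocol.HTTP11_PLAIN);
		System.out.println("reads requested: " + channel.reads);
	}
}
```

Keeping the check in a side-effect-free predicate makes the three conditions easy to test in isolation, which is why the sketch separates shouldRequestRead from the callback.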

There is one last issue, in AbstractHttpClientMetricsHandler. Since it is now possible to receive an early response before the corresponding request has been fully flushed, care must be taken: once channelRead() receives a full response, it calls recordRead() and then reset(). That is a problem if the promise listener of the write method completes later, because it will then call recordWrite(), but by then all class fields will already have been cleared by the reset() that channelRead() called.
To handle this, sequence numbers are now used to let channelRead() detect that the full write has not yet completed when a full response is received. In this case, channelRead() itself calls recordWrite() on behalf of the write method, then recordRead(), and then reset().
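The ordering problem and the sequence-number idea can be sketched as follows. This is a simplified, hypothetical model and not the actual AbstractHttpClientMetricsHandler code: the class, its fields, and the method names startWrite, writeCompleted, and fullResponseReceived are assumptions made for illustration, and metrics are replaced by a list of event strings.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model: shows how a sequence number lets the read path record
// the write itself when the response arrives before the write listener fires.
class MetricsSequenceSketch {

	final List<String> events = new ArrayList<>();

	private String path;           // per-request state, cleared by reset()
	private long writeSeq;         // incremented each time a request write is issued
	private long recordedWriteSeq; // sequence of the last write already recorded

	/** Called when the request write is issued; the returned token is kept by the write listener. */
	long startWrite(String requestPath) {
		path = requestPath;
		return ++writeSeq;
	}

	/** Write promise listener: records the write unless the read path already did. */
	void writeCompleted(long seq) {
		if (recordedWriteSeq < seq) {
			recordWrite();
			recordedWriteSeq = seq;
		}
	}

	/** Full response received: record the pending write first, then the read, then reset. */
	void fullResponseReceived() {
		if (recordedWriteSeq < writeSeq) {
			recordWrite();
			recordedWriteSeq = writeSeq;
		}
		recordRead();
		reset();
	}

	private void recordWrite() {
		events.add("write " + path);
	}

	private void recordRead() {
		events.add("read " + path);
	}

	private void reset() {
		path = null; // sequence numbers intentionally survive the reset
		events.add("reset");
	}

	public static void main(String[] args) {
		MetricsSequenceSketch handler = new MetricsSequenceSketch();
		long seq = handler.startWrite("/upload");
		handler.fullResponseReceived(); // early response arrives first
		handler.writeCompleted(seq);    // late listener finds the write already recorded
		System.out.println(handler.events);
	}
}
```

Because the sequence counters are not cleared by reset(), a late write listener can tell that its write was already accounted for and never touches the cleared per-request fields.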

Added two tests:

Added HttpClientWithTomcatTest.testIssue2825, which reproduces a scenario similar to the one from "Misleading reactor.netty.http.client.PrematureCloseException: Connection prematurely closed BEFORE response" #2825, where Tomcat returns an early 400 bad request response and closes the connection while the client is writing a large HTTP POST request. The test checks both HttpClient.send(Flux<ByteBuf>) and HttpClient.send(Mono<ByteBuf>).
Added HttpClientTests.testIssue2825, which validates that there are no problems when using secure HTTP/1.1, H2C, or HTTP2.

Fixes #2825

pderop · 2023-07-27T09:56:20Z

I don't think the windows test failure is related to this PR.

pderop · 2023-07-27T09:57:47Z

@violetagg ,

When you are available (this is not urgent), can you take a look?
thanks.

pderop · 2023-09-01T08:18:25Z

I have updated the HttpClientTest in order to also test HTTP/1.1 plain protocol.

pderop · 2023-09-01T08:53:24Z

@violetagg ,

the checks are failing on windows , let me verify that before reviewing ...


pderop · 2023-09-04T11:27:16Z

rebased on top of 1.0.x, in order to pick up #2892

pderop · 2023-09-11T15:42:19Z

After some research, it appears that the tests from this PR that were unstable so far were using a reactor netty http server. But this is a different use case than the reproducer project provided from #2825, which was based on Tomcat.

When a request is aborted, Tomcat continues to read request body bytes, up to two megabytes, once a final response (400) is sent, and this gives the reactor netty client more time to see the 400 bad request (using the patch from this PR). See maxSwallowSize in the Tomcat documentation.

Now, when using reactor netty http server, the issue is that the connection is closed right after the 400 bad request is sent, and on localhost, the tests may be unstable because TCP/RST may be sent to the client, which then may miss the 400 bad request.

I have removed all reactor netty server based tests, and I have only left the Tomcat test that is using HTTP/1.1 plain, and the tests are now stable.

Turning this PR to ready for review (in the latest checks, I don't think that the errors from the Windows matrix are related).

reactor-netty-core/src/main/java/reactor/netty/channel/ChannelOperations.java

reactor-netty-core/src/main/java/reactor/netty/channel/ChannelOperationsHandler.java

reactor-netty-http/src/main/java/reactor/netty/http/HttpOperations.java

reactor-netty-http/src/main/java/reactor/netty/http/client/HttpClientOperations.java

reactor-netty-http/src/test/java/reactor/netty/http/client/HttpClientWithTomcatTest.java

violetagg · 2023-09-14T09:34:50Z

Please fix the checkstyle warning

> Task :reactor-netty-http:checkstyleMain
/home/runner/work/reactor-netty/reactor-netty/reactor-netty-http/src/test/java/reactor/netty/http/client/HttpClientWithTomcatTest.java:75: warning: [UnnecessaryParentheses] These grouping parentheses are unnecessary; it is unlikely the code will be misinterpreted without them
	private static final byte[] PAYLOAD = String.join("", Collections.nCopies((TomcatServer.PAYLOAD_MAX) + (1024 * 1024), "X"))
	                                                                          ^
    (see https://errorprone.info/bugpattern/UnnecessaryParentheses)
  Did you mean 'private static final byte[] PAYLOAD = String.join("", Collections.nCopies(TomcatServer.PAYLOAD_MAX + (1024 * 1024), "X"))'?

pderop · 2023-09-14T18:58:04Z

Fixed checkstyle warning in 69c0b14

pderop · 2023-09-14T20:43:59Z

I have applied your feedback (thanks), can you check?

pderop · 2023-09-15T06:15:00Z

@violetagg , thanks for the review.

Fixed flaky test in HttpClientWithTomcatTest.testIssue2825. To reproduce the issue, we need Tomcat to close the connection while the client is still writing. The flakiness occurs because Tomcat closes the connection without reading all remaining data. Depending on the size of the unread data, TCP may send a TCP/RST instead of a FIN. When the client receives a TCP/RST, some or all of its unread data may be dropped. So, the socket send buffer size in HttpClient has been reduced, which eliminated the flakiness of the test and most of the TCP/RSTs. Additionally, returning the 400 bad request without chunked encoding reduces the chance of losing data, as it sends only one TCP segment (compared to two segments with chunked encoding). These workarounds seem to fix the instability of the test, and if the patch is disabled, the PrematureCloseException reliably reoccurs with the test. I also removed the retries; the tests now run in around 1.5 to 2 seconds. The test for the case where HttpClient sends the request using Flux has been removed, because it seems unstable and may be a different problem, which must be addressed in a different issue. Related to #2864 #2825

pderop added the type/bug A general bug label Jul 25, 2023

pderop added this to the 1.0.35 milestone Jul 25, 2023

pderop marked this pull request as draft July 25, 2023 19:45

pderop force-pushed the 1.0.x-gh-2825 branch 4 times, most recently from 8275e76 to 168eef3 Compare July 27, 2023 08:45

pderop marked this pull request as ready for review July 27, 2023 09:56

pderop requested a review from violetagg July 27, 2023 09:57

pderop mentioned this pull request Jul 27, 2023

Misleading reactor.netty.http.client.PrematureCloseException: Connection prematurely closed BEFORE response #2825

Closed

violetagg modified the milestones: 1.0.35, 1.0.36 Aug 9, 2023

pderop marked this pull request as draft September 1, 2023 15:53

pderop force-pushed the 1.0.x-gh-2825 branch 5 times, most recently from 490ef0b to 72e565b Compare September 4, 2023 11:19

pderop added 2 commits September 4, 2023 13:24

Fix misleading PrematureCloseException when HttpClient channel is tem…

585cb07

…porarily unwritable.

Renamed HttpClientTest.testIssue2825_H1S_H2C_H2 to testIssue2825, whi…

972b528

…ch is also checking HTTP/1.1 protocol

pderop force-pushed the 1.0.x-gh-2825 branch from 72e565b to f0c8e93 Compare September 4, 2023 11:26

pderop modified the milestones: 1.0.36, 1.0.37 Sep 4, 2023

pderop force-pushed the 1.0.x-gh-2825 branch 2 times, most recently from ca06e10 to 5957147 Compare September 9, 2023 09:25

pderop force-pushed the 1.0.x-gh-2825 branch from 51ec09a to dfad10b Compare September 9, 2023 10:08

pderop added 2 commits September 11, 2023 11:43

Polish: also test the expected 400 response body.

9501360

Polish: use PAYLOAD_MAX constant.

44fe3ad

pderop marked this pull request as ready for review September 11, 2023 15:42

violetagg requested changes Sep 14, 2023

View reviewed changes

pderop added 9 commits September 14, 2023 20:06

Do not fireChannelWritabilityChanged, this is the end of the pipeline.

457a394

Fixed typo and @since version to 1.0.37

14eee21

Fixed typo in comments.

21af63e

Reorder conditions of the test in onWritabilityChanged.

082de38

Removed useless @Nullable in testIssue2825.

711105f

Polish doOnConnected.

4787ffc

Fixed indent in StepVerifier from testIssue2825.

3f1370d

Fixed checkstyle, removed useless import of @Nullable.

69c0b14

Fixed @since version to 1.0.37

c7ac97f

violetagg approved these changes Sep 15, 2023

View reviewed changes

pderop merged commit 50b24b3 into reactor:1.0.x Sep 15, 2023

pderop deleted the 1.0.x-gh-2825 branch September 15, 2023 06:45

pderop added a commit that referenced this pull request Sep 15, 2023

Merge #2864 into 1.1.12

c7ba195

pderop added a commit that referenced this pull request Sep 15, 2023

Merge #2864 into 2.0.0-M4

9e5aad7

This was referenced Sep 15, 2023

Avoid NPE in AbstractHttpClientMetricsHandler write listener #2539

Merged

Fix cross-site scripting vulnerability in test #2902

Merged

Fix HttpClientWithTomcatTest flaky test. #2903

Merged

violetagg changed the title ~~Request for read interest when channel is unwritable while HttpClient.send(Mono) is used~~ Request for read interest when channel is unwritable while HttpClient.send(Mono) is used Sep 28, 2023

sullis mentioned this pull request Feb 18, 2024

enable Netty leak detector extension #3064

Draft
