
HttpClient.SendAsync(...) may fail to return response that's returned by server prior to completion of sending full request. #57186

Closed
LBensman opened this issue Aug 11, 2021 · 18 comments


@LBensman

Description

HttpClient.SendAsync(...), and by extension the whole family of derivative calls, incorrectly assumes that the server can only issue a response after the full request content has been sent and consumed. While in most cases servers do indeed issue a response only after having fully read and processed the request, this is not a requirement and does not hold in all cases; some servers can and do issue responses without having fully consumed the request. In those cases, HttpClient fails with quirky internal exceptions (e.g. a socket closed exception with EPIPE) and prevents the calling code from examining the received response.

Details

The current case involves uploading large (>100 MB) content to an AWS S3 bucket using a pre-signed URL via the PUT HTTP verb.

AWS S3 pre-signed URLs are generated ahead of time, with various conditions on how they can be used, from permissions to what headers must be specified when accessing them. If a subsequent request to the generated URL doesn't match the specifications stated during its creation, AWS S3 rejects the call with a proper RESTful response.

In such cases AWS S3 does not read the full request before beginning its validation logic. It doesn't need to, because the headers are sufficient to determine whether the request is valid. If the request is invalid, there is no point wasting time and bandwidth reading the content of the PUT operation, as it will be discarded anyway. When a request is rejected after examining its headers, S3 issues an HTTP response stating the error and hard-closes the connection.

When the server hard-closes the connection while HttpClient is writing the request content, HttpClient encounters an error on the underlying socket, fails to understand the nature of the error, and, more importantly, fails to attempt to read the received response pending in the network buffer to discover that there is a valid response it can process. For example:

System.Net.Http.HttpRequestException: Error while copying content to a stream.
 ---> System.IO.IOException: Unable to read data from the transport connection: Broken pipe.
 ---> System.Net.Sockets.SocketException (32): Broken pipe
   --- End of inner exception stack trace ---
   at System.Net.Security.SslStream.<WriteSingleChunk>g__CompleteAsync|210_1[TWriteAdapter](ValueTask writeTask, Byte[] bufferToReturn)
   at System.Net.Security.SslStream.WriteAsyncChunked[TWriteAdapter](TWriteAdapter writeAdapter, ReadOnlyMemory`1 buffer)
   at System.Net.Security.SslStream.WriteAsyncInternal[TWriteAdapter](TWriteAdapter writeAdapter, ReadOnlyMemory`1 buffer)
   at System.Net.Http.HttpConnection.WriteAsync(ReadOnlyMemory`1 source)
   at System.IO.Stream.CopyToAsyncInternal(Stream destination, Int32 bufferSize, CancellationToken cancellationToken)
   at System.Net.Http.HttpContent.CopyToAsyncCore(ValueTask copyTask)
   --- End of inner exception stack trace ---
   at System.Net.Http.HttpContent.CopyToAsyncCore(ValueTask copyTask)
   at System.Net.Http.HttpConnection.SendRequestContentAsync(HttpRequestMessage request, HttpContentWriteStream stream, CancellationToken cancellationToken)
   at System.Net.Http.HttpConnection.SendAsyncCore(HttpRequestMessage request, CancellationToken cancellationToken)
   at System.Net.Http.HttpConnectionPool.SendWithNtConnectionAuthAsync(HttpConnection connection, HttpRequestMessage request, Boolean doRequestAuth, CancellationToken cancellationToken)
   at System.Net.Http.HttpConnectionPool.SendWithRetryAsync(HttpRequestMessage request, Boolean doRequestAuth, CancellationToken cancellationToken)
   at System.Net.Http.RedirectHandler.SendAsync(HttpRequestMessage request, CancellationToken cancellationToken)
   at System.Net.Http.HttpClient.FinishSendAsyncBuffered(Task`1 sendTask, HttpRequestMessage request, CancellationTokenSource cts, Boolean disposeCts)
   at MobileLabs.DeviceConnect.Framework.Http.HttpClient.SendAsync(HttpRequestMessage request, CancellationToken cancellationToken)

This is incorrect behavior. When HttpClient experiences a request write error, it should not assume that all is lost; it should still attempt to read and parse the receive buffer to see if it contains a full and valid response. If it does, it should return it as if the operation completed without hiccups; if it doesn't, then it should throw an exception indicating abnormal operation during the request write.

Suggested repro steps.

Since a proper demonstration would require a web server endpoint with this specific behavior, the sample code below is limited to the client only. You can use the aforementioned AWS S3 setup and craft a pre-signed URL that would fail to accept the upload request, or any other endpoint that reads headers but not content, issues a proper and valid response rejecting the request, and closes the socket prior to full upload completion.
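
For illustration only, a rough local stand-in for such an endpoint might look like the sketch below. This is an assumption-laden mock, not AWS's actual behavior: it reads just past the request headers, replies with a canned 400, and then closes with a zero linger time so the close goes out as RST.

using System;
using System.Net;
using System.Net.Sockets;
using System.Text;

var listener = new TcpListener(IPAddress.Loopback, 5000); // port is arbitrary
listener.Start();

using TcpClient client = listener.AcceptTcpClient();
using NetworkStream stream = client.GetStream();

// Read only until the end of the request headers; ignore the (large) body.
var headerText = new StringBuilder();
var buffer = new byte[8192];
while (!headerText.ToString().Contains("\r\n\r\n"))
{
    int read = stream.Read(buffer, 0, buffer.Length);
    if (read == 0) break;
    headerText.Append(Encoding.ASCII.GetString(buffer, 0, read));
}

// Reject based on headers alone, without consuming the request body.
byte[] response = Encoding.ASCII.GetBytes(
    "HTTP/1.1 400 Bad Request\r\nContent-Length: 8\r\nConnection: close\r\n\r\nrejected");
stream.Write(response, 0, response.Length);

// Zero linger makes the close go out as RST rather than FIN ("hard close").
// Note: as discussed later in this thread, the RST may discard the response
// before the client ever sees it.
client.Client.LingerState = new LingerOption(true, 0);
client.Close();

Pointing the client snippet below at such a listener should reproduce the write-side failure once the uploaded content is large enough to outlast the early response.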

The following client code was used that resulted in the above exception:

var request = new HttpRequestMessage(HttpMethod.Put, uploadUri)
{
    Content = new StreamContent(largeVideoStreamOfHundredsMegabytes)
};

request.Content.Headers.ContentType = new System.Net.Http.Headers.MediaTypeHeaderValue("video/mp4");

var uploadTimeout = TimeSpan.FromMinutes(Timeout);
using var uploadTimeoutToken = new CancellationTokenSource(uploadTimeout);
using var response = await Framework.Http.HttpClient.Instance.SendAsync(request, uploadTimeoutToken.Token);

Configuration

.NET Core 3.x (I don't think it really matters)
macOS 10.15.6 x86
Probably not specific to any configuration or platform. It's how HttpClient is coded.

Regression?

Nah. Rather, it's an API design issue that doesn't properly foresee such a scenario.

Other information

The problem, as hinted above, is that the API assumes the request will be read fully before a response is issued. A server can issue a response without fully reading the request, or even issue and stream a response as it is reading the request (e.g. a media-conversion endpoint that converts media as it reads it and produces response output without storing an interim result, i.e. converts on-the-fly and writes the result back to the client on-the-fly).

@dotnet-issue-labeler dotnet-issue-labeler bot added area-System.Net.Http untriaged New issue has not been triaged by the area owner labels Aug 11, 2021

@wfurt
Member

wfurt commented Aug 11, 2021

I think the claim is not correct @LBensman. HttpClient IMHO does not assume the request will be fully sent before getting a response. gRPC streaming is one example where it does not. However, if an error happens it will report it back.
I don't think it is safe to make assumptions if some of the operations did not complete.
With 5.0 you can provide your own stream and eat IO exceptions if you want to.
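
Roughly along these lines (just a sketch; the type name is made up, and I have not verified that the handler will then go on to read the buffered response):

using System;
using System.IO;
using System.Net;
using System.Net.Http;
using System.Threading.Tasks;

// Hypothetical content type that swallows write failures so SendAsync can
// continue toward reading the response instead of surfacing the IOException.
public sealed class IgnoreWriteErrorsContent : HttpContent
{
    private readonly Stream _source;

    public IgnoreWriteErrorsContent(Stream source) => _source = source;

    protected override async Task SerializeToStreamAsync(Stream stream, TransportContext? context)
    {
        try
        {
            await _source.CopyToAsync(stream);
        }
        catch (IOException)
        {
            // The server may already have responded and closed the connection;
            // eat the broken pipe and let the handler attempt to read the response.
        }
    }

    protected override bool TryComputeLength(out long length)
    {
        length = 0;
        return false; // unknown length; the request body goes out chunked
    }
}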

@LBensman
Author

@wfurt, how would you then propose that my use of HttpClient in the provided sample be restructured so that I can receive the response and examine it? The response is made available by the return value of the SendAsync() call. When an exception is thrown, it bubbles up from the SendAsync() call via await, and I do not get the response from the return value. So if the claim is not correct, how do I get access to the response?

@wfurt
Member

wfurt commented Aug 11, 2021

It seems the problem is that the server closes the connection. HttpClient needs to report the failure to send the request, and there is no better way than throwing at the moment. What you may want eventually is #525.
Can you use Expect100Continue to see if the URL is valid?

I don't know if this would solve your problem, but you can try passing HttpCompletionOption.ResponseHeadersRead to SendAsync(). That would finish as soon as you get the response headers back. You can grab the response stream and read from it as much as you can.
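
Roughly like this (a sketch only; 'client' is assumed to be a plain System.Net.Http.HttpClient, and 'request'/'uploadTimeoutToken' come from the snippet in the issue):

// Completes as soon as the status line and headers are in; the body can then
// be streamed for as long as the server keeps the connection open.
using HttpResponseMessage response = await client.SendAsync(
    request, HttpCompletionOption.ResponseHeadersRead, uploadTimeoutToken.Token);

using Stream body = await response.Content.ReadAsStreamAsync();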

@LBensman
Author

Right, the server closes the connection because it determined that the request should be rejected before even reading the full, lengthy request. This part is correct.

But I disagree that HttpClient "needs to report the failure to send request" in this particular scenario. It attempts to send the full request, but if the server sends a valid HTTP response prior to closing the connection, then HttpClient should recognize it as a normal request-response transaction and return the response without throwing an exception (it could possibly annotate, via some field, that exceptions were observed in the interim, as additional context), since it got the response from the server. In other words, it's normal, albeit rare-ish, for servers to return a response without fully reading the request, and HttpClient should handle it as a normal transaction, not a failure. But if the send operation resulted in an error, the socket read buffer is empty, and no response was received, then it's a failure and the exception should bubble up to the caller.

I'm not sure if #525 is the right thing. I can sort of see how it would enable handling such scenarios, although it would require much more complicated code than the example there: it would require issuing an asynchronous read on the response while going through the details of sending the request, and then, should the response come back prior to the request being fully sent (or a send attempt resulting in an exception), handling it asynchronously and coordinating with the possibly still-ongoing send. And that would still depend on the actual implementation of #525, as it too could close everything and fail to read the buffered pending response if it encounters an exception on send. That is, the implementation there could also fail to read pending data in the receive buffers when a write exception occurs, and fail to deliver a response when one is available. I see #525 as API granularity for parsing the request/response messages, not a fix for the incorrect handling of "state," which is what the culprit is here.

I looked at HttpCompletionOption and it won't help: that only comes into play when the response is being processed, and it controls at which point the send call completes (returning normally with the response object). It won't prevent the send call from throwing an exception on write when the read buffer has pending data.

I'm not sure about ExpectContinue, as I don't see much on MSDN explaining it (I understand the HTTP 100 status code; I don't understand how HttpClient will behave with it without some documentation or explanation). I also don't know if AWS S3 supports it or works as we may be expecting here. Can you explain how HttpClient modifies its operation if ExpectContinue is set?

@ManickaP
Member

Since .NET 5.0 we have status code on HttpRequestException: https://docs.microsoft.com/en-us/dotnet/api/system.net.http.httprequestexception.statuscode?view=net-5.0#System_Net_Http_HttpRequestException_StatusCode
Would that help?
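
For reference, the usual shape is something like this (a sketch; 'client', 'request', and 'uploadTimeoutToken' are assumed to be the ones from the issue snippet):

try
{
    using var response = await client.SendAsync(request, uploadTimeoutToken.Token);
    response.EnsureSuccessStatusCode();
}
catch (HttpRequestException ex) when (ex.StatusCode != null)
{
    // Populated on .NET 5+ when a non-success response was actually received.
    Console.WriteLine($"Server rejected the request: {ex.StatusCode}");
}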

Anyway, have you looked at other HTTP clients (Java, Go, C++, ...)? How do they behave in this situation? We would need a very strong case to consider such a change.

@ManickaP ManickaP modified the milestone: Future Aug 11, 2021
@wfurt
Member

wfurt commented Aug 11, 2021

Is the server returning 200 + JSON with an error, or does it send an HTTP error code, @LBensman?
I'm wondering if you can craft a simple runnable repro...

Back to the 100:
This was designed for similar situations. The client sends the request headers and waits for the server to respond, either with 100 Continue to indicate that the client should send the data, or with an error code indicating that the client should not send the data.

Depending on the exact conditions this may work for you, assuming the server supports 100-Continue semantics and returns an HTTP error code.

I think it is unlikely we will change the current error handling model. If you really want to ignore request errors you will probably need to use some custom streams.

@wfurt wfurt removed the untriaged New issue has not been triaged by the area owner label Aug 11, 2021
@wfurt wfurt added this to the Future milestone Aug 11, 2021
@geoffkizer
Contributor

As @wfurt points out, this is exactly what Expect: 100-continue is intended for.

If you send Expect: 100-continue on the request, then we will wait (up to the expect timeout) to receive either a 100 Continue or a final status code from the server. If we get 100, we will send the request body. If not, we will abort the request and give you back the response we got.
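
In code, opting in looks roughly like this (a sketch; the two-second timeout is just an example value, and uploadUri, largeVideoStreamOfHundredsMegabytes, and uploadTimeoutToken are the ones from the issue snippet):

var handler = new SocketsHttpHandler
{
    // How long to wait for the interim 100 Continue before sending the body anyway.
    Expect100ContinueTimeout = TimeSpan.FromSeconds(2)
};
var client = new HttpClient(handler);

var request = new HttpRequestMessage(HttpMethod.Put, uploadUri)
{
    Content = new StreamContent(largeVideoStreamOfHundredsMegabytes)
};
request.Headers.ExpectContinue = true; // adds "Expect: 100-continue"

using var response = await client.SendAsync(request, uploadTimeoutToken.Token);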

@geoffkizer
Contributor

BTW, re this:

When server hard-closes the connection while HttpClient is writing request content, it encounters an error on underlying socket, fails to understand the nature of the error, and more importantly, it fails to attempt to read received response that is pending in network buffer to understand that there's a valid response that it can process.

I'm unclear what you mean by "hard-closes the connection". If the connection is reset, then all data associated with it will be blown away and there's no response for us to read. If you mean something other than connection reset, then what?

@LBensman
Author

Since .NET 5.0 we have status code on HttpRequestException: https://docs.microsoft.com/en-us/dotnet/api/system.net.http.httprequestexception.statuscode?view=net-5.0#System_Net_Http_HttpRequestException_StatusCode
Would that help?

@ManickaP, where and how does that property get set, and what source does it use for the status code? The doc link is rather skimpy on details, and I'm not clear on its usage and what it represents.

Either way, while it may shed some hint of what the server may have replied, often the descriptive information is in the body of the response (as it is in the AWS S3 case).

Anyway, have you looked at other HTTP clients (Java, Go, C++, ...)? How do they behave in this situation? We would need a very strong case to consider such change.

I have not looked at other clients, to be honest, and I'm not sure how they would behave, including .NET's WebRequest. But the behavior of other clients also seems a bit irrelevant and a bit of a red herring. While it may show that HttpClient is no less deficient than other clients, it is still, nonetheless, deficient and should not be held to the same limitations as other clients.

The strong case here is that there is data loss when the client fails to capture the server's response and deliver it to the calling code. The server sent it. The client, therefore, should receive it, no?

@LBensman
Author

Is the server returning 200 + JSON with an error, or does it send an HTTP error code, @LBensman?

Got busy at work and hadn't had a chance to follow up. IIRC, the server returned a 4xx-range code (I believe it was 400 Bad Request, but I'm not certain; my memory may be playing tricks on me), plus JSON with a description and explanation of the rejection.

I'm wondering if you can craft a simple runnable repro...

I'll try to extract the code into a minimal repro. However, since this has been observed with AWS S3, to run it one would have to have an AWS S3 account, possibly with a bucket configured a certain way. I'll try, but please allow me some time to extract, test, and post an example here.

Back to the 100:
This was designed for similar situations. The client sends the request headers and waits for the server to respond, either with 100 Continue to indicate that the client should send the data, or with an error code indicating that the client should not send the data.

Depending on the exact conditions this may work for you, assuming the server supports 100-Continue semantics and returns an HTTP error code.

As I work on the minimal repro code above, I'll experiment with the Expect: 100 facility. I understand it, and it makes sense from the perspective of a polite client and a willing server (as you said, "may work"). And while it may be a better experience and should be done that way, I would still think this demands proper behavior in case the server is not willing to cooperate?!

I think it is unlikely we will change the current error handling model. If you really want to ignore request errors you will probably need to use some custom streams.

I should probably also note, if I haven't already, that this behavior is a bit racy and depends on the size of the request payload: if an invalid request manages to be fully written to the server before the server rejects it and drops the connection, HttpClient will read the response body and return the full response, because by that time it is expecting a response and reading the socket, having finished the write operations. Should the request body be large enough that HttpClient doesn't finish writing the request by the time the web server rejects it, and HttpClient receives an exception from writing to the socket, then the response won't be read. This is roughly non-deterministic behavior with a race condition, which leaves developers using HttpClient with behavior that is famously hard to troubleshoot (race conditions are often a pain to diagnose).

I haven't looked deeply into HttpClient's code (though I looked a bit), but I wonder if all the error handling needs to do in the exceptional case is try to read the data that may have accumulated in the socket's receive buffer and attempt to process it. If it contains a valid response, then either mute the error and return the response, or throw the exception but attach the response as a property on the exception? (I realise I'm being a bit simplistic, but as a skeleton idea.)

@LBensman
Author

LBensman commented Sep 11, 2021

BTW, re this:

When server hard-closes the connection while HttpClient is writing request content, it encounters an error on underlying socket, fails to understand the nature of the error, and more importantly, it fails to attempt to read received response that is pending in network buffer to understand that there's a valid response that it can process.

I'm unclear what you mean by "hard-closes the connection". If the connection is reset, then all data associated with it will be blown away and there's no response for us to read. If you mean something other than connection reset, then what?

Hmm, I typed up a response on this part, and then started searching for some backing spec, and couldn't find it 🤣. So, humbly, perhaps I am wrong and not recalling the behavior correctly, though I can't find a spec that would state it definitively one way or the other. By "hard-closes the connection" I meant that an RST packet is involved, instead of a nice FIN handshake dance. My understanding was that on the receiving side, any data that was already received and sitting in the receive buffer awaiting delivery to the application via an upcoming recv() call was still pending and ready to be delivered even after an RST packet was received. Only once the pending read buffer is depleted would a call to recv() result in an error (EPIPE or ECONNRESET, depending on whether in-flight data was lost or not).

But perhaps RST does imply that even pending data in receive buffer is skipped?

P.S. Digging a bit more into RFC 793, page 69, it does seem to say that upon arrival of a segment with RST, for a connection in the ESTABLISHED state:

If the RST bit is set then, any outstanding RECEIVEs and SEND should receive "reset" responses. All segment queues should be flushed. Users should also receive an unsolicited general "connection reset" signal.

.. which is perhaps saying that the receive and send queues are flushed, and even though pending data could be read, it won't be, because RST just completely kills off everything out-of-band? If that's indeed the case, then I'd relent and agree that my report is indeed incorrect. But then it seems that AWS is sending a reply with near certainty that it won't be delivered to the client application, because the request is still incoming and, as soon as AWS closes the connection, the next incoming packet will cause it to issue an RST, which will almost certainly wipe out its message before it gets to the client app?

@ManickaP
Member

ManickaP commented Sep 13, 2021

@ManickaP, where and how does that property get set, and what source for status code it uses? The doc link is rather skimpy on details, and I'm not clear on usage and what it represents.

Unfortunately, it won't help you here. This is the place where it gets set: https://source.dot.net/#System.Net.Http/System/Net/Http/HttpResponseMessage.cs,172. Meaning only if everything proceeded without an issue and you got a non-success status code. Since the server is closing the TCP connection on you, the status code won't be set.

I dug a bit into the AWS docs (please correct me if I'm looking at the wrong docs), but it seems that they do support 100-continue: https://docs.aws.amazon.com/AmazonS3/latest/API/RESTObjectPOST.html:

To configure your application to send the Request Headers before sending the request body, use the 100-continue HTTP status code. For POST operations, this helps you avoid sending the message body if the message is rejected based on the headers (for example, authentication failure or redirect). For more information on the 100-continue HTTP status code, go to Section 8.2.3 of http://www.ietf.org/rfc/rfc2616.txt.

It seems to me like it's exactly made for your case.

@geoffkizer
Contributor

But perhaps RST does imply that even pending data in receive buffer is skipped?

It does. Try it yourself:

using Socket listen = new Socket(AddressFamily.InterNetwork, SocketType.Stream, ProtocolType.Tcp);
listen.Bind(new IPEndPoint(IPAddress.Loopback, 0));
listen.Listen();

using Socket client = new Socket(AddressFamily.InterNetwork, SocketType.Stream, ProtocolType.Tcp);
client.Connect(listen.LocalEndPoint!);
using Socket server = listen.Accept();

server.Send(new byte[] { 1, 2, 3 });

byte[] clientBuffer = new byte[1];
client.Receive(clientBuffer);
Console.WriteLine($"Received: {clientBuffer[0]}");

// This will send RST
server.Close(0);

client.Receive(clientBuffer);
Console.WriteLine($"Received: {clientBuffer[0]}");

This will throw on the second Receive call.

@geoffkizer
Contributor

This may also be helpful: https://datatracker.ietf.org/doc/html/rfc7230#section-6.6

@CodingMarkus

CodingMarkus commented Sep 16, 2021

@geoffkizer I think you are missing a very important point here: When you make a low level send() call and this call returns successfully, this does not mean that any data has been sent yet. It only means that the data has been accepted, placed into the send buffer of the socket and the kernel will send out the data on the next occasion possible and as fast as the network allows it. If you make a socket send RST to the other side, then the local socket must drop all data still in the send buffer, as that is mandated by the RFC. However the RFC nowhere says that data at the receive buffer on the other side has to be dropped. This also would make little sense, as this data has already been confirmed as being correctly received; but being correctly received doesn't mean the application had any chance to see that data yet, it only means that data was placed in the receive buffer. A sender does not expect that data which has already been acknowledged by the other side as being correctly received to suddenly become "not correctly received" when it issues a TCP RST.

Let me demonstrate that with a very simple sample program. Unfortunately I don't know C# or .NET, so please forgive me for just using plain old C code instead. I will include the full source code below and link to an online resource, where you can directly test that code in your browser.

Let's just quickly explain what the program does:
The program creates a local TCP server socket, connects a local client socket to it, writes 200 integer numbers from client to server, closes the client socket, and finally tries to read as much data as possible from the server socket, so we can see what happens.

There are three interesting pre-processor defines on top:

HAVE_SIN_LEN - This is just to make the code compatible with (Free-)BSD and macOS/iOS. On those systems the define must be set to 1, whereas on Linux it must be set to 0. On Windows... I have no idea but you probably know; also I'm not sure how much of that code is going to compile on Windows in the first place.

CLOSE_WITH_RST - If set to 1, the client will ensure the socket is closed by a RST. If set to 0, the client will close the socket gracefully instead, so it will be closed by a FIN and then linger. It would only close by RST after the linger time has passed but the program won't run long enough for that to happen.

SLEEP_A_BIT - If set to 1, the client will wait an entire second before closing the socket after sending the data. If set to 0, it will close the socket as fast as possible. This holds true no matter if the socket is closed by RST or by FIN. This just adds some delay to give the kernel enough time to perform the actual send operations required to transfer all the data in the send buffer and yes, this does make a difference!

Now here's the output if I disable CLOSE_WITH_RST, no matter if sleeping or not:

Server socket waiting on port 54161.
Client socket connected.
Server accepted client connection.
Wrote 200 integers from 0 to 199 to server.
Client socket closed with TCP RST
Server is reading data now:

0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, 100, 101, 102, 103, 104, 105, 106, 107, 108, 109, 110, 111, 112, 113, 114, 115, 116, 117, 118, 119, 120, 121, 122, 123, 124, 125, 126, 127, 128, 129, 130, 131, 132, 133, 134, 135, 136, 137, 138, 139, 140, 141, 142, 143, 144, 145, 146, 147, 148, 149, 150, 151, 152, 153, 154, 155, 156, 157, 158, 159, 160, 161, 162, 163, 164, 165, 166, 167, 168, 169, 170, 171, 172, 173, 174, 175, 176, 177, 178, 179, 180, 181, 182, 183, 184, 185, 186, 187, 188, 189, 190, 191, 192, 193, 194, 195, 196, 197, 198, 199, 

Server has reached the end of the stream.

As you can see, I get all 200 numbers and the stream closes normally. But what happens if I enable CLOSE_WITH_RST and keep SLEEP_A_BIT disabled? Well, the result will vary.

It can be like this (up to 159):

0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, 100, 101, 102, 103, 104, 105, 106, 107, 108, 109, 110, 111, 112, 113, 114, 115, 116, 117, 118, 119, 120, 121, 122, 123, 124, 125, 126, 127, 128, 129, 130, 131, 132, 133, 134, 135, 136, 137, 138, 139, 140, 141, 142, 143, 144, 145, 146, 147, 148, 149, 150, 151, 152, 153, 154, 155, 156, 157, 158, 159, Server: recv failed: Connection reset by peer

Or it can be like this (up to 47):

0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, Server: recv failed: Connection reset by peer

But what happens if I enable SLEEP_A_BIT? Well, then the result will always go up to 199 before I run into an error:

0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, 100, 101, 102, 103, 104, 105, 106, 107, 108, 109, 110, 111, 112, 113, 114, 115, 116, 117, 118, 119, 120, 121, 122, 123, 124, 125, 126, 127, 128, 129, 130, 131, 132, 133, 134, 135, 136, 137, 138, 139, 140, 141, 142, 143, 144, 145, 146, 147, 148, 149, 150, 151, 152, 153, 154, 155, 156, 157, 158, 159, 160, 161, 162, 163, 164, 165, 166, 167, 168, 169, 170, 171, 172, 173, 174, 175, 176, 177, 178, 179, 180, 181, 182, 183, 184, 185, 186, 187, 188, 189, 190, 191, 192, 193, 194, 195, 196, 197, 198, 199, Server: recv failed: Connection reset by peer

And that's because of what I explained at the very top: The RFC demands that after sending a RST, no further data must be sent to the peer, hence all unsent data in the local send buffer will be lost. But no acknowledged data in the receive buffer of the peer will be lost.

Try running that code online: https://onlinegdb.com/8ToVOsWSP
It always fails at 16 for me because there wasn't enough time to transfer more than the first 16 integers before the socket is closed forcefully.

Now let's enable sleeping: https://onlinegdb.com/aGMZcpt0A
See, it runs through all the way to 199. There's still an error in the end as receiving a RST is an erroneous situation the other side should be aware of, but already sent data won't be lost because of a RST.

You may dispute that setting the linger time to zero will force a RST, but when running the code locally it's easy to prove that this is the case: just keep sudo tcpdump -i lo0 tcp running in a terminal window.

With CLOSE_WITH_RST enabled:

20:27:08.568103 IP localhost.55963 > localhost.55962: Flags [R.], seq 871, ack 1, win 6379, length 0
20:27:08.568121 IP localhost.55962 > localhost.55963: Flags [.], ack 871, win 6366, options [nop,nop,TS val 2538893231 ecr 444660579], length 0
20:27:08.568131 IP localhost.55963 > localhost.55962: Flags [R], seq 3120040667, win 0, length 0

With CLOSE_WITH_RST disabled:

20:28:08.125581 IP localhost.55989 > localhost.55988: Flags [FP.], seq 235:891, ack 1, win 6379, options [nop,nop,TS val 3863493664 ecr 374418739], length 656
20:28:08.125614 IP localhost.55988 > localhost.55989: Flags [.], ack 892, win 6365, options [nop,nop,TS val 374418740 ecr 3863493663], length 0
20:28:08.125965 IP localhost.55988 > localhost.55989: Flags [F.], seq 1, ack 892, win 6365, options [nop,nop,TS val 374418740 ecr 3863493663], length 0
20:28:08.126032 IP localhost.55989 > localhost.55988: Flags [.], ack 2, win 6379, options [nop,nop,TS val 3863493664 ecr 374418740], length 0

R is the RST flag and F is the FIN flag.

Finally, let me append the entire source code below, just in case, as I don't know how long the online service referenced above will keep that code available or how long that service is going to exist. So in case it goes down, here's the code once again:

#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#include <unistd.h>
#include <arpa/inet.h>
#include <sys/socket.h>

#define HAVE_SIN_LEN   0
#define CLOSE_WITH_RST 0
#define SLEEP_A_BIT    0

// ============================================================================
// SERVER

static short ServerPort = 0;
static int ServerSocket = -1;


static
void initServer ( void )
{
	ServerSocket = socket(PF_INET, SOCK_STREAM, 0);
	if (ServerSocket < 0) {
		fprintf(stderr, "SERVER: socket failed: %s\n", strerror(errno));
		exit(1);
	}

	struct sockaddr_in addr = {
#if HAVE_SIN_LEN
		.sin_len = sizeof(addr),
#endif
		.sin_family = AF_INET,
		.sin_addr = htonl(INADDR_LOOPBACK)
	};
	socklen_t addrLen = sizeof(addr);

	int bindErr = bind(
		ServerSocket, (struct sockaddr *)&addr, addrLen
	);
	if (bindErr) {
		fprintf(stderr, "SERVER: bind failed: %s\n", strerror(errno));
		exit(1);
	}

	int getNameErr = getsockname(
		ServerSocket, (struct sockaddr *)&addr, &addrLen
	);
	if (getNameErr) {
		fprintf(stderr, "SERVER: getsockname failed: %s\n", strerror(errno));
		exit(1);
	}

	int listenErr = listen(ServerSocket, 1);
	if (listenErr) {
		fprintf(stderr, "SERVER: listen failed: %s\n", strerror(errno));
		exit(1);
	}

	ServerPort = ntohs(addr.sin_port);
	printf("Server socket waiting on port %hu.\n", ServerPort);
}



static int ConnectedServerSocket = -1;


static
void acceptClient ( void )
{
	ConnectedServerSocket = accept(ServerSocket, NULL, NULL);
	if (ConnectedServerSocket < 0) {
		fprintf(stderr, "SERVER: accept failed: %s\n", strerror(errno));
		exit(1);
	}
	printf("Server accepted client connection.\n");
}


static
void tryServerRead ( void )
{
	printf("Server is reading data now:\n\n");

	for (;;) {
		char buffer[64];

		ssize_t bytesIn = recv(
			ConnectedServerSocket, &buffer, sizeof(buffer), 0
		);
		if (bytesIn < 0) {
			fprintf(stderr, "Server: recv failed: %s\n", strerror(errno));
			exit(1);
		}
		if (bytesIn == 0) {
			// We reached the end of stream
			printf("\n\n");
			break;
		}
		printf("%.*s", (int)bytesIn, buffer);
		fflush(stdout);
	}

	printf("Server has reached the end of the stream.\n");
}


// ============================================================================
// CLIENT

static int ClientSocket = -1;

static
void initClient ( void )
{
	ClientSocket = socket(PF_INET, SOCK_STREAM, 0);
	if (ClientSocket < 0) {
		fprintf(stderr, "CLIENT: socket failed: %s\n", strerror(errno));
		exit(1);
	}

	struct sockaddr_in addr = {
#if HAVE_SIN_LEN
		.sin_len = sizeof(addr),
#endif
		.sin_family = AF_INET,
		.sin_addr = htonl(INADDR_LOOPBACK),
		.sin_port = htons(ServerPort)
	};
	socklen_t addrLen = sizeof(addr);

	int connectErr = connect(
		ClientSocket, (struct sockaddr *)&addr, addrLen
	);
	if (connectErr) {
		fprintf(stderr, "CLIENT: connect failed: %s\n", strerror(errno));
		exit(1);
	}

	printf("Client socket connected.\n");
}


static
void spamServer ( )
{
	char buffer[6];
	const int max = 199;
	size_t totalCount = 0;
	for (int i = 0; i <= max; i++) {
		int printCount = snprintf(buffer, sizeof(buffer), "%d, ", i);
		assert(printCount < sizeof(buffer));
		int sendErr = send(ClientSocket, buffer, printCount, 0);
		if (sendErr < 0) {
			fprintf(stderr, "CLIENT: send failed: %s\n", strerror(errno));
			exit(1);
		}
		totalCount += printCount;
	}
	printf("Wrote %d integers from 0 to %d to server.\n", max + 1, max);
}


static
void killClient ( void )
{
#if CLOSE_WITH_RST
	// SO_LINGER
	// Disabling linger will only prevent close() from blocking but the system
	// will still linger in the background. Setting linger time to zero however
	// means the linger timeout is hit immediately when close() is called and
	// as a result, the socket is not closed with FIN but with RST.
	struct linger l = {
		.l_onoff = 1,
		.l_linger = 0
	};
	int optErr = setsockopt(
		ClientSocket, SOL_SOCKET, SO_LINGER, &l, sizeof(l)
	);
	if (optErr) {
		fprintf(stderr, "CLIENT: setsockopt failed: %s\n", strerror(errno));
		exit(1);
	}
#endif

#if SLEEP_A_BIT
	sleep(1);
#endif

	int closeErr = close(ClientSocket);
	if (closeErr) {
		fprintf(stderr, "CLIENT: close failed: %s\n", strerror(errno));
		exit(1);
	}

	ClientSocket = -1;
	printf("Client socket closed with TCP RST\n");
}


// ============================================================================
// MAIN

int main ( int argc, const char * const * args )
{
	initServer();
	initClient();
	acceptClient();
	spamServer();
	killClient();
	tryServerRead();
}

@geoffkizer
Contributor

geoffkizer commented Sep 17, 2021

That's interesting. This seems to be a platform difference between Linux and Windows. I ran my sample code from above on Linux, and it does indeed receive all sent data before throwing the connection reset exception.

That said, my general point here stands: Once you send a RST you cannot rely on the peer receiving any data you sent before the RST. That's certainly true on Windows, as shown above, and may be true on other platforms as well. It may even be true on Linux in certain situations.

When you make a low level send() call and this call returns successfully, this does not mean that any data has been sent yet. It only means that the data has been accepted, placed into the send buffer of the socket and the kernel will send out the data on the next occasion possible and as fast as the network allows it. If you make a socket send RST to the other side, then the local socket must drop all data still in the send buffer, as that is mandated by the RFC.

Yes, but that's another good reason to avoid sending a RST in this case. The sending application does not know whether the data that it has sent (i.e. placed into the send buffer) has actually been sent and acked by the peer or not. So you cannot safely send a RST, because there could always be unacked data in your own send buffer that would be blown away.

In fact your SLEEP_A_BIT code above demonstrates this: it's only safe to send the RST after a nontrivial delay, because otherwise you might blow away your own send buffer. In your example, the 1 second delay is enough; but in general, over non-loopback, you can't really know when it's okay to actually send the RST.

However the RFC nowhere says that data at the receive buffer on the other side has to be dropped. This also would make little sense, as this data has already been confirmed as being correctly received; but being correctly received doesn't mean the application had any chance to see that data yet, it only means that data was placed in the receive buffer. A sender does not expect that data which has already been acknowledged by the other side as being correctly received to suddenly become "not correctly received" when it issues a TCP RST.

There are good reasons to drop the receive buffer on receiving RST: it allows you to notify the application promptly that the connection has been reset. For example, imagine the server starts sending a large response, and then in the middle of sending the response, a fatal error occurs and the connection is reset. The client is processing data as it comes in, but it may have a large backlog of data in its receive buffer that hasn't been delivered to the application yet. If it continues to deliver this data to the application even after receiving the RST, then the application needs to process all this data before it will be notified of the RST, which is probably just wasted time and CPU. Delivering the RST immediately avoids this problem.

(And because of that, I'm a bit surprised that Linux behaves the way it does...)

@LBensman
Author

LBensman commented Jan 7, 2022

Yes, I agree that sending RST should be avoided, but that's advice for the AWS team, as they're the ones hard-closing the connection, albeit for a good reason: looking at it from their side, they don't want to receive a potentially very large request when they know full well they'll be rejecting it anyway, so they roughly have no choice but to send RST. I think if they really wanted to solve this problem, their solution would probably be to require that requests be configured to expect 100 (as part of using their REST API), forcing clients to abide by that behavior, but I guess they just leave it to the clients to figure it out 😸.

Thanks to everyone who weighed in and wrote excellent demonstration code -- this is much appreciated, and I learned something new (or more like re-learned something I forgot since college days 😹 ). Closing this report as it is indeed incorrect.

@LBensman LBensman closed this as completed Jan 7, 2022
@karelz karelz modified the milestones: Future, 7.0.0 Jan 11, 2022
@ghost ghost locked as resolved and limited conversation to collaborators Feb 10, 2022