Fix Http 1.1's read-ahead task management #103257

MihaZupan · 2024-06-10T23:34:04Z

Every HTTP/1.1 connection has a ValueTask<int> _readAheadTask field that gets set either:

from PrepareForReuse right before the call to SendAsync, or
by the connection pool every N seconds as part of its scavenging logic that looks for dead connections to clean up (those that have received EOF)

As a result of the latter, if a connection is added to the connection pool at any point (the shared _http11Connections stack), we must assume that the read-ahead may have been set.

When "consuming" connections, we carefully manage who may consume the read-ahead task to ensure that we don't leak unobserved task exceptions.
To achieve this, we're using an int _readAheadTaskStatus flag to signal which thread will observe the result / whether the exception should be suppressed.

The problem with the current state (before this PR) is that a connection may be briefly added to the connection pool and taken out right after to handle a pending request from the request queue.
Because we know that the connection was freshly added and therefore still good, we don't bother calling into PrepareForReuse, which means we weren't updating the _readAheadTaskStatus as expected. This results in hitting a debug assert in HttpConnection.SendAsync that checks that someone claimed ownership of the read-ahead completion before sending the request (#100616).

Why don't we always call PrepareForReuse then?

PrepareForReuse expects its caller to immediately call into SendAsync if the check is successful. This allows it to store the stream.ReadAsync into _readAheadTask directly, without any extra wrapping logic.

runtime/src/libraries/System.Net.Http/src/System/Net/Http/SocketsHttpHandler/HttpConnection.cs

Line 166 in 0686ce6

_readAheadTask = _stream.ReadAsync(_readBuffer.AvailableMemory);

But at this point in the connection management, we're not 100% certain that we will have a request for the connection yet - we have waiters that may get canceled at any point.
If we did call PrepareForReuse and then had no waiter to signal, we'd have to throw away the connection.

PrepareForReuse also uses different logic based on a bool async argument which we don't yet have at this point.

Why don't we give the connection to the waiter and have the consuming code call PrepareForReuse?

While not super likely, if the connection was already bad, it would mean that we now took a request from the head of the queue and pushed it back to the tail to retry.
It would also potentially mean needing an extra flag to remember if this is the first request on the connection to match how we currently handle new connections that are immediately bad. Without that, we could end up doing more connect retries to a faulty server than we currently do (maybe that'd be fine, but it's not an obvious consequence).

It should be a possible approach, but I found the following simpler to follow:

This PR:

I changed how we handle read-ahead completion ownership, allowing a thread to return its completion ownership:

When we take a connection from the pool, we check if a scavenging read-ahead was started, and if so try to take ownership of its completion.
If we then fail to signal a request waiter, we try to hand back the completion ownership.

If TryOwnScavengingTaskCompletion fails, the read-ahead already completed and any potential exceptions got ignored => the connection is not usable and we try again.
If TryReturnScavengingTaskCompletionOwnership fails, the read-ahead completed between us taking ownership and trying to return it => the connection is not usable and we try again.

dotnet-policy-service · 2024-06-10T23:34:25Z

Tagging subscribers to this area: @dotnet/ncl
See info in area-owners.md if you want to be subscribed.

danmoseley · 2024-06-11T00:01:43Z

Not my area but I want to recognize that the change description here is really great 😀.

MihaZupan · 2024-06-11T02:25:08Z

/azp run runtime-libraries-coreclr outerloop

azure-pipelines · 2024-06-11T02:25:21Z

Azure Pipelines successfully started running 1 pipeline(s).

wfurt

LGTM.
Is there chance this can also be related to #31254?
It seems like there may be mix of problems but some looked similar enough to me.

wfurt · 2024-06-20T17:41:56Z

src/libraries/System.Net.Http/src/System/Net/Http/SocketsHttpHandler/HttpConnection.cs

@@ -133,7 +137,9 @@ public bool PrepareForReuse(bool async)
            // If the read-ahead task is completed, then we've received either EOF or erroneous data the connection, so it's not usable.
            if (ReadAheadTaskHasStarted)


could still this be race condition? e.g. It did not start it but it will before we get to the next statement.

By the time you get to PrepareForReuse, the caller is the only thing with a reference to this connection.

It's possible that the read was already started and is transitioning to Completed right under you, but it can't transition from NotStarted => something else, or from something to => NotStarted.

src/libraries/System.Net.Http/src/System/Net/Http/SocketsHttpHandler/HttpConnection.cs

MihaZupan · 2024-06-21T17:48:29Z

Is there chance this can also be related to #31254?

I don't think so. #31254 is happening because our HTTP/1.1 logic isn't flowing the cancellation token through to the underlying stream, instead using dispose to force pending operations to abort.

* Fix Http 1.1's read-ahead task management * Fix comment typo

MihaZupan added the area-System.Net.Http label Jun 10, 2024

MihaZupan added this to the 9.0.0 milestone Jun 10, 2024

MihaZupan requested a review from a team June 10, 2024 23:34

MihaZupan self-assigned this Jun 10, 2024

Fix Http 1.1's read-ahead task management

8c21819

MihuBot force-pushed the http-prepareForReuse branch from bf4ef8f to 8c21819 Compare June 15, 2024 16:29

This was referenced Jun 15, 2024

System.IO.Net5Compat.Tests and System.IO.Tests suddenly exiting with error 137 #100558

Open

SIGKILL (OOM?) while running LibraryImportGenerator.Tests w/o actionable log messages or artifacts dotnet/dnceng#2496

Open

wfurt approved these changes Jun 20, 2024

View reviewed changes

MihaZupan commented Jun 21, 2024

View reviewed changes

src/libraries/System.Net.Http/src/System/Net/Http/SocketsHttpHandler/HttpConnection.cs Outdated Show resolved Hide resolved

Fix comment typo

bc5b471

build-analysis bot mentioned this pull request Jun 21, 2024

Failed USB connection via port 54050, error 61, in tvOS arm64 Release AllSubsets_Mono #82637

Open

MihaZupan merged commit 3526759 into dotnet:main Jun 22, 2024
80 of 83 checks passed

rzikm pushed a commit to rzikm/dotnet-runtime that referenced this pull request Jun 24, 2024

Fix Http 1.1's read-ahead task management (dotnet#103257)

7781475

* Fix Http 1.1's read-ahead task management * Fix comment typo

wfurt mentioned this pull request Jul 2, 2024

HttpClient (using SocketsHttpHandler) may cause unobserved IOException that cannot be caught #104324

Closed

MihaZupan mentioned this pull request Jul 5, 2024

Update readAheadTask asserts in HttpConnection #104496

Merged

github-actions bot locked and limited conversation to collaborators Jul 23, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Fix Http 1.1's read-ahead task management #103257

Fix Http 1.1's read-ahead task management #103257

MihaZupan commented Jun 10, 2024

dotnet-policy-service bot commented Jun 10, 2024

danmoseley commented Jun 11, 2024

MihaZupan commented Jun 11, 2024

azure-pipelines bot commented Jun 11, 2024

wfurt left a comment

wfurt Jun 20, 2024

MihaZupan Jun 21, 2024

MihaZupan commented Jun 21, 2024

		@@ -133,7 +137,9 @@ public bool PrepareForReuse(bool async)
		// If the read-ahead task is completed, then we've received either EOF or erroneous data the connection, so it's not usable.
		if (ReadAheadTaskHasStarted)

Fix Http 1.1's read-ahead task management #103257

Fix Http 1.1's read-ahead task management #103257

Conversation

MihaZupan commented Jun 10, 2024

dotnet-policy-service bot commented Jun 10, 2024

danmoseley commented Jun 11, 2024

MihaZupan commented Jun 11, 2024

azure-pipelines bot commented Jun 11, 2024

wfurt left a comment

Choose a reason for hiding this comment

wfurt Jun 20, 2024

Choose a reason for hiding this comment

MihaZupan Jun 21, 2024

Choose a reason for hiding this comment

MihaZupan commented Jun 21, 2024