Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

net/http: "Test appears to have leaked a Transport" on darwin-amd64-longtest #59526

Closed
prattmic opened this issue Apr 10, 2023 · 11 comments
Closed
Assignees
Labels
FrozenDueToAge NeedsFix The path to resolution is known, but the work has not been done. OS-Darwin
Milestone

Comments

@prattmic
Copy link
Member

The new darwin-amd64-longtest builders (#35678) are failing continuously with "Test appears to have leaked a Transport" errors, either in TestTransportPrefersResponseOverWriteError/h1 (5/6 failures) or TestMaxBytesHandler/max_size_100_request_size_1000000/h1 (1/6 failures):

2023-04-10T18:52:58-65ea4c5/darwin-amd64-longtest
2023-04-10T17:30:27-5e93a2c/darwin-amd64-longtest
2023-04-10T17:13:41-63a08e6/darwin-amd64-longtest
2023-04-10T17:07:24-a3f3868/darwin-amd64-longtest
2023-04-10T17:06:55-60140a8/darwin-amd64-longtest
2023-04-10T15:57:38-9efca84/darwin-amd64-longtest

The only other failures I see like this in fetchlogs are from 2020, so too old to be obviously related.

cc @neild @bcmills

@prattmic prattmic added the NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. label Apr 10, 2023
@prattmic
Copy link
Member Author

For reference, the bug for the 2020 failures was #42943

@bcmills bcmills changed the title net: "Test appears to have leaked a Transport" on darwin-amd64-longtest net/http: "Test appears to have leaked a Transport" on darwin-amd64-longtest Apr 10, 2023
@bcmills bcmills added this to the Backlog milestone Apr 10, 2023
@prattmic
Copy link
Member Author

This bug easily reproduces on a darwin-amd64-13 gomote, simply by running go test net/http. It also reproduces with filtering (-run TestTransportPrefersResponseOverWriteError/h1).

The full set of failures I get in net/http:

--- FAIL: TestTransportPrefersResponseOverWriteError (6.94s)
    --- FAIL: TestTransportPrefersResponseOverWriteError/h1 (3.85s)
    --- FAIL: TestTransportPrefersResponseOverWriteError/h2 (3.09s)
--- FAIL: TestTransportReuseConnEmptyResponseBody (6.04s)
    --- FAIL: TestTransportReuseConnEmptyResponseBody/h1 (3.04s)
    --- FAIL: TestTransportReuseConnEmptyResponseBody/h2 (3.00s)

(TestMaxBytesHandler suspiciously missing)

@bcmills
Copy link
Contributor

bcmills commented Apr 10, 2023

TestTransportPrefersResponseOverWriteError does seem to be leaky: it fires off a bunch of requests for which the server writes an error response without reading the request body to completion.

For those requests, c.Do(req) will return as soon as the server response arrives, even though the request processing hasn't finished yet.

Perhaps the solution is to wrap the strings.Reader with a Close method that calls wg.Done() on a sync.WaitGroup? Then the test could wait for all of the request bodies to be closed before returning. (That's probably close enough timing-wise to not trigger the leak check.)

@bcmills
Copy link
Contributor

bcmills commented Apr 10, 2023

I'm not sure what's going on with TestTransportReuseConnEmptyResponseBody, since that request doesn't seem to involve writing a request body. 🤔

@bcmills
Copy link
Contributor

bcmills commented Apr 10, 2023

TestMaxBytesHandler has the same issue as TestTransportPrefersResponseOverWriteError: it creates a request that will be rejected by the server before the body has been read, which on the client side leaks the Write goroutine.

It can be fixed in the same way: by adding a Close method to the body that synchronizes with the return from the test function.

@earthboundkid
Copy link
Contributor

I think I wrote TestMaxBytesHandler. As I recall I just copypasta’d another test. Sorry if it’s buggy! 😓

@bcmills
Copy link
Contributor

bcmills commented Apr 12, 2023

FWIW, watchflakes found TestMaxBytesHandler in #57612.

@gopherbot
Copy link
Contributor

Change https://go.dev/cl/483895 mentions this issue: net/http: avoid leaking writer goroutines in tests

@bcmills
Copy link
Contributor

bcmills commented Apr 12, 2023

I think failures reported in TestTransportReuseConnEmptyResponseBody are a red herring. As far as I can tell, the goroutines reported as “leaked” there are the ones from TestTransportPrefersResponseOverWriteError.

Probably the leak check should just be disabled after the first report.

@gopherbot
Copy link
Contributor

Change https://go.dev/cl/483896 mentions this issue: net/http: only report the first leak of each test run

gopherbot pushed a commit that referenced this issue Apr 12, 2023
We don't have a way to terminate the leaked goroutines, and we can't
wait forever for them to exit (or else we would risk timing out the
test and losing the log line describing what exactly leaked).
So we have reason to believe that they will remain leaked while we run
the next test, and we don't want the goroutines from the first leak to
generate a spurious error when the second test completes.

This also removes a racy Parallel call I added in CL 476036, which was
flagged by the race detector in the duplicate-suppression check.
(I hadn't considered the potential interaction with the leak checker.)

For #59526.
Updates #56421.

Change-Id: Ib1f759f102fb41ece114401680cd728343e58545
Reviewed-on: https://go-review.googlesource.com/c/go/+/483896
TryBot-Result: Gopher Robot <[email protected]>
Reviewed-by: Michael Pratt <[email protected]>
Auto-Submit: Bryan Mills <[email protected]>
Run-TryBot: Bryan Mills <[email protected]>
gopherbot pushed a commit that referenced this issue Apr 12, 2023
In TestTransportPrefersResponseOverWriteError and TestMaxBytesHandler,
the server may respond to an incoming request without ever reading the
request body. The client's Do method will return as soon as the
server's response headers are read, but the Transport will remain
active until it notices that the server has closed the connection,
which may be arbitrarily later.

When the server has closed the connection, it will call the Close
method on the request body (if it has such a method). So we can use
that method to find out when the Transport is close enough to done for
the test to complete without interfering too much with other tests.

For #57612.
For #59526.

Change-Id: Iddc7a3b7b09429113ad76ccc1c090ebc9e1835a1
Reviewed-on: https://go-review.googlesource.com/c/go/+/483895
Run-TryBot: Bryan Mills <[email protected]>
Commit-Queue: Bryan Mills <[email protected]>
Auto-Submit: Bryan Mills <[email protected]>
TryBot-Result: Gopher Robot <[email protected]>
Reviewed-by: Michael Pratt <[email protected]>
@bcmills
Copy link
Contributor

bcmills commented Apr 12, 2023

Fixed in CL 483895.

@bcmills bcmills closed this as completed Apr 12, 2023
@bcmills bcmills modified the milestones: Backlog, Go1.21 Apr 12, 2023
@bcmills bcmills self-assigned this Apr 12, 2023
@dmitshur dmitshur added NeedsFix The path to resolution is known, but the work has not been done. and removed NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. labels Nov 28, 2023
@golang golang locked and limited conversation to collaborators Nov 27, 2024
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
FrozenDueToAge NeedsFix The path to resolution is known, but the work has not been done. OS-Darwin
Projects
None yet
Development

No branches or pull requests

5 participants