net/http: Transport race condition by Content-Length == 0 response #41600
Comments
Take a look at how many connections you have in TIME_WAIT when it fails.
Looking at your logic, there is nothing to prevent this sequence:
|
$ go run context-test.go; ss -atn > case1.log
An unexpected error has occurred: &errors.errorString{s:"context canceled"}
$ grep TIME-WAIT case1.log | wc -l
4096
I set
# echo 65536 > /proc/sys/net/ipv4/tcp_max_tw_buckets
and tested again.
$ cat /proc/sys/net/ipv4/tcp_fin_timeout
60
$ sleep 120 # wait for transition TIME-WAIT to CLOSED
$ go run context-test.go; ss -atn > case2-1.log
An unexpected error has occurred: &errors.errorString{s:"context canceled"}
$ grep TIME-WAIT case2-1.log | wc -l
8243
$ sleep 120
$ go run context-test.go; ss -atn > case2-2.log
An unexpected error has occurred: &errors.errorString{s:"context canceled"}
$ grep TIME-WAIT case2-2.log | wc -l
1495
$ sleep 120
$ go run context-test.go; ss -atn > case2-3.log
An unexpected error has occurred: &errors.errorString{s:"context canceled"}
$ grep TIME-WAIT case2-3.log | wc -l
14615
$ sleep 120
$ go run context-test.go; ss -atn > case2-4.log
An unexpected error has occurred: &errors.errorString{s:"context canceled"}
$ grep TIME-WAIT case2-4.log | wc -l
9449
Sorry about the wild goose chase. I have narrowed down the issue. I'm still trying to understand how it's happening, but I do understand why you get the error.
I am going to have to think about this. There doesn't seem to be an easy fix. The goroutine that reads the response puts the connection back into the pool and then sends the response back via a channel to roundTrip(). However, roundTrip() is also watching for contexts that are Done() and will cancel the connection, which by then is already in the idleConn pool. It turns out that a subsequent request can get this connection before the previous one is actually done, so the connection is sometimes already broken when it comes out of the pool and sometimes healthy at first but breaks later.
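To make that ordering concrete, here is a toy, single-threaded model of the interleaving as this comment describes it (a later comment notes the description is close but not exact). It is only a sketch: `conn`, `pool`, `send`, and `simulate` are hypothetical names invented for illustration and do not correspond to the real net/http Transport internals.

```go
package main

import (
	"errors"
	"fmt"
)

// conn is a hypothetical stand-in for a persistent connection.
type conn struct {
	closed bool
}

// pool is a hypothetical single-slot idle pool.
type pool struct {
	idle *conn
}

// put returns a connection to the idle pool.
func (p *pool) put(c *conn) { p.idle = c }

// get hands out the idle connection, if any, and empties the slot.
func (p *pool) get() *conn {
	c := p.idle
	p.idle = nil
	return c
}

var errCanceled = errors.New("context canceled")

// send models a request trying to use the connection it was given.
func send(c *conn) error {
	if c == nil {
		return errors.New("no idle connection")
	}
	if c.closed {
		return errCanceled
	}
	return nil
}

// simulate replays the problematic ordering step by step.
func simulate() error {
	p := &pool{}
	first := &conn{} // connection used by request 1

	// Step 1: the reader for request 1 finishes and returns the
	// connection to the pool before request 1 has seen its response.
	p.put(first)

	// Step 2: request 2 picks up the same connection from the pool.
	second := p.get()

	// Step 3: request 1's context is done; its round trip still holds
	// a reference to the connection and cancels (closes) it.
	first.closed = true

	// Step 4: request 2 tries to use the connection it was handed.
	return send(second)
}

func main() {
	fmt.Println(simulate()) // prints "context canceled"
}
```

In this model request 2 did nothing wrong, yet it is the one that observes the cancellation, which matches the symptom in the transcripts above.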
Here is a minimal testcase to reproduce the issue:
|
The problem is easier to solve than I thought. The race is close to what is described above, though that description is slightly incorrect. This problem can occur with any request, but it's easiest to reproduce with a HEAD request or any other request that has no response body.
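For readers who want to experiment, the sketch below shows one way a HEAD-request loop with per-request contexts could look. It is not the test case referred to above (that code is not included on this page); `headLoop` and `runDemo` are hypothetical helpers, and the local `httptest` server and the 204 response are assumptions made only to keep the example self-contained. On a Go release that contains the fixes, no spurious cancellations are expected.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"net/http"
	"net/http/httptest"
)

// headLoop issues n sequential HEAD requests, each with its own context
// that is canceled right after the request returns. It reports how many
// requests failed with context.Canceled; any other error stops the loop.
func headLoop(client *http.Client, url string, n int) (int, error) {
	canceled := 0
	for i := 0; i < n; i++ {
		ctx, cancel := context.WithCancel(context.Background())
		req, err := http.NewRequestWithContext(ctx, http.MethodHead, url, nil)
		if err != nil {
			cancel()
			return canceled, err
		}
		resp, err := client.Do(req)
		if err == nil {
			resp.Body.Close()
		}
		// Canceling after the round trip has completed should be harmless.
		cancel()
		if errors.Is(err, context.Canceled) {
			canceled++
		} else if err != nil {
			return canceled, err
		}
	}
	return canceled, nil
}

// runDemo starts a throwaway local server that answers with an empty
// response and runs headLoop against it.
func runDemo(n int) (int, error) {
	srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		w.WriteHeader(http.StatusNoContent)
	}))
	defer srv.Close()
	return headLoop(srv.Client(), srv.URL, n)
}

func main() {
	canceled, err := runDemo(100)
	fmt.Printf("spurious cancellations: %d, other error: %v\n", canceled, err)
}
```

The requests are sequential on purpose, so every request after the first is a candidate for reusing the keep-alive connection that the previous request just released.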
Change https://golang.org/cl/257818 mentions this issue:
cc @odeke-em as possible reviewer :-)
@gopherbot please backport. The relevant CL is https://golang.org/cl/257818
Backport issue(s) opened: #42934 (for 1.14), #42935 (for 1.15). Remember to create the cherry-pick CL(s) as soon as the patch is submitted to master, according to https://golang.org/wiki/MinorReleases.
There is going to be an additional CL on top of this one to fix this issue.
Change https://golang.org/cl/274973 mentions this issue:
Issue #41600 fixed the issue when a second request canceled a connection while the first request was still in roundTrip. This uncovered a second issue where a request was being canceled (in roundTrip) but the connection was put back into the idle pool for a subsequent request. The fix is similar, except it's now in readLoop instead of roundTrip. A persistent connection is only added back if it successfully removed the cancel function; otherwise we know the roundTrip has started cancelRequest.
Fixes #42942
Change-Id: Ia56add20880ccd0c1ab812d380d8628e45f6f44c
Reviewed-on: https://go-review.googlesource.com/c/go/+/274973
Trust: Dmitri Shuralyov
Trust: Damien Neil
Reviewed-by: Damien Neil
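The rule in this commit message ("only add the connection back if the cancel function was successfully removed") can be illustrated with a small model. This is a sketch under stated assumptions, not the actual patch: `transport`, `start`, `takeCanceler`, `cancelRequest`, `finishRead`, and `demo` are hypothetical names, and the real Transport keeps different bookkeeping. The idea shown is that whichever side removes the cancel function first wins, so the connection is either pooled or canceled, never both.

```go
package main

import (
	"fmt"
	"sync"
)

// conn is a hypothetical stand-in for a persistent connection.
type conn struct{ closed bool }

// transport is a hypothetical model holding per-request cancel
// functions and an idle pool.
type transport struct {
	mu        sync.Mutex
	cancelers map[int]func() // request id -> cancel function
	idle      []*conn
}

func newTransport() *transport {
	return &transport{cancelers: make(map[int]func())}
}

// start registers a cancel function that closes c for request reqID.
func (t *transport) start(reqID int, c *conn) {
	t.mu.Lock()
	defer t.mu.Unlock()
	t.cancelers[reqID] = func() { c.closed = true }
}

// takeCanceler removes the cancel function for reqID and reports
// whether it was still registered.
func (t *transport) takeCanceler(reqID int) (func(), bool) {
	t.mu.Lock()
	defer t.mu.Unlock()
	fn, ok := t.cancelers[reqID]
	delete(t.cancelers, reqID)
	return fn, ok
}

// cancelRequest models the round-trip side: it cancels the connection
// only if it still owns the cancel function.
func (t *transport) cancelRequest(reqID int) bool {
	fn, ok := t.takeCanceler(reqID)
	if ok {
		fn()
	}
	return ok
}

// finishRead models the read-loop side: the connection is pooled only
// if the cancel function could be removed, i.e. no cancel has started.
func (t *transport) finishRead(reqID int, c *conn) bool {
	if _, ok := t.takeCanceler(reqID); !ok {
		return false
	}
	t.mu.Lock()
	defer t.mu.Unlock()
	t.idle = append(t.idle, c)
	return true
}

// demo runs one ordering and reports whether the connection was pooled
// and whether it ended up closed.
func demo(cancelFirst bool) (pooled, closed bool) {
	t := newTransport()
	c := &conn{}
	t.start(1, c)
	if cancelFirst {
		t.cancelRequest(1)
		pooled = t.finishRead(1, c)
	} else {
		pooled = t.finishRead(1, c)
		t.cancelRequest(1)
	}
	return pooled, c.closed
}

func main() {
	fmt.Println(demo(true))  // false true: canceled, so not pooled
	fmt.Println(demo(false)) // true false: pooled, late cancel is a no-op
}
```

Both orderings leave the model in a consistent state: a canceled connection never reaches the idle pool, and a pooled connection is never closed by a late cancel from the request that released it.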
Change https://golang.org/cl/297909 mentions this issue:
Change https://golang.org/cl/297910 mentions this issue:
…with the connection
Once the connection is put back into the idle pool, the request should not take any action if the connection is closed.
For #42935.
Updates #41600.
Change-Id: I5e4ddcdc03cd44f5197ecfbe324638604961de84
Reviewed-on: https://go-review.googlesource.com/c/go/+/257818
Reviewed-by: Brad Fitzpatrick
Trust: Damien Neil
(cherry picked from commit 212d385)
Reviewed-on: https://go-review.googlesource.com/c/go/+/297909
Trust: Dmitri Shuralyov
Run-TryBot: Dmitri Shuralyov
Reviewed-by: Damien Neil
…een canceled
Issue #41600 fixed the issue when a second request canceled a connection while the first request was still in roundTrip. This uncovered a second issue where a request was being canceled (in roundTrip) but the connection was put back into the idle pool for a subsequent request. The fix is similar, except it's now in readLoop instead of roundTrip. A persistent connection is only added back if it successfully removed the cancel function; otherwise we know the roundTrip has started cancelRequest.
Fixes #42935.
Updates #42942.
Change-Id: Ia56add20880ccd0c1ab812d380d8628e45f6f44c
Reviewed-on: https://go-review.googlesource.com/c/go/+/274973
Trust: Dmitri Shuralyov
Trust: Damien Neil
Reviewed-by: Damien Neil
(cherry picked from commit 854a2f8)
Reviewed-on: https://go-review.googlesource.com/c/go/+/297910
Run-TryBot: Dmitri Shuralyov
TryBot-Result: Go Bot
Change https://golang.org/cl/339593 mentions this issue:
Change https://golang.org/cl/339673 mentions this issue:
Change https://golang.org/cl/339594 mentions this issue:
This test made many requests over the same connection for 10 seconds, trusting that this will exercise the request cancelation race from #41600. Change the test to exhibit the specific race in a targeted fashion with only two requests.
Updates #41600.
Updates #47016.
Change-Id: If99c9b9331ff645f6bb67fe9fb79b8aab8784710
Reviewed-on: https://go-review.googlesource.com/c/go/+/339594
Trust: Damien Neil
Run-TryBot: Damien Neil
TryBot-Result: Go Bot
Reviewed-by: Heschi Kreinick
Change https://golang.org/cl/339829 mentions this issue:
Change https://golang.org/cl/339830 mentions this issue:
…estWhenSharingConnection
This test made many requests over the same connection for 10 seconds, trusting that this will exercise the request cancelation race from #41600. Change the test to exhibit the specific race in a targeted fashion with only two requests.
Fixes #47534.
Updates #41600.
Updates #47016.
Change-Id: If99c9b9331ff645f6bb67fe9fb79b8aab8784710
Reviewed-on: https://go-review.googlesource.com/c/go/+/339594
Trust: Damien Neil
Run-TryBot: Damien Neil
TryBot-Result: Go Bot
Reviewed-by: Heschi Kreinick
(cherry picked from commit 6e73886)
Reviewed-on: https://go-review.googlesource.com/c/go/+/339829
…estWhenSharingConnection
This test made many requests over the same connection for 10 seconds, trusting that this will exercise the request cancelation race from #41600. Change the test to exhibit the specific race in a targeted fashion with only two requests.
Fixes #47535.
Updates #41600.
Updates #47016.
Change-Id: If99c9b9331ff645f6bb67fe9fb79b8aab8784710
Reviewed-on: https://go-review.googlesource.com/c/go/+/339594
Trust: Damien Neil
Run-TryBot: Damien Neil
TryBot-Result: Go Bot
Reviewed-by: Heschi Kreinick
(cherry picked from commit 6e73886)
Reviewed-on: https://go-review.googlesource.com/c/go/+/339830
What version of Go are you using (go version)?
Does this issue reproduce with the latest release?
yes
What operating system and processor architecture are you using (go env)?
go env Output
What did you do?
I'm guessing that the keep-alive TCP connection is being used simultaneously by multiple goroutines.
Early return to the connection pool: https://go.googlesource.com/go/+/refs/tags/go1.15.2/src/net/http/transport.go#2089
What did you expect to see?
No error response; the loop should run indefinitely.
What did you see instead?