Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

net/http: Transport race condition by Content-Length == 0 response #41600

Closed
akihiro opened this issue Sep 24, 2020 · 20 comments
Closed

net/http: Transport race condition by Content-Length == 0 response #41600

akihiro opened this issue Sep 24, 2020 · 20 comments
Labels
FrozenDueToAge NeedsFix The path to resolution is known, but the work has not been done.
Milestone

Comments

@akihiro
Copy link

akihiro commented Sep 24, 2020

What version of Go are you using (go version)?

$ go version
go version go1.15.2 linux/amd64

Does this issue reproduce with the latest release?

yes

What operating system and processor architecture are you using (go env)?

go env Output
$ go env
GO111MODULE=""
GOARCH="amd64"
GOBIN=""
GOCACHE="/home/hiroaki-m/.cache/go-build"
GOENV="/home/hiroaki-m/.config/go/env"
GOEXE=""
GOFLAGS=""
GOHOSTARCH="amd64"
GOHOSTOS="linux"
GOINSECURE=""
GOMODCACHE="/home/hiroaki-m/go/pkg/mod"
GONOPROXY=""
GONOSUMDB=""
GOOS="linux"
GOPATH="/home/hiroaki-m/go"
GOPRIVATE=""
GOPROXY="https://proxy.golang.org,direct"
GOROOT="/usr/local/go"
GOSUMDB="sum.golang.org"
GOTMPDIR=""
GOTOOLDIR="/usr/local/go/pkg/tool/linux_amd64"
GCCGO="gccgo"
AR="ar"
CC="gcc"
CXX="g++"
CGO_ENABLED="1"
GOMOD=""
CGO_CFLAGS="-g -O2"
CGO_CPPFLAGS=""
CGO_CXXFLAGS="-g -O2"
CGO_FFLAGS="-g -O2"
CGO_LDFLAGS="-g -O2"
PKG_CONFIG="pkg-config"
GOGCCFLAGS="-fPIC -m64 -pthread -fmessage-length=0 -fdebug-prefix-map=/tmp/go-build121864325=/tmp/go-build -gno-record-gcc-switches"

What did you do?

package main

import (
	"context"
	"fmt"
	"net/http"
	"net/http/httptest"
)

type handler struct{}

func (h handler) ServeHTTP(w http.ResponseWriter, r *http.Request) {
	w.Header().Add("Content-Length", "0")
}

func requestAndCancel(url string) {
	for {
		wait := make(chan struct{})
		done := make(chan struct{})
		ctx, cancel := context.WithCancel(context.Background())
		req, _ := http.NewRequestWithContext(ctx, http.MethodGet, url, nil)
		go func() {
			close(wait)
			http.DefaultTransport.RoundTrip(req)
			close(done)
		}()
		<-wait
		cancel()
		<-done
	}
}

func main() {
	ts := httptest.NewServer(&handler{})
	defer ts.Close()

	http.DefaultTransport.(*http.Transport).MaxConnsPerHost = 1

	// request and cancel
	for i := 0; i < 8; i++ {
		go requestAndCancel(ts.URL)
	}

	// request only
	for {
		req, _ := http.NewRequest(http.MethodGet, ts.URL, nil)
		if _, err := http.DefaultTransport.RoundTrip(req); err != nil {
			fmt.Printf("An unexpected error has occurred: %#v\n", err)
			break
		}
	}
}

I'm gussing that the keep-alived tcp connection is used simultaneously by multiple goroutine.
Early return to connection pool https://go.googlesource.com/go/+/refs/tags/go1.15.2/src/net/http/transport.go#2089

What did you expect to see?

No error response and loop infinity.

What did you see instead?

$ date -u; go run context_cancel_race.go ; date -u
Thu 24 Sep 2020 03:28:55 AM UTC
An unexpected error has occurred: &errors.errorString{s:"context canceled"}
Thu 24 Sep 2020 03:30:00 AM UTC
@akihiro akihiro changed the title net/http: Transport race condition by Context-Length == 0 response net/http: Transport race condition by Content-Length == 0 response Sep 24, 2020
@fraenkel
Copy link
Contributor

Take a look at how many connections you have in TIME_WAIT when it fails.

@davecheney
Copy link
Contributor

davecheney commented Sep 25, 2020

Looking at your logic there is nothing to prevent this sequence

                // step 1
		go func() {
                         // step 4
			close(wait)
                         // step 5, req is already cancelled.
			http.DefaultTransport.RoundTrip(req)
			close(done)
		}()
                // step 2
		<-wait
                // step 3
		cancel()
		<-done

@akihiro
Copy link
Author

akihiro commented Sep 25, 2020

Take a look at how many connections you have in TIME_WAIT when it fails.

$ go run context-test.go; ss -atn > case1.log
An unexpected error has occurred: &errors.errorString{s:"context canceled"}
$ grep TIME-WAIT case1.log | wc -l
4096

I set tcp_max_tw_buckets to 65536 for relax limitation of a number of TIME-WAIT sockets.

# echo 65536  >  /proc/sys/net/ipv4/tcp_max_tw_buckets

I test again.

$ cat /proc/sys/net/ipv4/tcp_fin_timeout
60
$ sleep 120 # wait for transition TIME-WAIT to CLOSED 
$ go run context-test.go; ss -atn > case2-1.log
An unexpected error has occurred: &errors.errorString{s:"context canceled"}
$ grep TIME-WAIT case2-1.log | wc -l
8243
$ sleep 120
$ go run context-test.go; ss -atn > case2-2.log
An unexpected error has occurred: &errors.errorString{s:"context canceled"}
$ grep TIME-WAIT case2-2.log | wc -l
1495
$ sleep 120
$ go run context-test.go; ss -atn > case2-3.log
An unexpected error has occurred: &errors.errorString{s:"context canceled"}
$ grep TIME-WAIT case2-3.log | wc -l
14615
$ sleep 120
$ go run context-test.go; ss -atn > case2-4.log
An unexpected error has occurred: &errors.errorString{s:"context canceled"}
$ grep TIME-WAIT case2-4.log | wc -l
9449

@fraenkel
Copy link
Contributor

Sorry about the wild goose chase. I have narrowed down the issue. Still trying to understand how its happening but I do understand why you get the error.
The connection handed to a request, not just the non-cancelling one is already broken, closed and cancelled. Now the question is why this connection actually exists in the pool.

@fraenkel
Copy link
Contributor

I am going to have to think about this. There doesn't seem to be an easy fix. The goroutine which reads the response, puts the connection back into the pool and then sends the response back via a channel to roundTrip(). However, the roundTrip() is also watching for contexts that are Done() and will cancel the connection. The connection is already in the idleConn pool. Turns out a request will get this connection before the previous one is actually done so the connection is sometimes broken out of the pool and sometimes healthy but then breaks later.

@fraenkel
Copy link
Contributor

Here is a minimal testcase to reproduce the issue:

package http_test

import (
	"context"
	"net/http"
	"net/http/httptest"
	"sync"
	"testing"
)

func Test41600(t *testing.T) {
	ts := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, req *http.Request) {
		w.Header().Add("Content-Length", "0")
	}))
	defer ts.Close()

	client := ts.Client()
	transport := client.Transport.(*http.Transport)
	transport.MaxIdleConns = 1
	transport.MaxConnsPerHost = 1

	var wg sync.WaitGroup

	ctx, cancel := context.WithCancel(context.Background())

	for i := 0; i < 10; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for ctx.Err() == nil {
				reqctx, reqcancel := context.WithCancel(context.Background())
				go reqcancel()
				req, _ := http.NewRequestWithContext(reqctx, http.MethodGet, ts.URL, nil)
				rsp, err := client.Do(req)
				if err == nil {
					defer rsp.Body.Close()
				}
			}
		}()
	}

	for {
		req, _ := http.NewRequest(http.MethodGet, ts.URL, nil)
		if rsp, err := client.Do(req); err != nil {
			t.Errorf("unexpected: %p %v", req, err)
			break
		} else {
			rsp.Body.Close()
		}
	}

	cancel()
	wg.Wait()
}

@fraenkel
Copy link
Contributor

The problem is easier to solve than I thought. The race is close to what is described but slightly incorrect. This problem can occur with all request but its easiest to reproduce with a HEAD/no response body request.
The readLoop for request 1 will put the connection back into the idle pool. Request 2 will pick it up and cancel it. Request 1's roundTrip will be notified that the connection has closed because it hasn't received the response. Once it puts the connection back, it should no longer care about any connection issue since a response will be coming.

@gopherbot
Copy link
Contributor

Change https://golang.org/cl/257818 mentions this issue: net/http: ignore connection closes once done with the connection

@networkimprov
Copy link

cc @odeke-em as possible reviewer :-)

@cagedmantis cagedmantis added the NeedsFix The path to resolution is known, but the work has not been done. label Sep 28, 2020
@cagedmantis cagedmantis added this to the Backlog milestone Sep 28, 2020
@networkimprov
Copy link

@gopherbot please backport

The relevant CL is https://golang.org/cl/257818 - net/http: ignore connection closes once done with the connection

@gopherbot
Copy link
Contributor

Backport issue(s) opened: #42934 (for 1.14), #42935 (for 1.15).

Remember to create the cherry-pick CL(s) as soon as the patch is submitted to master, according to https://golang.org/wiki/MinorReleases.

@fraenkel
Copy link
Contributor

fraenkel commented Dec 3, 2020

There is going to be an additional CL on top of this one to fix this issue.
See #42942

@gopherbot
Copy link
Contributor

Change https://golang.org/cl/274973 mentions this issue: net/http: add connections back that haven't been cancelled

gopherbot pushed a commit that referenced this issue Dec 9, 2020
Issue #41600 fixed the issue when a second request canceled a connection
while the first request was still in roundTrip.
This uncovered a second issue where a request was being canceled (in
roundtrip) but the connection was put back into the idle pool for a
subsequent request.
The fix is the similar except its now in readLoop instead of roundTrip.
A persistent connection is only added back if it successfully removed
the cancel function; otherwise we know the roundTrip has started
cancelRequest.

Fixes #42942

Change-Id: Ia56add20880ccd0c1ab812d380d8628e45f6f44c
Reviewed-on: https://go-review.googlesource.com/c/go/+/274973
Trust: Dmitri Shuralyov <[email protected]>
Trust: Damien Neil <[email protected]>
Reviewed-by: Damien Neil <[email protected]>
@dmitshur dmitshur modified the milestones: Backlog, Go1.16 Jan 26, 2021
@gopherbot
Copy link
Contributor

Change https://golang.org/cl/297909 mentions this issue: [release-branch.go1.15] net/http: ignore connection closes once done with the connection

@gopherbot
Copy link
Contributor

Change https://golang.org/cl/297910 mentions this issue: [release-branch.go1.15] net/http: add connections back that haven't been canceled

gopherbot pushed a commit that referenced this issue Mar 2, 2021
…with the connection

Once the connection is put back into the idle pool, the request should
not take any action if the connection is closed.

For #42935.
Updates #41600.

Change-Id: I5e4ddcdc03cd44f5197ecfbe324638604961de84
Reviewed-on: https://go-review.googlesource.com/c/go/+/257818
Reviewed-by: Brad Fitzpatrick <[email protected]>
Trust: Damien Neil <[email protected]>
(cherry picked from commit 212d385)
Reviewed-on: https://go-review.googlesource.com/c/go/+/297909
Trust: Dmitri Shuralyov <[email protected]>
Run-TryBot: Dmitri Shuralyov <[email protected]>
Reviewed-by: Damien Neil <[email protected]>
gopherbot pushed a commit that referenced this issue Mar 2, 2021
…een canceled

Issue #41600 fixed the issue when a second request canceled a connection
while the first request was still in roundTrip.
This uncovered a second issue where a request was being canceled (in
roundtrip) but the connection was put back into the idle pool for a
subsequent request.
The fix is the similar except its now in readLoop instead of roundTrip.
A persistent connection is only added back if it successfully removed
the cancel function; otherwise we know the roundTrip has started
cancelRequest.

Fixes #42935.
Updates #42942.

Change-Id: Ia56add20880ccd0c1ab812d380d8628e45f6f44c
Reviewed-on: https://go-review.googlesource.com/c/go/+/274973
Trust: Dmitri Shuralyov <[email protected]>
Trust: Damien Neil <[email protected]>
Reviewed-by: Damien Neil <[email protected]>
(cherry picked from commit 854a2f8)
Reviewed-on: https://go-review.googlesource.com/c/go/+/297910
Run-TryBot: Dmitri Shuralyov <[email protected]>
TryBot-Result: Go Bot <[email protected]>
@gopherbot
Copy link
Contributor

Change https://golang.org/cl/339593 mentions this issue: net/http: set socket linger time to 0 in TestCancelRequestWhenSharingConnection

@gopherbot
Copy link
Contributor

Change https://golang.org/cl/339673 mentions this issue: [release-branch.go1.15] net/http: set socket linger time to 0 in TestCancelRequestWhenSharingConnection

@gopherbot
Copy link
Contributor

Change https://golang.org/cl/339594 mentions this issue: net/http: speed up and deflake TestCancelRequestWhenSharingConnection

gopherbot pushed a commit that referenced this issue Aug 4, 2021
This test made many requests over the same connection for 10
seconds, trusting that this will exercise the request cancelation
race from #41600.

Change the test to exhibit the specific race in a targeted fashion
with only two requests.

Updates #41600.
Updates #47016.

Change-Id: If99c9b9331ff645f6bb67fe9fb79b8aab8784710
Reviewed-on: https://go-review.googlesource.com/c/go/+/339594
Trust: Damien Neil <[email protected]>
Run-TryBot: Damien Neil <[email protected]>
TryBot-Result: Go Bot <[email protected]>
Reviewed-by: Heschi Kreinick <[email protected]>
@gopherbot
Copy link
Contributor

Change https://golang.org/cl/339829 mentions this issue: [release-branch.go1.15] net/http: speed up and deflake TestCancelRequestWhenSharingConnection

@gopherbot
Copy link
Contributor

Change https://golang.org/cl/339830 mentions this issue: [release-branch.go1.16] net/http: speed up and deflake TestCancelRequestWhenSharingConnection

gopherbot pushed a commit that referenced this issue Aug 4, 2021
…estWhenSharingConnection

This test made many requests over the same connection for 10
seconds, trusting that this will exercise the request cancelation
race from #41600.

Change the test to exhibit the specific race in a targeted fashion
with only two requests.

Fixes #47534.
Updates #41600.
Updates #47016.

Change-Id: If99c9b9331ff645f6bb67fe9fb79b8aab8784710
Reviewed-on: https://go-review.googlesource.com/c/go/+/339594
Trust: Damien Neil <[email protected]>
Run-TryBot: Damien Neil <[email protected]>
TryBot-Result: Go Bot <[email protected]>
Reviewed-by: Heschi Kreinick <[email protected]>
(cherry picked from commit 6e73886)
Reviewed-on: https://go-review.googlesource.com/c/go/+/339829
gopherbot pushed a commit that referenced this issue Aug 4, 2021
…estWhenSharingConnection

This test made many requests over the same connection for 10
seconds, trusting that this will exercise the request cancelation
race from #41600.

Change the test to exhibit the specific race in a targeted fashion
with only two requests.

Fixes #47535.
Updates #41600.
Updates #47016.

Change-Id: If99c9b9331ff645f6bb67fe9fb79b8aab8784710
Reviewed-on: https://go-review.googlesource.com/c/go/+/339594
Trust: Damien Neil <[email protected]>
Run-TryBot: Damien Neil <[email protected]>
TryBot-Result: Go Bot <[email protected]>
Reviewed-by: Heschi Kreinick <[email protected]>
(cherry picked from commit 6e73886)
Reviewed-on: https://go-review.googlesource.com/c/go/+/339830
@golang golang locked and limited conversation to collaborators Aug 4, 2022
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
FrozenDueToAge NeedsFix The path to resolution is known, but the work has not been done.
Projects
None yet
Development

No branches or pull requests

7 participants