Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

database/sql: sqlboiler-using project fails with cancellation errors after context/RawBytes CL when many tests run #60932

Closed
zikaeroh opened this issue Jun 22, 2023 · 8 comments
Labels
FrozenDueToAge NeedsFix The path to resolution is known, but the work has not been done. release-blocker
Milestone

Comments

@zikaeroh
Copy link
Contributor

zikaeroh commented Jun 22, 2023

What version of Go are you using (go version)?

$ go version
go version devel go1.21-b1f1fb2950 Wed Jun 21 18:32:13 2023 +0000 linux/amd64

Does this issue reproduce with the latest release?

On tip.

What operating system and processor architecture are you using (go env)?

go env Output
$ go env
GO111MODULE=''
GOARCH='amd64'
GOBIN=''
GOCACHE='/home/zikaeroh/.cache/go-build'
GOENV='/home/zikaeroh/.config/go/env'
GOEXE=''
GOEXPERIMENT=''
GOFLAGS=''
GOHOSTARCH='amd64'
GOHOSTOS='linux'
GOINSECURE=''
GOMODCACHE='/home/zikaeroh/go/pkg/mod'
GONOPROXY=''
GONOSUMDB=''
GOOS='linux'
GOPATH='/home/zikaeroh/go'
GOPRIVATE=''
GOPROXY='https://proxy.golang.org,direct'
GOROOT='/home/zikaeroh/sdk/gotip'
GOSUMDB='sum.golang.org'
GOTMPDIR=''
GOTOOLCHAIN='local'
GOTOOLDIR='/home/zikaeroh/sdk/gotip/pkg/tool/linux_amd64'
GOVCS=''
GOVERSION='devel go1.21-b1f1fb2950 Wed Jun 21 18:32:13 2023 +0000'
GCCGO='gccgo'
GOAMD64='v1'
AR='ar'
CC='gcc'
CXX='g++'
CGO_ENABLED='1'
GOMOD='/home/zikaeroh/hortbot/hortbot/go.mod'
GOWORK=''
CGO_CFLAGS='-O2 -g'
CGO_CPPFLAGS=''
CGO_CXXFLAGS='-O2 -g'
CGO_FFLAGS='-O2 -g'
CGO_LDFLAGS='-O2 -g'
PKG_CONFIG='pkg-config'
GOGCCFLAGS='-fPIC -m64 -pthread -Wl,--no-gc-sections -fmessage-length=0 -ffile-prefix-map=/tmp/go-build1489328331=/tmp/go-build -gno-record-gcc-switches'

What did you do?

I really have to apologize, I don't yet have a reproduction that's more minimal than my actual project. Testing this "only" requires docker running locally.

What did you expect to see?

Tests pass.

What did you see instead?

After CL 497675 (I bisected to be sure), my tests appear to fail, with sqlboiler calls returning errors that mention context cancellation. Something like:

context canceled
error from rows in bind
github.com/volatiletech/sqlboiler/v4/queries.(*Query).Bind
	/home/zikaeroh/go/pkg/mod/github.com/volatiletech/sqlboiler/[email protected]/queries/reflect.go:154
github.com/hortbot/hortbot/internal/bot.handleSession
	/home/zikaeroh/hortbot/hortbot/internal/bot/handle.go:367
github.com/hortbot/hortbot/internal/bot.(*Bot).handlePrivMsg.func1
	/home/zikaeroh/hortbot/hortbot/internal/bot/handle.go:166
github.com/hortbot/hortbot/internal/pkg/dbx.Transact.func2
	/home/zikaeroh/hortbot/hortbot/internal/pkg/dbx/dbx.go:58
github.com/hortbot/hortbot/internal/pkg/dbx.Transact
	/home/zikaeroh/hortbot/hortbot/internal/pkg/dbx/dbx.go:63
github.com/hortbot/hortbot/internal/bot.(*Bot).handlePrivMsg
	/home/zikaeroh/hortbot/hortbot/internal/bot/handle.go:160
github.com/hortbot/hortbot/internal/bot.(*Bot).handle
	/home/zikaeroh/hortbot/hortbot/internal/bot/handle.go:103
github.com/hortbot/hortbot/internal/bot.(*Bot).Handle
	/home/zikaeroh/hortbot/hortbot/internal/bot/handle.go:70
github.com/hortbot/hortbot/internal/bot/btest.(*scriptTester).handle.(*scriptTester).handleM.func3
	/home/zikaeroh/hortbot/hortbot/internal/bot/btest/irc.go:57
github.com/hortbot/hortbot/internal/bot/btest.(*scriptTester).test
	/home/zikaeroh/hortbot/hortbot/internal/bot/btest/script_tester.go:207
github.com/hortbot/hortbot/internal/bot/btest.RunScript
	/home/zikaeroh/hortbot/hortbot/internal/bot/btest/script_tester.go:49
github.com/hortbot/hortbot/internal/bot_test.TestScripts.func1
	/home/zikaeroh/hortbot/hortbot/internal/bot/script_test.go:43
testing.tRunner
	/home/zikaeroh/go/src/testing/testing.go:1595
runtime.goexit
	/home/zikaeroh/go/src/runtime/asm_amd64.s:1650

It's not one specific test either; it only happens when I run enough tests, e.g. run the whole thing. E.g., go test -run "TestScripts/quotes/add" ./internal/bot will pass, but gotip test ./internal/bot will fail. TestScripts/quotes sometimes fails.

I have tested with -race and don't see any races reported. Passing -parallel 1 also does not change things. I have also changed my postgres driver to another one, and see the same behavior.

Again, apologies for such a big repro; I'm not quite sure where to start minimizing this.

cc @bradfitz

ref #60304 #53970

@dmitshur
Copy link
Contributor

Also CC @kardianos, @kevinburke via owners.

@dmitshur dmitshur added the NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. label Jun 22, 2023
@dmitshur dmitshur added this to the Backlog milestone Jun 22, 2023
@dmitshur
Copy link
Contributor

dmitshur commented Jun 22, 2023

Tentatively marking as a release-blocker since this is caused by a CL new in Go 1.21, and needs investigation. The release-blocker label can be removed if investigation shows this doesn't need to block the release. (Or, of course, the issue can be closed when the problem is resolved.)

@zikaeroh
Copy link
Contributor Author

zikaeroh commented Jun 22, 2023

Some further investigation:

If I wholesale replace my context with context.Background(), the errors don't happen. This to me implies that the libraries I'm calling are not introducing their own cancel/deadline/timeout. However, this is confusing because when I use my typical context, ctx.Err() before and after a failing query are both nil, so my context can't have been cancelled.

I'm hoping to recreate this from scratch in a new repo, but I unfortunately found this at a time when I'm less able to spend time working on it. ☹️

@zikaeroh
Copy link
Contributor Author

zikaeroh commented Jun 22, 2023

Hm, so I have a theory (which I need to test):

  • sqlboiler's queries.Bind function makes a query: https://github.com/volatiletech/sqlboiler/blob/4dd76363f68e73f9bd52ca984446db69a2282981/queries/reflect.go#L136
  • initContextClose does WithCancel and creates a new context (which is given to awaitDone) and sets rs.cancel. awaitDone is run in a new goroutine via go rs.awaitDone(), so it's sitting there waiting for the context(s) to stop.
  • The rows are read out successfully by Bind.
  • Bind closes the rows. This calls rs.cancel.
  • The awaitDone goroutine may wake up because one of the contexts it's waiting on was cancelled.
  • Bind then checks rows.Err, and depending on whether or not awaitDone set an error, this function returns a spurious cancellation error.

@zikaeroh
Copy link
Contributor Author

Yeah, I think that the above is correct; I modified database/sql to check what was happening and it's the error from rs.contextDone. I also noted that the docs say:

// Close closes the Rows, preventing further enumeration. If Next is called
// and returns false and there are no further result sets,
// the Rows are closed automatically and it will suffice to check the
// result of Err. Close is idempotent and does not affect the result of Err.

// Err returns the error, if any, that was encountered during iteration.
// Err may be called after an explicit or implicit Close.

Which implies that close should not be affecting Err at all, though that is unfortunately what the CL ended up doing.

I'm not totally sure if rs.cancel's eventual error is intended to be propagated outward; if the intent is just to ensure that the awaitDone goroutine exits, maybe the "right" thing to do is to pass in a third context to awaitDone solely for that purpose which doesn't set rs.contextDone.

Testing out that fix locally does the trick, fixing my repro and still passing the existing database/sql tests.

@gopherbot
Copy link
Contributor

Change https://go.dev/cl/505397 mentions this issue: database/sql: prevent internal context error from being returned from Rows.Err()

@zikaeroh
Copy link
Contributor Author

I have mailed a CL which I believe fixes the issue. Happy to take feedback or have my CL superceded by a better one!

@dmitshur dmitshur added NeedsFix The path to resolution is known, but the work has not been done. and removed NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. labels Jun 24, 2023
@cagedmantis
Copy link
Contributor

This has a CL and needs a review? This is a release blocker.

/cc @bradfitz @rsc

bradfitz pushed a commit to tailscale/go that referenced this issue Jul 15, 2023
… Rows.Err()

CL 497675 modified Rows such that context errors are propagated through
Rows.Err(). This caused an issue where calling Close meant that an
internal cancellation error would (eventually) be returned from Err:

1. A caller makes a query using a cancellable context.
2. initContextClose sees that either the query context or the
   transaction context can be canceled, so will need to spawn a
   goroutine to capture their errors.
3. initContextClose derives a context from the query context via
   WithCancel and sets rs.cancel.
4. When a user calls Close, rs.cancel is called. awaitDone's ctx is
   cancelled, which is good, since we don't want it to hang forever.
5. This internal cancellation (after CL 497675) has its error saved on
   contextDone.
6. Later, calling Err will return the error in contextDone if present.

This leads to a race condition depending on how quickly Err is called
after Close.

The docs for Close and Err state that calling Close should have no
affect on the return result for Err. So, a potential fix is to ensure
that awaitDone does not save the error when the cancellation comes from
a Close via rs.cancel.

This CL does that, using a new context not derived from the query
context, whose error is ignored as the query context's error used to be
before the original bugfix.

The included test fails before the CL, and passes afterward.

Fixes golang#60932

Change-Id: I2bf4c549efd83d62b86e298c9c45ebd06a3ad89a
Reviewed-on: https://go-review.googlesource.com/c/go/+/505397
Auto-Submit: Russ Cox <[email protected]>
Run-TryBot: Russ Cox <[email protected]>
TryBot-Result: Gopher Robot <[email protected]>
Reviewed-by: Russ Cox <[email protected]>
Reviewed-by: Dmitri Shuralyov <[email protected]>
Reviewed-by: Brad Fitzpatrick <[email protected]>
@golang golang locked and limited conversation to collaborators Jul 4, 2024
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
FrozenDueToAge NeedsFix The path to resolution is known, but the work has not been done. release-blocker
Projects
None yet
Development

No branches or pull requests

4 participants