roachtest: tpccbench/nodes=6/cpu=16/multi-az failed #73351

Closed
cockroach-teamcity opened this issue Dec 1, 2021 · 5 comments · Fixed by #73383
Labels
branch-master: Failures and bugs on the master branch.
C-test-failure: Broken test (automatically or manually discovered).
O-roachtest
O-robot: Originated from a bot.
release-blocker: Indicates a release-blocker. Use with branch-release-2x.x label to denote which branch is blocked.

Comments

@cockroach-teamcity
Member

roachtest.tpccbench/nodes=6/cpu=16/multi-az failed with artifacts on master @ a80cfbee826f70988381a0a85c0fe7aba0115484:

The test failed on branch=master, cloud=gce:
test timed out (see artifacts for details)
Help

See: roachtest README

See: How To Investigate (internal)

Same failure on other branches

/cc @cockroachdb/kv-triage


cockroach-teamcity added the branch-master, C-test-failure, O-roachtest, O-robot, and release-blocker labels Dec 1, 2021
@tbg
Member

tbg commented Dec 2, 2021

Timed out during the tpcc post-import rebalancing warmup:

12:48:35 cluster.go:545: > /home/agent/work/.go/src/github.com/cockroachdb/cockroach/bin/roachprod run teamcity-3824348-1638343289-69-n7cpu16-geo:7 -- ./cockroach workload run tpcc --warehouses=5000 --workers=5000 --max-rate=613 --wait=false --ramp=5m0s --duration=15m0s --scatter --tolerate-errors {pgurl:1-6}

#73301 would definitely have helped here, but I think we might have the stack dumps at least. Going to have a poke.

@tbg
Member

tbg commented Dec 2, 2021

Easy enough:

panic: child not present in parent [recovered]
        panic: child not present in parent

goroutine 367076 [running]:
panic({0x426b1c0, 0x83d8ab0})
        /usr/local/go/src/runtime/panic.go:1147 +0x3a8 fp=0xc01a5f1c08 sp=0xc01a5f1b48 pc=0x488448
github.com/cockroachdb/cockroach/pkg/util/stop.(*Stopper).Recover(0xc000987710, {0x84eef58, 0xc004a5f090})
        /home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/util/stop/stopper.go:254 +0xaa fp=0xc01a5f1c68 sp=0xc01a5f1c08 pc=0xff52aa
github.com/cockroachdb/cockroach/pkg/util/stop.(*Stopper).RunAsyncTaskEx.func2·dwrap·10()
        /home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/util/stop/stopper.go:482 +0x2e fp=0xc01a5f1c90 sp=0xc01a5f1c68 pc=0xff676e
runtime.deferCallSave(0xc01a5f1d98, 0xc01a5f1f90)
        /usr/local/go/src/runtime/panic.go:950 +0x82 fp=0xc01a5f1ca0 sp=0xc01a5f1c90 pc=0x488042
runtime.runOpenDeferFrame(0xc001f7b9f0, 0xc000fa4000)
        /usr/local/go/src/runtime/panic.go:889 +0x27b fp=0xc01a5f1d20 sp=0xc01a5f1ca0 pc=0x4879db
panic({0x426b1c0, 0x83d8ab0})
        /usr/local/go/src/runtime/panic.go:1038 +0x215 fp=0xc01a5f1de0 sp=0xc01a5f1d20 pc=0x4882b5
github.com/cockroachdb/cockroach/pkg/util/tracing.(*crdbSpan).childFinished(0xc00d727240, 0xed9396646)
        /home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/util/tracing/crdbspan.go:630 +0x38e fp=0xc01a5f1e80 sp=0xc01a5f1de0 pc=0xb941ae
github.com/cockroachdb/cockroach/pkg/util/tracing.(*crdbSpan).finish(0xc004a5ef40)
        /home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/util/tracing/crdbspan.go:188 +0x145 fp=0xc01a5f1ee8 sp=0xc01a5f1e80 pc=0xb909c5
github.com/cockroachdb/cockroach/pkg/util/tracing.(*spanInner).Finish(0xc004a5ef00)
        /home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/util/tracing/span_inner.go:90 +0x4d fp=0xc01a5f1f18 sp=0xc01a5f1ee8 pc=0xb9b88d
github.com/cockroachdb/cockroach/pkg/util/tracing.(*Span).Finish(0x0)
        /home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/util/tracing/span.go:83 +0x4a fp=0xc01a5f1f30 sp=0xc01a5f1f18 pc=0xb9a9ca
github.com/cockroachdb/cockroach/pkg/util/stop.(*Stopper).RunAsyncTaskEx.func2·dwrap·12()
        /home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/util/stop/stopper.go:485 +0x26 fp=0xc01a5f1f48 sp=0xc01a5f1f30 pc=0xff66a6
github.com/cockroachdb/cockroach/pkg/util/stop.(*Stopper).RunAsyncTaskEx.func2()
        /home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/util/stop/stopper.go:492 +0x168 fp=0xc01a5f1fe0 sp=0xc01a5f1f48 pc=0xff65a8
runtime.goexit()
        /usr/local/go/src/runtime/asm_amd64.s:1581 +0x1 fp=0xc01a5f1fe8 sp=0xc01a5f1fe0 pc=0x4bc2c1
created by github.com/cockroachdb/cockroach/pkg/util/stop.(*Stopper).RunAsyncTaskEx
        /home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/util/stop/stopper.go:481 +0x445

@andreimatei you've touched tracing and invariants around spans recently, so I'm pretty sure you would either be the right person to fix this problem or know who is.

Something that sticks out at me is

github.com/cockroachdb/cockroach/pkg/util/tracing.(*Span).Finish(0x0)

so we're finishing a nil Span, but I can't make sense of the rest of the stack trace, which seems to operate on a non-nil spanInner. Also, Finish on a nil span is an obvious no-op in the code.
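
To illustrate the no-op point: in Go, a method with a pointer receiver can be called on a nil pointer, and it is safe as long as the method checks for nil before touching any fields. The following is a minimal sketch of that pattern only. The `Span` type and its `finished` field here are toy stand-ins invented for illustration, not the real `tracing.Span`.

```go
package main

import "fmt"

// Span is a toy type for illustration; it is NOT the real tracing.Span.
type Span struct {
	finished bool
}

// Finish tolerates a nil receiver and does nothing in that case.
func (sp *Span) Finish() {
	if sp == nil {
		return
	}
	sp.finished = true
}

func main() {
	var nilSpan *Span
	nilSpan.Finish() // safe: the method returns before dereferencing sp

	sp := &Span{}
	sp.Finish()
	fmt.Println(sp.finished) // prints "true"
}
```

With a guard like this, a call on a nil span returns immediately and could not go on to invoke anything on an inner span, which is why the frames above `(*Span).Finish(0x0)` look inconsistent with the printed argument.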

For your convenience, here's the entire goroutine dump https://gist.github.com/6c14860b40bfddab95a889c6d6dbc455

@tbg
Member

tbg commented Dec 2, 2021

Also x-ref #73374, which I saw happen on this issue.

@andreimatei
Contributor

Looking

@andreimatei
Contributor

Fixing in #73383

andreimatei added a commit to andreimatei/cockroach that referenced this issue Dec 2, 2021
When a child span finishes, it tries to deregister itself from the
parent. We assert that the parent has a reference to this child. This
assertion fired because there is a case where the parent does not have a
reference to the child - when the parent had too many open children at
the time when the child was created, it will not register the child. In
effect, such a child is a root. This patch makes the child in question
aware of the fact that it is really a root.

Fixes cockroachdb#73351

Release note: None
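
The commit message above can be sketched roughly as follows. This is a simplified model under stated assumptions, not the actual `crdbSpan` code: the `span` type, `newRoot`, `newChild`, `finish`, and the `maxOpenChildren` cap of 2 are all hypothetical names and values chosen for illustration. The sketch shows the fixed behavior, where a child that the parent declined to register gets no parent pointer and so skips the deregistration assertion.

```go
package main

import "fmt"

// maxOpenChildren is a hypothetical cap on how many open children a
// parent tracks; the real limit is not stated in this thread.
const maxOpenChildren = 2

// span is a toy stand-in for a tracing span, not the real crdbSpan.
type span struct {
	name string
	// parent is nil for a root, and also for a child that the parent
	// declined to register.
	parent *span
	// children holds the open children registered with this span.
	children map[*span]bool
}

func newRoot(name string) *span {
	return &span{name: name, children: map[*span]bool{}}
}

// newChild creates a child of p. If p already tracks maxOpenChildren open
// children, the child is not registered and is treated as a root.
func (p *span) newChild(name string) *span {
	c := &span{name: name, children: map[*span]bool{}}
	if len(p.children) < maxOpenChildren {
		p.children[c] = true
		c.parent = p
	}
	return c
}

// finish deregisters the span from its parent, if it has one.
func (s *span) finish() {
	if s.parent == nil {
		return // a root or an unregistered child: nothing to deregister
	}
	if !s.parent.children[s] {
		panic("child not present in parent")
	}
	delete(s.parent.children, s)
}

func main() {
	root := newRoot("root")
	a := root.newChild("a")
	b := root.newChild("b")
	c := root.newChild("c") // over the cap, so not registered

	c.finish() // no panic: c knows it is effectively a root
	a.finish()
	b.finish()
	fmt.Println(len(root.children), c.parent == nil) // prints "0 true"
}
```

In this model, the crash corresponds to setting `c.parent = p` unconditionally in `newChild` while still skipping the registration: `finish` would then hit the "child not present in parent" assertion for any child created over the cap.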
craig bot pushed a commit that referenced this issue Dec 2, 2021
73383: tracing: fix a crash r=andreimatei a=andreimatei

When a child span finishes, it tries to deregister itself from the
parent. We assert that the parent has a reference to this child. This
assertion fired because there is a case where the parent does not have a
reference to the child - when the parent had too many open children at
the time when the child was created, it will not register the child. In
effect, such a child is a root. This patch makes the child in question
aware of the fact that it is really a root.

Fixes #73351

Release note: None

73387: sql/tests: ignore an expected error for RSG r=yuzefovich a=yuzefovich

Addresses: #70663 (comment).

Release note: None

Co-authored-by: Andrei Matei <[email protected]>
Co-authored-by: Yahor Yuzefovich <[email protected]>
craig bot closed this as completed in 7ad5384 Dec 2, 2021
3 participants