util/metric: v23.2.0-alpha: child already exists #110081

Closed
cockroach-sentry opened this issue Sep 6, 2023 · 3 comments
Labels
A-kv-server Relating to the KV-level RPC server C-bug Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior. O-sentry Originated from an in-the-wild panic report.

Comments

@cockroach-sentry
Collaborator

cockroach-sentry commented Sep 6, 2023

This issue was auto filed by Sentry. It represents a crash or reported error on a live cluster with telemetry enabled.

Sentry Link: https://cockroach-labs.sentry.io/issues/4456806246/?referrer=webhooks_plugin

Panic Message:

agg_metric.go:107: child [× × ×] already exists
(1) attached stack trace
  -- stack trace:
  | runtime.gopanic
  | 	GOROOT/src/runtime/panic.go:884
  | [...repeated from below...]
Wraps: (2) assertion failure
Wraps: (3) attached stack trace
  -- stack trace:
  | github.com/cockroachdb/cockroach/pkg/util/metric/aggmetric.(*childSet).add
  | 	github.com/cockroachdb/cockroach/pkg/util/metric/aggmetric/agg_metric.go:107
  | github.com/cockroachdb/cockroach/pkg/util/metric/aggmetric.(*AggGauge).AddChild
  | 	github.com/cockroachdb/cockroach/pkg/util/metric/aggmetric/gauge.go:87
  | github.com/cockroachdb/cockroach/pkg/rpc.(*Metrics).acquire
  | 	github.com/cockroachdb/cockroach/pkg/rpc/pkg/rpc/metrics.go:189
  | github.com/cockroachdb/cockroach/pkg/rpc.(*Context).newPeer
  | 	github.com/cockroachdb/cockroach/pkg/rpc/pkg/rpc/peer.go:178
  | github.com/cockroachdb/cockroach/pkg/rpc.(*Context).grpcDialNodeInternal
  | 	github.com/cockroachdb/cockroach/pkg/rpc/pkg/rpc/context.go:2157
  | github.com/cockroachdb/cockroach/pkg/rpc.(*Context).GRPCUnvalidatedDial
  | 	github.com/cockroachdb/cockroach/pkg/rpc/pkg/rpc/context.go:2096
  | github.com/cockroachdb/cockroach/pkg/gossip.(*client).startLocked.func1.2
  | 	github.com/cockroachdb/cockroach/pkg/gossip/pkg/gossip/client.go:103
  | github.com/cockroachdb/cockroach/pkg/gossip.(*client).startLocked.func1
  | 	github.com/cockroachdb/cockroach/pkg/gossip/pkg/gossip/client.go:115
  | github.com/cockroachdb/cockroach/pkg/util/stop.(*Stopper).RunAsyncTaskEx.func2
  | 	github.com/cockroachdb/cockroach/pkg/util/stop/stopper.go:484
  | runtime.goexit
  | 	GOROOT/src/runtime/asm_amd64.s:1594
Wraps: (4) child [× × ×] already exists
Error types: (1) *withstack.withStack (2) *assert.withAssertionFailure (3) *withstack.withStack (4) *errutil.leafError
-- report composition:
*errutil.leafError: child [× × ×] already exists
agg_metric.go:107: *withstack.withStack (top exception)
*assert.withAssertionFailure
panic.go:884: *withstack.withStack (1)
(check the extra data payloads)
Stacktrace (expand for inline code snippets):

GOROOT/src/runtime/asm_amd64.s#L1593-L1595

sp.UpdateGoroutineIDToCurrent()
f(ctx)
}()

https://github.com/cockroachdb/cockroach/blob/1382b26a97bf6f70a07d363dc319283c173359eb/pkg/gossip/pkg/gossip/client.go#L114-L116
https://github.com/cockroachdb/cockroach/blob/1382b26a97bf6f70a07d363dc319283c173359eb/pkg/gossip/pkg/gossip/client.go#L102-L104
https://github.com/cockroachdb/cockroach/blob/1382b26a97bf6f70a07d363dc319283c173359eb/pkg/rpc/pkg/rpc/context.go#L2095-L2097
https://github.com/cockroachdb/cockroach/blob/1382b26a97bf6f70a07d363dc319283c173359eb/pkg/rpc/pkg/rpc/context.go#L2156-L2158
https://github.com/cockroachdb/cockroach/blob/1382b26a97bf6f70a07d363dc319283c173359eb/pkg/rpc/pkg/rpc/peer.go#L177-L179
https://github.com/cockroachdb/cockroach/blob/1382b26a97bf6f70a07d363dc319283c173359eb/pkg/rpc/pkg/rpc/metrics.go#L188-L190
}
g.add(child)
return child

if cs.mu.tree.Has(metric) {
panic(errors.AssertionFailedf("child %v already exists", metric.labelValues()))
}

GOROOT/src/runtime/panic.go#L883-L885

GOROOT/src/runtime/asm_amd64.s in runtime.goexit at line 1594
pkg/util/stop/stopper.go in pkg/util/stop.(*Stopper).RunAsyncTaskEx.func2 at line 484
pkg/gossip/pkg/gossip/client.go in pkg/gossip.(*client).startLocked.func1 at line 115
pkg/gossip/pkg/gossip/client.go in pkg/gossip.(*client).startLocked.func1.2 at line 103
pkg/rpc/pkg/rpc/context.go in pkg/rpc.(*Context).GRPCUnvalidatedDial at line 2096
pkg/rpc/pkg/rpc/context.go in pkg/rpc.(*Context).grpcDialNodeInternal at line 2157
pkg/rpc/pkg/rpc/peer.go in pkg/rpc.(*Context).newPeer at line 178
pkg/rpc/pkg/rpc/metrics.go in pkg/rpc.(*Metrics).acquire at line 189
pkg/util/metric/aggmetric/gauge.go in pkg/util/metric/aggmetric.(*AggGauge).AddChild at line 87
pkg/util/metric/aggmetric/agg_metric.go in pkg/util/metric/aggmetric.(*childSet).add at line 107
GOROOT/src/runtime/panic.go in runtime.gopanic at line 884
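
The innermost frame above is a uniqueness check: adding a child whose label values are already registered is treated as an assertion failure. The following is a minimal, self-contained Go sketch of that pattern, not the aggmetric code itself. The `childSet`, `gauge`, `getOrAdd`, and `tryAdd` names, the map-based storage, and the label values are all hypothetical (the report redacts the real labels as `[× × ×]`). `getOrAdd` shows one generic way to make registration idempotent; it is not presented as what #108841 actually changed.

```go
package main

import (
	"fmt"
	"strings"
	"sync"
)

// gauge is a hypothetical child metric identified by its label values.
type gauge struct {
	labels []string
	value  int64
}

// childSet is a simplified stand-in for a set of per-label children.
// Assumption: a plain map keyed by the joined label values.
type childSet struct {
	mu       sync.Mutex
	children map[string]*gauge
}

func newChildSet() *childSet {
	return &childSet{children: make(map[string]*gauge)}
}

// key joins label values with a separator unlikely to appear in a label.
func key(labels []string) string { return strings.Join(labels, "\x00") }

// add registers a new child and panics if the label values are already
// present, mirroring the shape of the check shown in the report.
func (cs *childSet) add(labels ...string) *gauge {
	cs.mu.Lock()
	defer cs.mu.Unlock()
	k := key(labels)
	if _, ok := cs.children[k]; ok {
		panic(fmt.Sprintf("child %v already exists", labels))
	}
	g := &gauge{labels: labels}
	cs.children[k] = g
	return g
}

// getOrAdd is an idempotent alternative: a second caller with the same
// label values receives the existing child instead of triggering a panic.
func (cs *childSet) getOrAdd(labels ...string) *gauge {
	cs.mu.Lock()
	defer cs.mu.Unlock()
	k := key(labels)
	if g, ok := cs.children[k]; ok {
		return g
	}
	g := &gauge{labels: labels}
	cs.children[k] = g
	return g
}

// tryAdd calls add and converts a panic into an error for demonstration.
func tryAdd(cs *childSet, labels ...string) (err error) {
	defer func() {
		if r := recover(); r != nil {
			err = fmt.Errorf("%v", r)
		}
	}()
	cs.add(labels...)
	return nil
}

func main() {
	cs := newChildSet()
	// Hypothetical label values: remote address, node ID, connection class.
	fmt.Println(tryAdd(cs, "10.0.0.1:26257", "2", "default")) // first add succeeds
	fmt.Println(tryAdd(cs, "10.0.0.1:26257", "2", "default")) // duplicate add fails

	a := cs.getOrAdd("10.0.0.2:26257", "3", "default")
	b := cs.getOrAdd("10.0.0.2:26257", "3", "default")
	fmt.Println(a == b) // both calls return the same child
}
```

In this sketch the strict `add` surfaces a double registration immediately, which is useful when a duplicate indicates a bookkeeping bug in the caller, while `getOrAdd` trades that signal for tolerance of repeated registration.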

Tags

| Tag | Value |
| --- | --- |
| Command | server |
| Environment | v23.2.0-alpha.00000000 |
| Go Version | go1.19.10 |
| Platform | linux amd64 |
| Distribution | CCL |
| Cockroach Release | v23.1.0-alpha.7-5180-g1382b26a97 |
| Cockroach SHA | 1382b26 |
| # of CPUs | 16 |
| # of Goroutines | 1001 |

Jira issue: CRDB-31257

cockroach-sentry added the O-sentry (Originated from an in-the-wild panic report.) and C-bug (Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior.) labels Sep 6, 2023
yuzefovich changed the title from "Sentry: agg_metric.go:107: child [× × ×] already exists (1) attached stack trace -- stack trace: | runtime.gopanic | GOROOT/src/runtime/panic.go:884 | [...repeated from below...] Wraps: (2..." to "v23.2.0-alpha: util/metric: child already exists" Sep 24, 2023
yuzefovich changed the title from "v23.2.0-alpha: util/metric: child already exists" to "util/metric: v23.2.0-alpha: child already exists" Sep 24, 2023
@knz
Contributor

knz commented Sep 25, 2023

@aliher1911 @pavelkalinnikov this seems related to #108841. Can you have a look?

@blathers-crl

blathers-crl bot commented Sep 25, 2023

cc @cockroachdb/replication

knz added the A-kv-server (Relating to the KV-level RPC server) label and removed the A-observability-inf label Sep 25, 2023
pav-kv self-assigned this Sep 25, 2023
@pav-kv
Collaborator

pav-kv commented Sep 25, 2023

The SHA 1382b26 was committed on Aug 8, which precedes another instance of this failure in #108499 on Aug 10. That failure was fixed by #108841, so this report comes from a build without the fix.

pav-kv closed this as completed Sep 25, 2023