Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fixed deadlock in leader controller gauge callback #354

Merged
merged 1 commit into from
Jun 15, 2023

Conversation

merlimat
Copy link
Collaborator

@merlimat merlimat commented Jun 14, 2023

At collection time, we have the Otel lock and try to grab the leader controller read-lock

1 @ 0x4482d6 0x4595de 0x4595b5 0x475a45 0x1a6d77b 0x1a6d757 0xa50f9e 0xa2f726 0xa322e7 0xa2b60e 0xa4b702 0x89c9d3 0x479ae1
# labels: {"oxia":"metrics"}
#	0x475a44	sync.runtime_SemacquireMutex+0x24						/usr/local/go/src/runtime/sema.go:77
#	0x1a6d77a	sync.(*RWMutex).RLock+0x5a							/usr/local/go/src/sync/rwmutex.go:71
#	0x1a6d756	oxia/server.NewLeaderController.func1+0x36					/src/oxia/server/leader_controller.go:121
#	0xa50f9d	oxia/common/metrics.NewGauge.func1+0x5d						/src/oxia/common/metrics/gauge.go:58
#	0xa2f725	go.opentelemetry.io/otel/sdk/metric.(*meter).RegisterCallback.func1+0x85	/go/pkg/mod/go.opentelemetry.io/otel/sdk/[email protected]/meter.go:263
#	0xa322e6	go.opentelemetry.io/otel/sdk/metric.(*pipeline).produce+0x326			/go/pkg/mod/go.opentelemetry.io/otel/sdk/[email protected]/pipeline.go:144
#	0xa2b60d	go.opentelemetry.io/otel/sdk/metric.(*manualReader).Collect+0xed		/go/pkg/mod/go.opentelemetry.io/otel/sdk/[email protected]/manual_reader.go:139
#	0xa4b701	go.opentelemetry.io/otel/exporters/prometheus.(*collector).Collect+0x81		/go/pkg/mod/go.opentelemetry.io/otel/exporters/[email protected]/exporter.go:119
#	0x89c9d2	github.com/prometheus/client_golang/prometheus.(*Registry).Gather.func1+0xf2	/go/pkg/mod/github.com/prometheus/[email protected]/prometheus/registry.go:457

During NewTerm handling, we have the leader controller lock and we try to unregister the gauge, acquiring the Otel lock

7 @ 0x4482d6 0x4595de 0x4595b5 0x475a45 0x490785 0xa31e8d 0xa31e61 0xa48b8f 0xa50b02 0x1a6e155 0x1a6b063 0x996d98 0x985807 0x996c58 0x96e9b0 0x973a6f 0x96c298 0x479ae1
# labels: {"bind":"[::]:6649", "oxia":"internal"}
#	0x475a44	sync.runtime_SemacquireMutex+0x24								/usr/local/go/src/runtime/sema.go:77
#	0x490784	sync.(*Mutex).lockSlow+0x164									/usr/local/go/src/sync/mutex.go:171
#	0xa31e8c	sync.(*Mutex).Lock+0x4c										/usr/local/go/src/sync/mutex.go:90
#	0xa31e60	go.opentelemetry.io/otel/sdk/metric.(*pipeline).addMultiCallback.func1+0x20			/go/pkg/mod/go.opentelemetry.io/otel/sdk/[email protected]/pipeline.go:116
#	0xa48b8e	go.opentelemetry.io/otel/sdk/metric.unregisterFuncs.Unregister+0x6e				/go/pkg/mod/go.opentelemetry.io/otel/sdk/[email protected]/pipeline.go:521
#	0xa50b01	oxia/common/metrics.(*gauge).Unregister+0x21							/src/oxia/common/metrics/gauge.go:36
#	0x1a6e154	oxia/server.(*leaderController).NewTerm+0x234							/src/oxia/server/leader_controller.go:231
#	0x1a6b062	oxia/server.(*internalRpcServer).NewTerm+0x4c2							/src/oxia/server/internal_rpc_server.go:122
#	0x996d97	oxia/proto._OxiaCoordination_NewTerm_Handler.func1+0x77						/src/oxia/proto/replication_grpc.pb.go:207
#	0x985806	github.com/grpc-ecosystem/go-grpc-prometheus.(*ServerMetrics).UnaryServerInterceptor.func1+0x86	/go/pkg/mod/github.com/grpc-ecosystem/[email protected]/server_metrics.go:107
#	0x996c57	oxia/proto._OxiaCoordination_NewTerm_Handler+0x137						/src/oxia/proto/replication_grpc.pb.go:209
#	0x96e9af	google.golang.org/grpc.(*Server).processUnaryRPC+0xdef						/go/pkg/mod/google.golang.org/[email protected]/server.go:1345
#	0x973a6e	google.golang.org/grpc.(*Server).handleStream+0xa2e						/go/pkg/mod/google.golang.org/[email protected]/server.go:1722
#	0x96c297	google.golang.org/grpc.(*Server).serveStreams.func1.2+0x97					/go/pkg/mod/google.golang.org/[email protected]/server.go:966

@mattisonchao mattisonchao merged commit 1614d08 into streamnative:main Jun 15, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants