Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

spanner: panic on native metrics #11080

Closed
apstndb opened this issue Nov 4, 2024 · 11 comments · Fixed by #11095
Closed

spanner: panic on native metrics #11080

apstndb opened this issue Nov 4, 2024 · 11 comments · Fixed by #11095
Assignees
Labels
api: spanner Issues related to the Spanner API. priority: p1 Important issue which blocks shipping the next release. Will be fixed prior to next release.

Comments

@apstndb
Copy link
Contributor

apstndb commented Nov 4, 2024

I have found the client built in metrics(#10419, #10998) can panic in some condition.

Client

Spanner

Environment

go version go1.23.2 darwin/arm64

Code and Dependencies

I can't find minimum reproducing code, but I can reproduce it in cloudspannerecosystem/spanner-cli.

Reproducing branch

Only do go get cloud.google.com/go/[email protected]

apstndb/spanner-cli#1

$ go test -v -run TestRequestPriority
=== RUN   TestRequestPriority
=== RUN   TestRequestPriority/use_default_MEDIUM_priority
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x2 addr=0x28 pc=0x10316ef04]

goroutine 94 [running]:
cloud.google.com/go/spanner.(*builtinMetricsTracer).toOtelMetricAttrs(0x14000384090, {0x1032c539b, 0xf})
        /Users/apstndb/go/pkg/mod/cloud.google.com/go/[email protected]/metrics.go:424 +0x3f4
cloud.google.com/go/spanner.recordOperationCompletion(0x14000384090)
        /Users/apstndb/go/pkg/mod/cloud.google.com/go/[email protected]/metrics.go:490 +0x58
cloud.google.com/go/spanner.(*grpcSpannerClient).BatchCreateSessions(0x1400058a870, {0x1038a1980, 0x14000414d80}, 0x14000470000, {0x1400038a0c0, 0x1, 0x1})
        /Users/apstndb/go/pkg/mod/cloud.google.com/go/[email protected]/grpc_client.go:134 +0x1c4
cloud.google.com/go/spanner.(*sessionClient).executeBatchCreateSessions(0x14000528160, {0x1038ae1b8, 0x1400058a870}, 0x1, 0x14000584690, 0x140005846c0, {0x10389aed8, 0x14000270200})
        /Users/apstndb/go/pkg/mod/cloud.google.com/go/[email protected]/sessionclient.go:277 +0x430
created by cloud.google.com/go/spanner.(*sessionClient).batchCreateSessions in goroutine 50
        /Users/apstndb/go/pkg/mod/cloud.google.com/go/[email protected]/sessionclient.go:243 +0x2f8
exit status 2
FAIL    github.com/cloudspannerecosystem/spanner-cli    0.492s

Workaround

Use DisableNativeMetrics: true.
apstndb/spanner-cli#2

Expected behavior

Client built in metrics don't break user code.

@apstndb apstndb added the triage me I really want to be triaged. label Nov 4, 2024
@product-auto-label product-auto-label bot added the api: spanner Issues related to the Spanner API. label Nov 4, 2024
@harshachinta
Copy link
Contributor

@rahul2393 Can you please help triage this?

@apstndb
Copy link
Contributor Author

apstndb commented Nov 4, 2024

I have succeeded to reproduce the panic.
It seems that the current implementation is not compatible with option.WithGRPCConn(conn).

package main

import (
	"context"
	"testing"

	"cloud.google.com/go/spanner"
	"cloud.google.com/go/spanner/spannertest"
	"github.com/samber/lo"
	"google.golang.org/api/option"
	"google.golang.org/grpc"
	"google.golang.org/grpc/credentials/insecure"
)

func TestReproduce(t *testing.T) {
	// success if true
	clientConf := spanner.ClientConfig{DisableNativeMetrics: false}

	ctx := context.Background()

	server := lo.Must(spannertest.NewServer("localhost:0"))
	conn := lo.Must(grpc.NewClient(server.Addr, grpc.WithTransportCredentials(insecure.NewCredentials())))

	cli := lo.Must(spanner.NewClientWithConfig(ctx,
		"projects/fake-project/instances/fake-instance/databases/fake-database", clientConf, option.WithGRPCConn(conn)))
	tx := lo.Must(spanner.NewReadWriteStmtBasedTransaction(ctx, cli))
	defer tx.Rollback(ctx)
}
$ go test                                                                                                                                                    
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x2 addr=0x28 pc=0x105568e14]

goroutine 86 [running]:
cloud.google.com/go/spanner.(*builtinMetricsTracer).toOtelMetricAttrs(0x14000316d80, {0x105641e61, 0xf})
        /Users/apstndb/go/pkg/mod/cloud.google.com/go/[email protected]/metrics.go:424 +0x3f4
cloud.google.com/go/spanner.recordOperationCompletion(0x14000316d80)
        /Users/apstndb/go/pkg/mod/cloud.google.com/go/[email protected]/metrics.go:490 +0x58
cloud.google.com/go/spanner.(*grpcSpannerClient).BatchCreateSessions(0x1400031ad90, {0x105be2d60, 0x14000315740}, 0x140004175e0, {0x1400031ae20, 0x1, 0x1})
        /Users/apstndb/go/pkg/mod/cloud.google.com/go/[email protected]/grpc_client.go:134 +0x1c4
cloud.google.com/go/spanner.(*sessionClient).executeBatchCreateSessions(0x140004f6210, {0x105beef38, 0x1400031ad90}, 0x19, 0x140003148a0, 0x140003148d0, {0x105bdc560, 0x140001ad400})
        /Users/apstndb/go/pkg/mod/cloud.google.com/go/[email protected]/sessionclient.go:277 +0x430
created by cloud.google.com/go/spanner.(*sessionClient).batchCreateSessions in goroutine 13
        /Users/apstndb/go/pkg/mod/cloud.google.com/go/[email protected]/sessionclient.go:243 +0x2f8
exit status 2

@apstndb
Copy link
Contributor Author

apstndb commented Nov 4, 2024

I have also noticed the native metrics is enabled as default even if there is no Cloud Monitoring permission, especially integration test.
It only consider SPANNER_EMULATOR_HOST, so we should manually disable when we use another way like testcontainers. (apstndb/spanner-mycli#26 (comment))

I don't prefer to enable native metrics when testing.Testing() == true

@rahul2393 rahul2393 added priority: p1 Important issue which blocks shipping the next release. Will be fixed prior to next release. and removed triage me I really want to be triaged. labels Nov 5, 2024
@rahul2393
Copy link
Contributor

Thanks @apstndb for reporting the issue.

  1. Yes native metrics enabled default is expected and there is on going work to include Cloud monitoring permission in Spanner's default role.
  2. We can't disable based on testing.Testing() == true since there can be integration tests which actually test end to end native metrics.

The PR is created to fix the panic case, it will log a warning message of insufficient permissions when exporting metrics without any other impact.

Let me know if you think the patch is not sufficent.

@apstndb
Copy link
Contributor Author

apstndb commented Nov 7, 2024

Thank you for fixing the panic!
Since there isn't much information to identify the source, we are concerned about metrics being polluted by exceptional (non-service) clients.
We feel like we need to opt out of this depending on our needs.

@rahul2393
Copy link
Contributor

Thanks for the feedback, we do export instance, client_location, and other identifiers as attributes when exporting the metrics which can help you distinguish between the sources.

@apstndb
Copy link
Contributor Author

apstndb commented Nov 9, 2024

I've found another pattern to fail user code, it requires default credentials for metric client even if it uses Cloud Spanner Emulator in testcontainers. It still be reproducible in v1.72.0.
I believe option.WithoutAuthentication() should be propagated as noop.NewMeterProvider().

@rahul2393
Copy link
Contributor

@apstndb Can you elaborate more, what is the issue you are seeing, failure logs or replication code can help me fix it
Thanks

@apstndb
Copy link
Contributor Author

apstndb commented Nov 9, 2024

I am now AFK, so I can't make minimum reproducible code but I have encountered in this PR.

apstndb/execspansql#31

integration_test.go:143: credentials: could not find default credentials. See https://cloud.google.com/docs/authentication/external/set-up-adc for more information

on https://github.com/apstndb/execspansql/blob/eac504f9d90221f8e8d8ab7763b510d47abf13ed/integration_test.go#L141

@rahul2393
Copy link
Contributor

@apstndb Can you help approve the workflow here apstndb/execspansql#32, want to check if fix fixes the test.

@apstndb
Copy link
Contributor Author

apstndb commented Nov 9, 2024

It seems that Go test have succeeded.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
api: spanner Issues related to the Spanner API. priority: p1 Important issue which blocks shipping the next release. Will be fixed prior to next release.
Projects
None yet
3 participants