Intermittent errors logged after enabling telemetry #2018

Open
tomassommareqt opened this issue Nov 6, 2023 · 6 comments
Labels
priority: p3 Desirable enhancement or fix. May not be included in next release. type: bug Error or flaw in code with unintended results or allowing sub-optimal usage patterns.

Comments

@tomassommareqt

Bug Description

We are running gcr.io/cloud-sql-connectors/cloud-sql-proxy:2.7.0 as a sidecar container next to our main HTTP API container to connect to our Cloud SQL instance.

After enabling telemetry with the --telemetry-project and --telemetry-prefix flags, we have repeatedly seen the following error logged:

2023/11/04 13:58:43 Failed to export to Stackdriver: rpc error: code = Internal desc = One or more TimeSeries could not be written: Internal error encountered. Please retry after a few seconds. If internal errors persist, contact support at https://cloud.google.com/support/docs.: global{} timeSeries[0]: custom.googleapis.com/opencensus/<redacted>_cloud_sql_proxy/cloudsqlconn/refresh_success_count{opencensus_task:go-1@<redacted>,cloudsql_instance:<redacted>}; Internal error encountered. Please retry after a few seconds. If internal errors persist, contact support at https://cloud.google.com/support/docs.: global{} timeSeries[1]: custom.googleapis.com/opencensus/<redacted>_cloud_sql_proxy/cloudsqlconn/dial_latency{cloudsql_instance:<redacted>,opencensus_task:go-1@<redacted>}

However, when inspecting the metrics, we can see that they are exported as expected. So the main impact is polluted logs, but it would also be useful to understand why this error is reported.

Steps to reproduce?

  1. Launch cloud-sql-proxy 2.7.0 as a container in GCP GKE with --telemetry-project and --telemetry-prefix set.
  2. Inspect logs.

Environment

  1. OS type and version: GCP GKE 1.24.14-gke.2700
  2. Cloud SQL Proxy version: 2.7.0
  3. Proxy invocation command:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: <redacted>
spec:
  template:
    spec:
      containers:
        - name: cloudsql-proxy
          image: gcr.io/cloud-sql-connectors/cloud-sql-proxy:2.7.0
          args:
            - "--auto-iam-authn"
            - "--max-sigterm-delay"
            - "25s"
            - "--structured-logs"
            - "--telemetry-project"
            - "<redacted>"
            - "--telemetry-prefix"
            - "<redacted>_cloud_sql_proxy"
            - "<connection-string-redacted>"
```

Additional Details

No response

@tomassommareqt tomassommareqt added the type: bug Error or flaw in code with unintended results or allowing sub-optimal usage patterns. label Nov 6, 2023
@enocom enocom added the priority: p3 Desirable enhancement or fix. May not be included in next release. label Nov 6, 2023
@enocom
Member

enocom commented Nov 6, 2023

Thanks @tomassommareqt. FWIW I have seen the same logs when working on this feature. I don't expect these logs to show up outside of a dev context, though. We'll investigate and fix this.

@rojomisin

rojomisin commented Mar 15, 2024

Still seen in 2.9.0, although the metrics do work when using the metric writer role:

2024/03/15 23:10:50 Failed to export to Stackdriver: rpc error: code = PermissionDenied desc = The caller does not have permission

@enocom
Member

enocom commented Mar 22, 2024

Thanks, @rojomisin. We still haven't gotten to this. I wonder if this is a race condition in OpenCensus itself.

@rojomisin

Perhaps this is fixed in the OpenTelemetry package? https://github.com/open-telemetry/opentelemetry-go-contrib

@enocom
Member

enocom commented Apr 1, 2024

Quite possibly. We're currently using OpenCensus given that some internal tooling that uses the Proxy has a big investment in OpenCensus. But we might revisit that decision now that OpenTelemetry's metrics package is stable.

@enocom enocom assigned jackwotherspoon and unassigned enocom May 1, 2024
@jackwotherspoon
Collaborator

We will be migrating to OpenTelemetry in the somewhat near future, which will hopefully resolve this issue...
