Update Cloud Run Job task_id to avoid high cardinality?
#874
Comments
Do you know if only a single instance can exist for a given task index? We just need to make sure there aren't collisions.
What kind of collision are you thinking of? Two tasks sharing an instance ID and writing the exact same metric at roughly the same time? According to the Cloud Run Job documentation, each task gets its own instance.
xref: #465
Do you mean the TASK_INDEX? As I understand it, that is a number for each task within a job (i.e., 0, 1, ...). This means separate invocations of the same job would end up with the same labels for job+task_id. I don't think that would be a problem with Cloud Monitoring. But using the instance ID matches the OTel faas conventions more closely than the task ID does (https://opentelemetry.io/docs/specs/semconv/attributes-registry/faas/#function-as-a-service-attributes). We did add the From what I understand, the Cloud Monitoring backend scales well enough that instance ID shouldn't cause cardinality issues there. But if you are using the GCP resource detector and exporting that data somewhere else, the
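To make the trade-off above concrete, here is a rough sketch in Go that counts how many distinct job+task_id label sets each choice would produce. It is only a toy model: the job name `my-job`, the `instance-<execution>-<task>` format, and the assumption that every task in every execution gets a fresh instance ID are all made up for illustration and are not taken from the detector or from Cloud Monitoring.

```go
package main

import "fmt"

// labelSet is the pair of labels that identifies one time series in this sketch.
type labelSet struct {
	job    string
	taskID string
}

// distinctSeries counts the distinct label sets written by `executions` runs
// of a job that has `tasks` tasks per run.
func distinctSeries(executions, tasks int, useInstanceID bool) int {
	seen := make(map[labelSet]struct{})
	for e := 0; e < executions; e++ {
		for t := 0; t < tasks; t++ {
			// Task index: repeats in every execution (0, 1, ...).
			id := fmt.Sprintf("%d", t)
			if useInstanceID {
				// Hypothetical stand-in for an instance ID, assumed to be
				// unique per execution and task.
				id = fmt.Sprintf("instance-%d-%d", e, t)
			}
			seen[labelSet{job: "my-job", taskID: id}] = struct{}{}
		}
	}
	return len(seen)
}

func main() {
	// 72 executions is roughly three runs an hour for one day.
	fmt.Println("task index: ", distinctSeries(72, 2, false))
	fmt.Println("instance ID:", distinctSeries(72, 2, true))
}
```

Under these assumptions the task index stays at one series per task, while the instance ID adds a new series for every task of every execution. Whether that growth matters depends on the backend the data is exported to.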
Closing based on #874 (comment)
I recently set up OpenTelemetry in a Go application running inside a Cloud Run Job and noticed that the monitored resource `task_id` label kept changing on every single write. In this specific case the job runs fairly frequently (several times an hour), which immediately made me think that, long term, it might cause high-cardinality issues on every metric written by that job.

After some debugging, I can see that `gcp.NewDetector()` is configured to return the instance ID from the metadata server (which comes out as a long ID like `0087244a809d22283efa2....`), which in turn is used by otel as the `FaaSInstanceKey`, which is eventually exported as the `task_id` in a generic task.

Reading the definition of the label, I am left wondering whether it might make more sense to default to something like the Cloud Run Job task index to avoid long-term high-cardinality issues.

Or is this not something to be concerned about (not a monitoring expert here)?
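For illustration, a minimal sketch of what "default to the task index" could look like in Go. It only derives the value: it relies on the `CLOUD_RUN_TASK_INDEX` environment variable that Cloud Run sets for job tasks, while the `taskID` helper, the `task-` prefix, and the fallback string are hypothetical names chosen here. How the value would then be attached to the resource is left out, since that depends on the detector and exporter in use.

```go
package main

import (
	"fmt"
	"os"
)

// taskID returns a low-cardinality identifier for a Cloud Run Job task.
// lookup is injected (os.LookupEnv in production) so the function can be
// exercised without touching the real environment.
func taskID(lookup func(string) (string, bool), fallback string) string {
	// CLOUD_RUN_TASK_INDEX is 0, 1, ... for the tasks of a job execution.
	if idx, ok := lookup("CLOUD_RUN_TASK_INDEX"); ok && idx != "" {
		return "task-" + idx
	}
	// Outside a Cloud Run Job, keep whatever identifier was already in use.
	return fallback
}

func main() {
	// "instance-id-from-detector" is a placeholder for the detected instance ID.
	fmt.Println(taskID(os.LookupEnv, "instance-id-from-detector"))
}
```

The trade-off is that the same value repeats across executions of the job, so separate runs are no longer distinguishable by this label alone.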