OpenTelemetry reporting to GCM does not work in Cloud Functions: errors due to no NODE_ID attribute in TimeSeries #679
Comments
I would also note that the last public release is 0.17, from June last year, and there have been many changes since then...
The bug report above is very thorough, so I think the action items are clear:
Unfortunately, this is expected. As you mentioned, it's not well documented here, but k8s workloads should use the Downward API to pass in the pod name and namespace, and set the container name manually. This is documented for Go here.
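As a rough sketch of that approach in Node.js: assuming the pod manifest uses the Downward API to expose the pod name and namespace as environment variables, the workload can map them onto the standard OpenTelemetry `k8s.*` resource attribute keys itself. The variable names `POD_NAME` and `NAMESPACE_NAME` and the helper `k8sAttributesFromEnv` are hypothetical; the variable names must match whatever the manifest actually defines.

```typescript
// Hypothetical helper: builds k8s resource attributes from environment
// variables that the pod manifest is assumed to populate via the Downward API, e.g.
//   env:
//   - name: POD_NAME
//     valueFrom: { fieldRef: { fieldPath: metadata.name } }
//   - name: NAMESPACE_NAME
//     valueFrom: { fieldRef: { fieldPath: metadata.namespace } }
type Env = Record<string, string | undefined>;

function k8sAttributesFromEnv(
  env: Env,
  containerName: string,
): Record<string, string> {
  const attrs: Record<string, string> = {
    // Per the comment above, the container name is set manually.
    'k8s.container.name': containerName,
  };
  const pod = env['POD_NAME'];
  const namespace = env['NAMESPACE_NAME'];
  // Only add attributes whose variables are actually present.
  if (pod) attrs['k8s.pod.name'] = pod;
  if (namespace) attrs['k8s.namespace.name'] = namespace;
  return attrs;
}
```

In an application this would be called as `k8sAttributesFromEnv(process.env, 'my-container')`, and the result merged into the SDK's resource (for example via `new Resource(...)` from `@opentelemetry/resources` in the 1.x API).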
Huh, weird; this should have been working based on the code linked at the time this issue was created. We fixed most of this in #600 and #643 but still need to do a release, as you mentioned. I'll verify it's working and make a release.
At main/HEAD, it is still not correct. The mapping for Cloud Functions set up at opentelemetry-resource-util/src/index.ts#L130 only copies REGION and FUNCTION_NAME into the TimeSeries, not FAAS_INSTANCE:

```ts
[CLOUD_FUNCTION]: {
  [REGION]: {otelKeys: [SemanticResourceAttributes.CLOUD_REGION]},
  [FUNCTION_NAME]: {otelKeys: [SemanticResourceAttributes.FAAS_NAME]},
},
```

I have to manually set the SERVICE_INSTANCE_ID resource attribute for it to be handled correctly by Cloud Monitoring:

```js
if (gcpResources.attributes[Semconv.SEMRESATTRS_FAAS_ID]?.toString()) {
  RESOURCE_ATTRIBUTES[Semconv.SEMRESATTRS_SERVICE_INSTANCE_ID] =
    gcpResources.attributes[Semconv.SEMRESATTRS_FAAS_ID].toString();
}
```

...
Thanks for the reply, and sorry again for not looking at this sooner.
The Cloud Functions and Cloud Run resources are not writable for custom metrics, so that mapping shouldn't be relevant. Instead, it should be using the mapping in opentelemetry-operations-js/packages/opentelemetry-resource-util/src/index.ts (lines 160 to 166 in be4ae61).
Something must have changed between 0.17 and main, then, because this is the data sent by simple GCF metrics test code using 0.17 (sample code: https://github.com/nielm/counters-test; note that index.js contains workaround code, which is enabled by an environment variable):

```js
// RESOURCE_ATTRIBUTES
{
  'service.namespace': 'nielm',
  'service.name': 'counters-test',
  'service.version': '1.0.0',
  'cloud.provider': 'gcp',
  'cloud.account.id': 'xxxxxxxxxxx',
  'cloud.platform': 'gcp_cloud_functions',
  'faas.name': 'counters-test',
  'faas.version': '2',
  'faas.id': '00f46b92856fc7f381fe1f3b1fc26dd38d4da32a74f71a070477a4cefce3bb7ebbfc6db25d45ee01d4020dfd6c32bcd1b26691e8903a33dc5aa1387f12bc2cec6ca5f5',
  'cloud.region': 'us-central1'
}
// EXPORTED METRIC, AS SENT TO GCM
{
  metric: {
    type: 'custom.googleapis.com/nielm-test/background-counter',
    labels: {}
  },
  resource: {
    type: 'generic_node',
    labels: { location: 'us-central1', namespace: 'nielm', node_id: '' }
  },
  metricKind: 'CUMULATIVE',
  valueType: 'DOUBLE',
  points: [
    {
      value: { doubleValue: 273 },
      interval: {
        startTime: '2024-05-03T13:07:29.115000000Z',
        endTime: '2024-05-03T13:17:45.819000000Z'
      }
    }
  ]
}
```
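To see why `node_id` ends up empty, here is a deliberately simplified, hypothetical `generic_node` mapping. It is not the library's actual code: which attributes feed each label (`cloud.availability_zone`/`cloud.region` for `location`, `service.namespace` for `namespace`, `service.instance.id`/`host.id` for `node_id`) is an assumption for illustration, and `toGenericNode` is a hypothetical name. Because the detected `faas.id` is not among the attributes it reads, the resource attributes above yield the same empty `node_id` seen in the exported metric.

```typescript
type Attributes = Record<string, string | undefined>;

interface MonitoredResource {
  type: string;
  labels: Record<string, string>;
}

// Hypothetical, simplified generic_node mapping (NOT the library's code).
// Each label takes the first listed attribute that is set, else a fallback.
const GENERIC_NODE_LABELS: Record<string, { keys: string[]; fallback: string }> = {
  location: { keys: ['cloud.availability_zone', 'cloud.region'], fallback: 'global' },
  namespace: { keys: ['service.namespace'], fallback: '' },
  node_id: { keys: ['service.instance.id', 'host.id'], fallback: '' },
};

function toGenericNode(attrs: Attributes): MonitoredResource {
  const labels: Record<string, string> = {};
  for (const [label, { keys, fallback }] of Object.entries(GENERIC_NODE_LABELS)) {
    // Find the first source attribute with a non-empty value.
    const key = keys.find((k) => attrs[k] !== undefined && attrs[k] !== '');
    labels[label] = key !== undefined ? (attrs[key] as string) : fallback;
  }
  return { type: 'generic_node', labels };
}
```

Under this assumed mapping, setting `service.instance.id` (as in the workaround discussed in this thread) is what gives each instance its own `node_id`.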
Yes, there are many changes since the last release; apologies for that. Let me quickly test your sample code at main/HEAD.
I bundled in packed tarballs of the packages in this repo, commented out the...
When using OpenTelemetry in Cloud Functions with the GCM exporter, no node ID is included in the resource attributes.
This leads to errors when two or more instances of the same Cloud Function export metrics to GCM, because GCM cannot tell apart the CreateTimeSeries requests coming from the different instances.
(The same problem also affects Cloud Run, for the same reason.)
Errors occur when these two (or more) function instances send CreateTimeSeries requests within 5 seconds of each other, leading to the error:
They may also send CreateTimeSeries requests with different or overlapping time series for the same metric, leading to the error:
Some debugging later...
Cloud Functions do have an instance ID that can be used to distinguish between multiple instances of the same function. The GcpDetector detects and exports it as the faas.id resource attribute (shouldn't it be FAAS_INSTANCE?). However, when the TimeSeries is created, the mapping from resource attributes does not use this attribute: only the region and function name (not the instance) are used.
This means that all CreateTimeSeries requests from multiple instances of the same Cloud Function are seen as coming from the "same" instance, and conflict, producing the errors above.
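The collision can be illustrated with a small sketch. It assumes, per the Cloud Monitoring documentation, that a time series is identified by its metric type and labels together with its monitored-resource type and labels, and that points written to one time series less than 5 seconds apart are rejected (the window in the first error condition above). The functions `seriesKey` and `writesConflict` and the key format are hypothetical.

```typescript
type Labels = Record<string, string>;

// Hypothetical helper: a canonical identity for a time series, built from
// the metric type/labels and the monitored-resource type/labels.
function seriesKey(
  metricType: string,
  metricLabels: Labels,
  resourceType: string,
  resourceLabels: Labels,
): string {
  // Sort label entries so the key does not depend on insertion order.
  const canon = (l: Labels): string =>
    JSON.stringify(Object.entries(l).sort(([a], [b]) => (a < b ? -1 : a > b ? 1 : 0)));
  return [metricType, canon(metricLabels), resourceType, canon(resourceLabels)].join('|');
}

// Assumed minimum spacing between points on the same time series.
const MIN_WRITE_INTERVAL_MS = 5000;

// Hypothetical check: two writes conflict if they target the same series
// and are closer together than the minimum interval.
function writesConflict(keyA: string, timeA: number, keyB: string, timeB: number): boolean {
  return keyA === keyB && Math.abs(timeA - timeB) < MIN_WRITE_INTERVAL_MS;
}
```

With an empty `node_id`, two function instances produce the same key, so their writes a couple of seconds apart collide; distinct `node_id` values give distinct series and no conflict.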
Also affects Cloud Run and K8s workloads when exporting directly.
As mentioned, this problem affects Cloud Run for the same reason: no identifier for individual instances of a Cloud Run workload is exported.
The same problem also occurred in K8s workloads, and it was more complicated to resolve. Although the pod name is exported when it is set, it is not detected automatically, so I had to pass the pod name into my workload manually as an environment variable and set the resource attribute myself.
(In a K8s workload using the OpenTelemetry Collector, the collector can determine and add the pod name itself, so this only affects workloads that export directly to GCM.)
Suffice it to say, understanding these issues and creating workarounds took a considerable amount of investigation and debugging, as this is not well documented at all.
Workaround for Cloud Functions
I have worked around this by overriding the resource attributes so that, when the TimeSeries is created, GCM sees different CF instances as distinct.
With this workaround, metrics are reported and aggregated correctly, and no errors occur when sending TimeSeries.
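The override can be sketched as a pure function over an attribute map. This assumes, as described in this thread, that the GcpDetector reports the Cloud Functions instance ID as `faas.id` and that Cloud Monitoring distinguishes instances once `service.instance.id` is set; the function name `withInstanceId` is hypothetical.

```typescript
type Attributes = Record<string, string | undefined>;

// Hypothetical helper mirroring the workaround: copy the per-instance
// faas.id into service.instance.id, unless one is already set.
// Returns a new object and leaves the input unchanged.
function withInstanceId(attrs: Attributes): Attributes {
  const faasId = attrs['faas.id'];
  if (!faasId || attrs['service.instance.id']) {
    return { ...attrs };
  }
  return { ...attrs, 'service.instance.id': faasId };
}
```

The resulting map would then be used when building the SDK's resource, before the exporter converts it into a TimeSeries.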
What version of OpenTelemetry are you using?
Latest:
What version of Node are you using?
What did you do?
A simple Cloud Function that records metrics for incoming requests and exports them to Google Cloud Monitoring using a PeriodicExportingMetricReader, with multiple instances of the same function running in parallel.
What did you expect to see?
With a simple OpenTelemetry setup for GCM that follows the examples, metrics should be exported reliably without any errors.
What did you see instead?
Multiple errors, and metrics failing to be exported, so values were missed.