-
Notifications
You must be signed in to change notification settings - Fork 1.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[collectd 6] New plugin (sort of): OpenTelemetry receiver #4271
Conversation
// TODO(octo): convert to gauge instead? | ||
DEBUG("open_telemetry plugin: non-monotonic sums (aka. UpDownCounters) " | ||
"are unsupported"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Lot of OTEL metrics are UpDownCounters
:
- https://opentelemetry.io/docs/specs/semconv/system/system-metrics/
- https://opentelemetry.io/docs/specs/semconv/system/hardware-metrics/
And both that and OTEL Counter
type are (listed as) signed, whereas collectd counters are (correctly) unsigned.
PS. apparently e.g. Gauge can be either Int or Double: https://opentelemetry.io/docs/specs/semconv/system/hardware-metrics/#hwbattery---battery-metrics
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have the impression that OpenTelemetry itself is a bit confused about the role of UpDownCounter:
From the description ofUpDownCounters:
An UpDownCounter is intended for scenarios where the absolute values are not pre-calculated, or fetching the “current value” requires extra effort.
This seems to put emphasis on the cumulative/temporal aspect: I can report that I restocked or sold n items without knowing exactly how many of those items I have in stock. I.e. we compare the value at time t with the value at t-1 and report the difference. For me that implies that only the difference to the previous point (or the rate) is relevant and the absolute value of the cumulative metric is meaningless.
However, from the description of Gauge:
Note: If the values are additive (e.g. the process heap size - it makes sense to report the heap size from multiple processes and sum them up, so we get the total heap usage), use UpDownCounter.
This puts the emphasis on the aggregation aspect: does it make sense to sum up different instances of this metric? Yes for memory usage, no for temperature => memory usage is UpDownCounter, temperature is Gauge. That would imply that the absolute value of an UpDownCounter is meaningful and calculating the rate, e.g. of "used memory" is not.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure what should be done here, but TODO is fine for this PR. Let's leave this open as a reminder....
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looking through the chat history of the OpenTelemetry slack, it's clear that UpDownCounters have "gauge like semantics". While I vehemently disagree with the naming of the instrument and the mapping in the wire protocol, I think this ship has sailed and we should just accept that there are two "gauge like" metric types.
Implementation in #4287.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @eero-t!
// TODO(octo): convert to gauge instead? | ||
DEBUG("open_telemetry plugin: non-monotonic sums (aka. UpDownCounters) " | ||
"are unsupported"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have the impression that OpenTelemetry itself is a bit confused about the role of UpDownCounter:
From the description ofUpDownCounters:
An UpDownCounter is intended for scenarios where the absolute values are not pre-calculated, or fetching the “current value” requires extra effort.
This seems to put emphasis on the cumulative/temporal aspect: I can report that I restocked or sold n items without knowing exactly how many of those items I have in stock. I.e. we compare the value at time t with the value at t-1 and report the difference. For me that implies that only the difference to the previous point (or the rate) is relevant and the absolute value of the cumulative metric is meaningless.
However, from the description of Gauge:
Note: If the values are additive (e.g. the process heap size - it makes sense to report the heap size from multiple processes and sum them up, so we get the total heap usage), use UpDownCounter.
This puts the emphasis on the aggregation aspect: does it make sense to sum up different instances of this metric? Yes for memory usage, no for temperature => memory usage is UpDownCounter, temperature is Gauge. That would imply that the absolute value of an UpDownCounter is meaningful and calculating the rate, e.g. of "used memory" is not.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks OK, just couple of minor notes which could be handled before merging.
…re not in the cache.
…ite_open_telemetry* plugins.
* Set field to `NULL` after freeing. * Remove unused global variable.
This PR adds an gRPC server to collectd, implementing the OpenTelemetry
MetricsService
. This allows collectd to receive metrics via OTLP, for example from the OpenTelemetry Collector.This new code and the write_open_telemetry plugin have been merged into one plugin, called "open_telemetry". Users can enable the exporter, the receiver, or both using the config file, just like they could in the network plugin in collectd 5.
Known limitations:
ChangeLog: OpenTelemetry plugin: This new plugin provides the ability to export as well as receive metrics via OTLP. It supersedes the write_open_telemetry plugin.