[receiver/statsd] Add self-telemetry #31839
Conversation
}
}

func (r *reporter) RecordFlushedMetrics(count int64, err error) {
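For orientation, here is a minimal sketch (not the PR's actual code) of what a flush-time reporter method like the one above could look like when recording to OTel counters; the field and instrument names are assumptions.

```go
package statsdreceiver

import (
	"context"

	"go.opentelemetry.io/otel/metric"
)

// reporter here is a hypothetical stand-in for the PR's reporter type; the
// counter fields are assumptions, not the PR's actual field names.
type reporter struct {
	flushedMetricPoints metric.Int64Counter
	refusedMetricPoints metric.Int64Counter
	flushes             metric.Int64Counter
}

// RecordFlushedMetrics counts one flush and attributes the flushed points to
// either the flushed or the refused counter depending on the flush error.
func (r *reporter) RecordFlushedMetrics(count int64, err error) {
	ctx := context.Background()
	r.flushes.Add(ctx, 1)
	if err != nil {
		r.refusedMetricPoints.Add(ctx, count)
		return
	}
	r.flushedMetricPoints.Add(ctx, count)
}
```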
Again, compared to the prometheusreceiver, I think the flushed metrics should be the receiver accepted/refused metrics. The parser should be the place where the instrumentation for received and parse-error counts is added.
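A sketch of the alternative being suggested here, counting each line as accepted or refused at parse time; the parser type and method names are hypothetical.

```go
package statsdreceiver

import (
	"context"

	"go.opentelemetry.io/otel/metric"
)

// lineParser is hypothetical; it illustrates counting each statsd line as
// accepted or refused at parse time, as suggested in the comment above.
type lineParser struct {
	acceptedMetricPoints metric.Int64Counter
	refusedMetricPoints  metric.Int64Counter
}

// countParsed records the outcome of parsing a single statsd line.
func (p *lineParser) countParsed(ctx context.Context, parseErr error) {
	if parseErr != nil {
		p.refusedMetricPoints.Add(ctx, 1)
		return
	}
	p.acceptedMetricPoints.Add(ctx, 1)
}
```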
Hmm. I'm not sure about this one. In our monitoring, we are comparing the received datapoint count to the sent metrics count from statsd clients. See https://github.com/DataDog/datadog-go/blob/9d920252c04efe4e5f4d3a283002b72253a7da47/statsd/telemetry.go#L271
Is Prometheus a valid comparison? In Prometheus there is no aggregation in the receiver itself, so there's only one measure for a "received metric", right?
I'm OK with flipping the logic if that's the consensus. The naming becomes a bit harder though 🙂
This PR was marked stale due to lack of activity. It will be closed in 14 days.
@TylerHelmuth would you please review this? I stopped work on #31822 in favor of this.
// for the metric naming conventions

r.acceptedMetricPoints, err = metadata.Meter(set.TelemetrySettings).Int64Counter(
	"receiver/accepted_metric_points",
Does this receiver not use receiverhelper? If it does, it should be producing this metric already.
I'm with @TylerHelmuth here. I've read the PR description but still don't understand why we cannot use receiverhelper.NewObsReport like other receivers. Excessive tracing is a concern for any receiver, so the collector's tracing is expected to be sampled.
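For reference, this is roughly the pattern the reviewers have in mind; treat it as a sketch, since the exact ObsReportSettings fields vary between collector versions, and the transport and helper names here are assumptions.

```go
package statsdreceiver

import (
	"context"

	"go.opentelemetry.io/collector/receiver"
	"go.opentelemetry.io/collector/receiver/receiverhelper"
)

// newObsReport wires up the shared receiver instrumentation; the exact
// ObsReportSettings fields differ slightly between collector versions.
func newObsReport(set receiver.CreateSettings) (*receiverhelper.ObsReport, error) {
	return receiverhelper.NewObsReport(receiverhelper.ObsReportSettings{
		ReceiverID:             set.ID,
		Transport:              "udp",
		ReceiverCreateSettings: set,
	})
}

// consume shows the synchronous StartMetricsOp/EndMetricsOp pattern that the
// PR author argues does not line up with this receiver's background flush.
func consume(ctx context.Context, obsrecv *receiverhelper.ObsReport, numPoints int, push func(context.Context) error) error {
	ctx = obsrecv.StartMetricsOp(ctx)
	err := push(ctx)
	obsrecv.EndMetricsOp(ctx, "statsd", numPoints, err)
	return err
}
```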
> Does this receiver not use receiverhelper? If it does, it should be producing this metric already.
It does not use receiverhelper. It previously used the obsreport package, but that was dead code, as mentioned in the issue: #24278 (comment)
> don't understand why we cannot use receiverhelper.NewObsReport like other receivers.
It's messy to use obsreport because the asynchronous nature of this receiver (the flush happening in the background) does not align with the StartMetricsOp() / EndMetricsOp() pattern of obsreport. This is likely why the obsreport integration was abandoned in #1670, when aggregation / async flush was implemented in this receiver.
> Excessive tracing is a concern for any receiver, so the collector's tracing is expected to be sampled.
I'm more concerned with the performance overhead of the tracing instrumentation code than with the volume of traces. Since submitting this PR we actually had to rewrite this instrumentation, because even the overhead of incrementing the metric counter synchronously was causing statsd packet loss. You can see in the attached flame graph that simply calling metric.(*int64Inst).Add() is significant overhead (12% of overall CPU).
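One way to keep the hot path cheap, roughly in the direction described here (a sketch under assumed names, not the rewrite the author refers to): accumulate increments with an atomic counter on the packet path and publish them to the OTel instrument from a background goroutine.

```go
package statsdreceiver

import (
	"context"
	"sync/atomic"
	"time"

	"go.opentelemetry.io/otel/metric"
)

// asyncCounter batches increments off the packet hot path so that
// Int64Counter.Add is only called on a timer, not per packet.
type asyncCounter struct {
	pending atomic.Int64
	counter metric.Int64Counter
}

// Inc is the only call made on the hot path: a single atomic add.
func (c *asyncCounter) Inc(n int64) { c.pending.Add(n) }

// run drains the pending count on each tick; the interval is a tunable.
func (c *asyncCounter) run(ctx context.Context, interval time.Duration) {
	t := time.NewTicker(interval)
	defer t.Stop()
	for {
		select {
		case <-ctx.Done():
			return
		case <-t.C:
			if n := c.pending.Swap(0); n > 0 {
				c.counter.Add(ctx, n)
			}
		}
	}
}
```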
This PR was marked stale due to lack of activity. It will be closed in 14 days.
Closed as inactive. Feel free to reopen if this PR is still being worked on.
Description:
Add telemetry to the statsdreceiver:
- otelcol_receiver_accepted_metric_points
- otelcol_receiver_refused_metric_points
- otelcol_receiver_statsd_flushed_metric_points
- otelcol_receiver_statsd_flushes
Implementation note
I debated trying to use the existing receiverhelper.ObsReport for this but rejected that approach for a few reasons:
- Spans getting created for every single statsd line that's sent through the system.
- The background flushes do not align with the StartMetricsOp() / EndMetricsOp() pattern.

Link to tracking Issue: #24278
Testing:
Tested locally using Prometheus.