-
Notifications
You must be signed in to change notification settings - Fork 486
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Static mode tracing: Generate spanmetrics after load balancing. #5889
Conversation
c19e190
to
65d45d7
Compare
@@ -978,6 +980,7 @@ func orderProcessors(processors []string, splitPipelines bool) [][]string { | |||
if processor == "batch" || | |||
processor == "tail_sampling" || | |||
processor == "automatic_logging" || | |||
processor == "spanmetrics" || |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IIUC, this is the main bug fix, so that if we encounter the spanmetrics processor, we order it accordingly, right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yep, that's the main bugfix.
if we encounter the spanmetrics processor, we order it accordingly, right?
I suppose your question is for Flow, since static mode automatically orders them? In Flow, if there are two Agents which ingest traces in a load balanced way, they need to be behind a load balancing exporter for spanmetrics to work ok.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As far as I can tell this looks good, nice catch! Feel free to merge this!
It'd be kinda harder in Flow mode, but do you think similar warning(s) would make sense there? If so, let's just open a tracking issue so we don't forget about it.
It'd be hard to put such warnings in Flow mode, because we can't assume much about what the users want to do. Static mode is more restrictive, so it makes a bit more sense here. But even in static mode these warning don't always make sense, because the Agent doesn't know if it's running in a cluster or not. |
Co-authored-by: Paschalis Tsilias <[email protected]>
PR Description
As the OTel documentation for the loadbalancer exporter states, a
service
routing key must be used when using both load balancing and spanmetrics:Currently, static mode will generate spanmetrics before it has even done load balancing. This PR changes it so that spanmetrics are generated after loadbalancing.
In the code don't check whether the customer actually used the correct routing key of
service
when using spanmetrics and LB. Similarly, we don't check whether the routing key oftraceID
is used when using tail sampling.Unfortunately this seems to be a very longstanding bug. It was apparently first introduced in #616 on May 26, 2021. It was merged shortly after spanmetrics was aded in the Agent via #499 on April 5, 2021. At the time the loadbalancing implications of using spanmetrics were probably not well understood.
I think users affected by this bug would probably see errors when remote writing metrics to Mimir, because different Agent might try to remote write the same series. So that's a "good" thing about this bug - if it causes a real problem, hopefully affected users have already seen such errors and acted on them.
I also tested this locally using configuration like this:
Agent config
When running my local Agent I could see the spanmetrics on
http://localhost:8898/metrics
, and I could see that metrics are being received and sent to Tempo via the metrics onhttp://localhost:12345/metrics
.PR Checklist