ref(otel): Add ClientOptions.Instrumenter
#679
Codecov Report
Attention: Patch coverage is
Additional details and impacted files
@@ Coverage Diff @@
## master #679 +/- ##
==========================================
+ Coverage 80.89% 81.20% +0.30%
==========================================
Files 48 48
Lines 4717 4730 +13
==========================================
+ Hits 3816 3841 +25
+ Misses 763 754 -9
+ Partials 138 135 -3
The Windows test failure seems unrelated? Maybe flaky? As for the coverage decrease, I could add some tests for this case if the general approach seems reasonable.
Yep, Windows tests are sadly a bit flaky 🙁
Sorry for the delay; vacation got in the way 😉 My JavaScript is pretty rusty (no 🦀 pun intended), but AFAICT the JS integration works slightly differently. It always uses an existing span context (similarly to the proposed change in this PR), but it also has an extra The extra option seems a bit error-prone, compared to the sample-func approach in this PR: the code has to remember to check against known values, instead of explicitly passing exactly what we want it to do. But that's just my subjective (and superficial!) take on it.
I've reworked the PR with a slightly different solution. Now, instead of trying to use a sampler, we address the issue at the middleware level. The problem, as I currently understand it, is related to middleware call ordering: we want otel to be called first, to respect potential sampling decisions from a remote caller. That sampling decision should be passed to the sentry world, which was already done in the otel span processor, but was not working reliably (fixed in this PR). After the otel middleware, we still want to use the sentry middleware (at least for event tracking and recovery), but at that point the existing sentry span created in the span processor above is not in the context, so the sentry middleware happily creates a new one (and without a parent). Does that make sense?
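One way to picture the call-ordering problem is the minimal sketch below. It uses only the standard library; the `span` type, `outerTracing`, and `innerTracing` are hypothetical stand-ins for the otel and sentry middlewares, not real sentry-go or OpenTelemetry APIs. The `reuse` flag stands in for whether the inner middleware can see the span created upstream.

```go
package main

import (
	"context"
	"fmt"
	"net/http"
	"net/http/httptest"
)

// span is a hypothetical stand-in for a tracing span; it is not a
// sentry-go or OpenTelemetry type.
type span struct {
	name   string
	parent *span
}

type spanKey struct{}

// spanFromContext returns the span stored in ctx, or nil if there is none.
func spanFromContext(ctx context.Context) *span {
	s, _ := ctx.Value(spanKey{}).(*span)
	return s
}

// outerTracing plays the role of the otel middleware: it runs first and
// stores its span in the request context.
func outerTracing(next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		ctx := context.WithValue(r.Context(), spanKey{}, &span{name: "outer"})
		next.ServeHTTP(w, r.WithContext(ctx))
	})
}

// innerTracing plays the role of the sentry middleware. With reuse=true it
// attaches to the span it finds in the context; with reuse=false it starts
// a fresh root span, i.e. the parentless duplicate described above.
func innerTracing(reuse bool, next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		s := &span{name: "inner"}
		if existing := spanFromContext(r.Context()); reuse && existing != nil {
			s.parent = existing
		}
		ctx := context.WithValue(r.Context(), spanKey{}, s)
		next.ServeHTTP(w, r.WithContext(ctx))
	})
}

// parentName sends one request through the chain and reports the name of
// the parent of the span that the final handler sees ("" if it has none).
func parentName(reuse bool) string {
	var got string
	final := http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		if s := spanFromContext(r.Context()); s != nil && s.parent != nil {
			got = s.parent.name
		}
	})
	h := outerTracing(innerTracing(reuse, final))
	h.ServeHTTP(httptest.NewRecorder(), httptest.NewRequest(http.MethodGet, "/", nil))
	return got
}

func main() {
	fmt.Printf("reuse=false parent=%q\n", parentName(false))
	fmt.Printf("reuse=true  parent=%q\n", parentName(true))
}
```

In this sketch the upstream span lives in the request context, which is a simplification: in the situation described above the sentry span created by the span processor is precisely not in the context, so the inner middleware behaves like the `reuse=false` case.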
Ok, third time is a charm! The PTAL
@cleptric can you please allow the tests to run? 🙏
FYI: 🔁 rebased on master, please unlock tests 🙏
⚠️ this is only to allow using the forked version while getsentry#679 is open
Sorry that it took us a while to get back to you. When using Otel, you shouldn't use any Sentry performance-related features. This is why we added the So instead of using Sentry's Our propagator considers incoming trace headers, so distributed tracing also works.
So you mean skipping the http middleware? But then we'd also throw out error/panic tracking? If that's the intended way to work with OTEL, then maybe I should change this PR to instead add an option to
This is a good point about panic tracking I haven't thought about. IMO, we could wrap this code https://github.com/getsentry/sentry-go/blob/master/http/sentryhttp.go#L97-L112 into an if condition that checks for This might make it a bit easier than adding a new integration option.
@cleptric you mean disabling We could maybe refactor the sampling logic to respect sample decisions even if In the end we want It seems to me we have two options:
Ok, we might need to add the
I think adding the
@cleptric @greywolve now this PR has an attempted implementation of the new
Added some comments. This also needs tests.
client.go
@@ -133,6 +133,9 @@ type ClientOptions struct {
	TracesSampleRate float64
	// Used to customize the sampling of traces, overrides TracesSampleRate.
	TracesSampler TracesSampler
	// Which instrumentation to use for tracing. Either "sentry" or "otel" are supported.
	// Setting this to "otel" will ignore TracesSampleRate and TracesSampler and assume sampling is performed by otel.
	TracesInstrumenter string
We should call it Instrumenter, as in other SDKs.
Sure thing! Done! PTAL
client.go
	switch options.TracesInstrumenter {
	case "":
		options.TracesInstrumenter = "sentry"
	case "sentry":
		// noop
	case "otel":
		// sampling is performed by the OpenTelemetry SDK
		options.TracesSampleRate = 1.0
		options.TracesSampler = nil
	default:
		return nil, fmt.Errorf("invalid value for TracesInstrumenter (supported are 'sentry' and 'otel'): %q", options.TracesInstrumenter)
	}
I'm not sure if we should do this. I would rather have people define this explicitly in their options.
You mean the TracesSampleRate and TracesSampler? Should we then error if the values are set to anything else? Doing neither seems a bit error-prone: Instrumenter: "otel" + TracesSampleRate: 0 (default) would ignore otel sampling decisions and never send traces.
Are you sure it does this? The span processor actually takes the Otel sampling decision into account, taking priority over the traces sample rate and traces sampler.
I believe so. That might be a slightly different issue, though. One I had addressed in a previous commit.
The current span processor considers only the sampled decision of the parent, so it only actually respects it if the span is part of an incoming trace. If it's a new trace, without a parent, the otel span might be sampled, but the sentry span remains at SampleUndefined, which then means we fall back to the client-wide sample rate.
The current test doesn't catch it because both the sentry client and the otel stack are initialized with "always trace", so they both arrive at the same decision, even though the processor isn't doing anything.
I just added a commit with the fixed test and the fixed code. PTAL.
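The difference between copying only the parent's decision and copying the span's own decision can be sketched as follows. This is an illustration under stated assumptions, not the actual span processor code: `otelSpan`, `decision`, `parentOnly`, `spanDecision`, and `effective` are hypothetical names, and the random sampling draw is passed in explicitly to keep the example deterministic.

```go
package main

import "fmt"

// decision is a hypothetical three-state sampling decision.
type decision int

const (
	decisionUndefined decision = iota
	decisionFalse
	decisionTrue
)

// otelSpan is a hypothetical, minimal view of an OpenTelemetry span.
type otelSpan struct {
	hasParent     bool
	parentSampled bool
	sampled       bool // the decision otel made for this span
}

func fromBool(b bool) decision {
	if b {
		return decisionTrue
	}
	return decisionFalse
}

// parentOnly copies only the parent's decision, so a new trace without a
// parent stays undefined.
func parentOnly(s otelSpan) decision {
	if !s.hasParent {
		return decisionUndefined
	}
	return fromBool(s.parentSampled)
}

// spanDecision copies the decision otel made for the span itself.
func spanDecision(s otelSpan) decision {
	return fromBool(s.sampled)
}

// effective resolves a decision: an undefined decision falls back to the
// client-wide sample rate, compared against a draw in [0, 1).
func effective(d decision, clientRate, draw float64) bool {
	switch d {
	case decisionTrue:
		return true
	case decisionFalse:
		return false
	default:
		return draw < clientRate
	}
}

func main() {
	// A new trace: no parent, but otel decided to sample it.
	root := otelSpan{hasParent: false, sampled: true}
	const clientRate, draw = 0.0, 0.5 // TracesSampleRate left at its default of 0

	fmt.Println("parent-only sent:", effective(parentOnly(root), clientRate, draw))
	fmt.Println("span-decision sent:", effective(spanDecision(root), clientRate, draw))
}
```

With a client-wide rate of 0, the parent-only variant drops a root span that otel sampled, while the span-decision variant sends it. When both sides are configured to always trace, the two variants agree, which is why such a test setup cannot tell them apart.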
friendly ping 😇
I'll take a look later this week.
We made similar changes for node.js in getsentry/sentry-javascript#9203, but I need to dive into this whole topic a bit deeper to give you actionable feedback.
friendly ping 😇 (I know these things get lower prio; just wanna avoid it getting forgotten)
Yeah, sorry for that! I had to get some stuff shipped over on the PHP side of things, but the next few weeks will be again Go focus time for me.
ClientOptions.Instrumenter
(never mind, missed the propagator code, sorry for the noise 🙇)
We actually extract the Sentry trace header, see https://github.com/getsentry/sentry-go/blob/master/otel/propagator.go#L77-L95
@cleptric ah, you're right of course. Apologies for the noise; disregard the last comment.
LGTM, but I'll let @cleptric give the final thumbs up :)
hi!
Maybe I can get this PR as a Christmas present? 😇 🎅 🎄 (just rebased 🔁)
Rebased again. Any chance a paying customer can get some feedback? Even if this isn't the right solution?
It's still on our radar!
@cleptric customer has requested an update on this PR. Is there any?
This is a fork based on this branch: getsentry#679
This is a fork based on this branch: getsentry#679
This enables instrumentation code (sentryhttp, sentrygin, etc) to skip trace instrumentation while still performing panic recovery.
Co-authored-by: Oliver Powell <[email protected]>
@cleptric any updates for this?
This small PR is a first attempt to fix the perceived issue in #678.
It extends sentry.ClientOptions with the Instrumenter parameter, analogous to other language SDKs. This can optionally be set to "otel" or "sentry" (the default). The new option is then respected by middlewares, which can skip creating a sentry transaction, allowing the consumer to use only "pure" otel for traces.
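The intended middleware behavior can be sketched roughly as below. This is a standard-library illustration only: `report`, `middleware`, and `run` are hypothetical names, and the counters stand in for starting a sentry transaction and capturing a panic. It does not call sentry-go and is not the code in this PR.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
)

// report counts what a hypothetical middleware would send to Sentry.
type report struct {
	transactions int
	panics       int
}

// middleware always recovers panics, but only starts a transaction when the
// instrumenter is "sentry". With "otel", tracing is left to OpenTelemetry.
func middleware(instrumenter string, rep *report, next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		if instrumenter == "sentry" {
			rep.transactions++ // stand-in for starting a sentry transaction
		}
		defer func() {
			if err := recover(); err != nil {
				rep.panics++ // stand-in for capturing the panic as an event
				w.WriteHeader(http.StatusInternalServerError)
			}
		}()
		next.ServeHTTP(w, r)
	})
}

// run sends one request to a panicking handler and returns what was recorded.
func run(instrumenter string) report {
	var rep report
	h := middleware(instrumenter, &rep, http.HandlerFunc(func(http.ResponseWriter, *http.Request) {
		panic("boom")
	}))
	h.ServeHTTP(httptest.NewRecorder(), httptest.NewRequest(http.MethodGet, "/", nil))
	return rep
}

func main() {
	fmt.Printf("sentry: %+v\n", run("sentry"))
	fmt.Printf("otel:   %+v\n", run("otel"))
}
```

The point of the sketch is that panic recovery stays unconditional, so choosing "otel" only removes the transaction, not the error tracking.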
Considered alternatives:
Another option would be making the transaction created by the otel span processor visible to the middlewares. However, since the span processor cannot easily modify the request context, it becomes necessary to add additional helpers (e.g. a new "adapter" middleware) with the single task of injecting the otel-created transaction into the request context.
Fixes: #678 #788