-
Notifications
You must be signed in to change notification settings - Fork 878
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Move client span creation to decorator and automatically suppress creation of neste… #460
Move client span creation to decorator and automatically suppress creation of neste… #460
Conversation
Ignoring all but the first client span for a given trace seems like too radical for me. Even if we go this way, we have to provide a configuration option to disable this suppression and the corresponding documentation. |
Agree it's a big change, and happy to have some configuration if it's still an ok direction. Also still have the idea of returning the current span itself (with a wrapper to prevent closing) if that's a better direction. FWIW, I thought of only adding these checks to transport instrumentation (Apache, Netty, okhttp etc) but thinking about it couldn't come up with a reason why it would ever be better to have multiple client spans. If you could share some use cases, that would help me understand how to make this more general, or not we can just make it a setting, either global or per instrumentation :) |
After more careful reading of the specification I have concluded that the original proposal of @anuraaga to "keep track of client spans and make sure another one is not created as the child of a client span" is the correct one. And this aspect should be addressed uniformly. I have create #465 with my proposal to use Context as a storage mechanism for Server/Client spans. If that proposal gets approval, then only storage should be changed in this PR. "returning a wrapping Span which delegates everything except for end", as proposed in this PR, can then be used for span augmentation as proposed in #465 |
@anuraaga What do you think about introducing Context driven storage in this PR? I then can proceed in changing all other instrumentations to use the same mechanism. Or I just can go forward and do it myself, if you are short on time right now :) |
Thanks - yeah let me take a stab at it, it's a good opportunity to learn even more about the codebase! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Took a stab, this is still PoC quality but let me know if it's the direction you were thinking. One observation is it seems fragile to make sure all code calls our context-mounting methods instead of the default ones.
// AWS calls are often signed, so we can't add headers without breaking the signature. | ||
if (!awsClientCall) { | ||
final Context context = ClientDecorator.withSpan(span, Context.current()); | ||
// TODO(anuraaga): Seems like a bug that invalid context still gets injected by the injector. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah noticed this is fixed in master
of the SDK :)
Future work of "semantic tracers" should take care of that by extracting much more cross-cutting concerns such as this into 1 place |
Seems good to a quick look. I will review this more thoughtfully later today. |
...src/main/java/io/opentelemetry/auto/bootstrap/instrumentation/decorator/ClientDecorator.java
Show resolved
Hide resolved
...src/main/java/io/opentelemetry/auto/bootstrap/instrumentation/decorator/ClientDecorator.java
Outdated
Show resolved
Hide resolved
...src/main/java/io/opentelemetry/auto/bootstrap/instrumentation/decorator/ClientDecorator.java
Show resolved
Hide resolved
...ain/java8/io/opentelemetry/auto/instrumentation/awssdk/v2_2/TracingExecutionInterceptor.java
Outdated
Show resolved
Hide resolved
I hoped that with client span available in current context, we don't need |
...src/main/java/io/opentelemetry/auto/bootstrap/instrumentation/decorator/ClientDecorator.java
Show resolved
Hide resolved
return TracingContextUtils.withSpan(clientSpan, context); | ||
} | ||
return TracingContextUtils.withSpan( | ||
clientSpan, Context.current().withValue(CONTEXT_CLIENT_SPAN_KEY, clientSpan)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i think?
clientSpan, Context.current().withValue(CONTEXT_CLIENT_SPAN_KEY, clientSpan)); | |
clientSpan, context.withValue(CONTEXT_CLIENT_SPAN_KEY, clientSpan)); |
// AWS calls are often signed, so we can't add headers without breaking the signature. | ||
if (!awsClientCall) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm guessing you know something that we don't about this not being needed anymore? 😄
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Now that there's only the SDK span and no Apache span, there's nothing to propagate here. That being said, it does seem safer to keep the check just in case something changes, at the same time, it seems pretty hacky to have the AWS-specific code here so it feels nice to remove it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, I understand now, right.
Do the aws-sdk
tests protect us here? If so I'm good removing it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yup the tests cover it. It's how I found this line of code :)
clientSpan, Context.current().withValue(CONTEXT_CLIENT_SPAN_KEY, clientSpan))); | ||
} | ||
|
||
public static Context withSpan(final Span clientSpan, final Context context) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It looks like this is only called with Context.current()
, so maybe this method isn't needed?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I noticed the same lack of symmetry in TracingContextUtils
and followed the pattern. Let me know if it's worth diverging.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
io.opentelemetry.trace.TracingContextUtils#withSpan
can potentially be called with contexts other than current during context extraction via io.opentelemetry.trace.propagation.HttpTraceContext#extract
method. This client decorator is not general purpose util, but more specific one. Thus I think we indeed can merge these two methods together.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Left a high level comment about how to store this client span into context, let me know if we're on the same page and then I'll proceed to cleanup / testss after that.
...src/main/java/io/opentelemetry/auto/bootstrap/instrumentation/decorator/ClientDecorator.java
Show resolved
Hide resolved
...src/main/java/io/opentelemetry/auto/bootstrap/instrumentation/decorator/ClientDecorator.java
Show resolved
Hide resolved
// AWS calls are often signed, so we can't add headers without breaking the signature. | ||
if (!awsClientCall) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Now that there's only the SDK span and no Apache span, there's nothing to propagate here. That being said, it does seem safer to keep the check just in case something changes, at the same time, it seems pretty hacky to have the AWS-specific code here so it feels nice to remove it.
...ain/java8/io/opentelemetry/auto/instrumentation/awssdk/v2_2/TracingExecutionInterceptor.java
Outdated
Show resolved
Hide resolved
…-instr-java into suppress-nested-client-spans
@anuraaga do you plan to convert this into proper PR? |
Hi @nikem yeah sorry for the delay but I'm working on cleaning it up to switch it to a PR |
No problem, just checking the status :) |
Fixed the failing test and did some cleanups. Still need to add unit tests for the new methods in ClientDecorator, hope to have it by Monday :) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks! Think this is more or less ready
clientSpan, Context.current().withValue(CONTEXT_CLIENT_SPAN_KEY, clientSpan))); | ||
} | ||
|
||
public static Context withSpan(final Span clientSpan, final Context context) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I noticed the same lack of symmetry in TracingContextUtils
and followed the pattern. Let me know if it's worth diverging.
c42a37e
to
9051e50
Compare
Believe the test failure is #480 |
clientSpan, Context.current().withValue(CONTEXT_CLIENT_SPAN_KEY, clientSpan))); | ||
} | ||
|
||
public static Context withSpan(final Span clientSpan, final Context context) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
io.opentelemetry.trace.TracingContextUtils#withSpan
can potentially be called with contexts other than current during context extraction via io.opentelemetry.trace.propagation.HttpTraceContext#extract
method. This client decorator is not general purpose util, but more specific one. Thus I think we indeed can merge these two methods together.
if (!clientSpan.getContext().isValid()) { | ||
return TracingContextUtils.withSpan(clientSpan, context); | ||
} | ||
return TracingContextUtils.withSpan( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would extract common call to TracingContextUtils.withSpan
from the if
above.
…d client spans.
This is a PoC for #440. It is common for a client request, such as an RPC, to be composed of multiple parts that are all instrumented, such as the AWS SDK using Apache HTTP client. Currently, we mark the RPC span as internal and the transport span as client, but that breaks features like service graphs which are built around the concept of server/client spans being services, not transport.
Here, I keep track of client spans and make sure another one is not created as the child of a client span. This check would have been trivial if
Span
presented an accessor for its kind, if this idea seems sane maybe we can ask about that. In the meantime, a weak map works.Currently I'm suppressing the span but I also considered returning a wrapping
Span
which delegates everything except forend
. This would allow transports to add information to the client span, for example events for sending / receiving data from the wire. It could happen in the future though since it's just enriching the data, not changing the modeling. It seems even more important if applying this similarly to server-side as it will be common for the "nested span" to have the RPC and similar important information.Note, as the tests show, we lose the feature of modeling retries as spans for the AWS SDK. This seems OK to me, the current code actually doesn't even propagate these spans, the only span that is propagated is the SDK span (found this out while writing this code, propagating an INTERNAL span seems like it goes against the rules of tracing!). I have filed an issue a while ago about the SDK allowing header mutation per retry and it's probably better to wait for that and mark retries as events in the meantime (not implemented yet will do so if this approach is ok to proceed with).
Let me know if this approach seems reasonable at all!