Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Move client span creation to decorator and automatically suppress creation of neste… #460

Merged
merged 9 commits into from
Jun 8, 2020

Conversation

anuraaga
Copy link
Contributor

@anuraaga anuraaga commented May 31, 2020

…d client spans.

This is a PoC for #440. It is common for a client request, such as an RPC, to be composed of multiple parts that are all instrumented, such as the AWS SDK using Apache HTTP client. Currently, we mark the RPC span as internal and the transport span as client, but that breaks features like service graphs which are built around the concept of server/client spans being services, not transport.

Here, I keep track of client spans and make sure another one is not created as the child of a client span. This check would have been trivial if Span presented an accessor for its kind, if this idea seems sane maybe we can ask about that. In the meantime, a weak map works.

Currently I'm suppressing the span but I also considered returning a wrapping Span which delegates everything except for end. This would allow transports to add information to the client span, for example events for sending / receiving data from the wire. It could happen in the future though since it's just enriching the data, not changing the modeling. It seems even more important if applying this similarly to server-side as it will be common for the "nested span" to have the RPC and similar important information.

Note, as the tests show, we lose the feature of modeling retries as spans for the AWS SDK. This seems OK to me, the current code actually doesn't even propagate these spans, the only span that is propagated is the SDK span (found this out while writing this code, propagating an INTERNAL span seems like it goes against the rules of tracing!). I have filed an issue a while ago about the SDK allowing header mutation per retry and it's probably better to wait for that and mark retries as events in the meantime (not implemented yet will do so if this approach is ok to proceed with).

Let me know if this approach seems reasonable at all!

@iNikem
Copy link
Contributor

iNikem commented May 31, 2020

Ignoring all but the first client span for a given trace seems like too radical for me. Even if we go this way, we have to provide a configuration option to disable this suppression and the corresponding documentation.

@anuraaga
Copy link
Contributor Author

Agree it's a big change, and happy to have some configuration if it's still an ok direction. Also still have the idea of returning the current span itself (with a wrapper to prevent closing) if that's a better direction.

FWIW, I thought of only adding these checks to transport instrumentation (Apache, Netty, okhttp etc) but thinking about it couldn't come up with a reason why it would ever be better to have multiple client spans. If you could share some use cases, that would help me understand how to make this more general, or not we can just make it a setting, either global or per instrumentation :)

@iNikem
Copy link
Contributor

iNikem commented Jun 1, 2020

After more careful reading of the specification I have concluded that the original proposal of @anuraaga to "keep track of client spans and make sure another one is not created as the child of a client span" is the correct one. And this aspect should be addressed uniformly. I have create #465 with my proposal to use Context as a storage mechanism for Server/Client spans. If that proposal gets approval, then only storage should be changed in this PR.

"returning a wrapping Span which delegates everything except for end", as proposed in this PR, can then be used for span augmentation as proposed in #465

@trask
Copy link
Member

trask commented Jun 1, 2020

@iNikem @anuraaga I like the idea of using the context to store the CLIENT span in this PR 👍.

@iNikem
Copy link
Contributor

iNikem commented Jun 1, 2020

@anuraaga What do you think about introducing Context driven storage in this PR? I then can proceed in changing all other instrumentations to use the same mechanism.

Or I just can go forward and do it myself, if you are short on time right now :)

@anuraaga
Copy link
Contributor Author

anuraaga commented Jun 2, 2020

Thanks - yeah let me take a stab at it, it's a good opportunity to learn even more about the codebase!

Copy link
Contributor Author

@anuraaga anuraaga left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Took a stab, this is still PoC quality but let me know if it's the direction you were thinking. One observation is it seems fragile to make sure all code calls our context-mounting methods instead of the default ones.

// AWS calls are often signed, so we can't add headers without breaking the signature.
if (!awsClientCall) {
final Context context = ClientDecorator.withSpan(span, Context.current());
// TODO(anuraaga): Seems like a bug that invalid context still gets injected by the injector.
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah noticed this is fixed in master of the SDK :)

@iNikem
Copy link
Contributor

iNikem commented Jun 2, 2020

Took a stab, this is still PoC quality but let me know if it's the direction you were thinking. One observation is it seems fragile to make sure all code calls our context-mounting methods instead of the default ones.

Future work of "semantic tracers" should take care of that by extracting much more cross-cutting concerns such as this into 1 place

@iNikem
Copy link
Contributor

iNikem commented Jun 2, 2020

Seems good to a quick look. I will review this more thoughtfully later today.

@iNikem
Copy link
Contributor

iNikem commented Jun 2, 2020

I hoped that with client span available in current context, we don't need io.opentelemetry.instrumentation.awssdk.v2_2.TracingExecutionInterceptor#SPAN_ATTRIBUTE anymore. Is this the case?

return TracingContextUtils.withSpan(clientSpan, context);
}
return TracingContextUtils.withSpan(
clientSpan, Context.current().withValue(CONTEXT_CLIENT_SPAN_KEY, clientSpan));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i think?

Suggested change
clientSpan, Context.current().withValue(CONTEXT_CLIENT_SPAN_KEY, clientSpan));
clientSpan, context.withValue(CONTEXT_CLIENT_SPAN_KEY, clientSpan));

Comment on lines -183 to -184
// AWS calls are often signed, so we can't add headers without breaking the signature.
if (!awsClientCall) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm guessing you know something that we don't about this not being needed anymore? 😄

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Now that there's only the SDK span and no Apache span, there's nothing to propagate here. That being said, it does seem safer to keep the check just in case something changes, at the same time, it seems pretty hacky to have the AWS-specific code here so it feels nice to remove it.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, I understand now, right.

Do the aws-sdk tests protect us here? If so I'm good removing it.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yup the tests cover it. It's how I found this line of code :)

clientSpan, Context.current().withValue(CONTEXT_CLIENT_SPAN_KEY, clientSpan)));
}

public static Context withSpan(final Span clientSpan, final Context context) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks like this is only called with Context.current(), so maybe this method isn't needed?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I noticed the same lack of symmetry in TracingContextUtils and followed the pattern. Let me know if it's worth diverging.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

io.opentelemetry.trace.TracingContextUtils#withSpan can potentially be called with contexts other than current during context extraction via io.opentelemetry.trace.propagation.HttpTraceContext#extract method. This client decorator is not general purpose util, but more specific one. Thus I think we indeed can merge these two methods together.

Copy link
Contributor Author

@anuraaga anuraaga left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Left a high level comment about how to store this client span into context, let me know if we're on the same page and then I'll proceed to cleanup / testss after that.

Comment on lines -183 to -184
// AWS calls are often signed, so we can't add headers without breaking the signature.
if (!awsClientCall) {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Now that there's only the SDK span and no Apache span, there's nothing to propagate here. That being said, it does seem safer to keep the check just in case something changes, at the same time, it seems pretty hacky to have the AWS-specific code here so it feels nice to remove it.

@trask
Copy link
Member

trask commented Jun 3, 2020

@anuraaga @iNikem I'm good with this approach. There were already some problems and hacky stuff around how we deal with the aws-sdk / netty interaction. Hopefully we can improve that in the future, but not an issue for this PR.

@iNikem
Copy link
Contributor

iNikem commented Jun 5, 2020

@anuraaga do you plan to convert this into proper PR?

@anuraaga
Copy link
Contributor Author

anuraaga commented Jun 5, 2020

Hi @nikem yeah sorry for the delay but I'm working on cleaning it up to switch it to a PR

@iNikem
Copy link
Contributor

iNikem commented Jun 5, 2020

No problem, just checking the status :)

@anuraaga
Copy link
Contributor Author

anuraaga commented Jun 5, 2020

Fixed the failing test and did some cleanups. Still need to add unit tests for the new methods in ClientDecorator, hope to have it by Monday :)

@anuraaga anuraaga marked this pull request as ready for review June 8, 2020 04:38
Copy link
Contributor Author

@anuraaga anuraaga left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! Think this is more or less ready

clientSpan, Context.current().withValue(CONTEXT_CLIENT_SPAN_KEY, clientSpan)));
}

public static Context withSpan(final Span clientSpan, final Context context) {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I noticed the same lack of symmetry in TracingContextUtils and followed the pattern. Let me know if it's worth diverging.

@anuraaga anuraaga force-pushed the suppress-nested-client-spans branch from c42a37e to 9051e50 Compare June 8, 2020 05:05
@anuraaga
Copy link
Contributor Author

anuraaga commented Jun 8, 2020

Believe the test failure is #480

clientSpan, Context.current().withValue(CONTEXT_CLIENT_SPAN_KEY, clientSpan)));
}

public static Context withSpan(final Span clientSpan, final Context context) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

io.opentelemetry.trace.TracingContextUtils#withSpan can potentially be called with contexts other than current during context extraction via io.opentelemetry.trace.propagation.HttpTraceContext#extract method. This client decorator is not general purpose util, but more specific one. Thus I think we indeed can merge these two methods together.

if (!clientSpan.getContext().isValid()) {
return TracingContextUtils.withSpan(clientSpan, context);
}
return TracingContextUtils.withSpan(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would extract common call to TracingContextUtils.withSpan from the if above.

@trask trask merged commit f13a9c4 into open-telemetry:master Jun 8, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants