Integrate OpenTracing #426

Merged: 5 commits merged into dev from integrate-opentracing on Aug 5, 2016
Conversation

@yurishkuro (Contributor) commented Jun 24, 2016

Ready for review

Changes:

  • internal span reporting completely removed and replaced with OpenTracing
  • span context and baggage are propagated via application headers (JSON only at the moment; Thrift next)
  • the channel can be configured with an opentracing.Tracer; otherwise it defaults to opentracing.GlobalTracer(), which is a no-op by default (see the sketch after this list)
  • outbound call tracing
    • the span is started in (c *Connection) beginCall
    • if the tracer understands the "zipkin" format, the tracing fields are populated in callReq
    • the actual span/baggage info is injected in json/call.go
    • the span is stored in Response and finished in a couple of places in outbound.go
  • inbound request tracing
    • if the tracer understands the "zipkin" format, the span is started in inbound.go and baggage is added in the encoding handler
    • otherwise the span is started in the encoding handler only
    • spans are finished generically in completion callbacks in inbound.go
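
For illustration, a minimal sketch of configuring a channel with a tracer. The ChannelOptions.Tracer field name is assumed from the description above, not confirmed code; if it is left nil, the channel falls back to opentracing.GlobalTracer():

package main

import (
	"log"

	opentracing "github.com/opentracing/opentracing-go"
	"github.com/uber/tchannel-go"
)

func main() {
	tracer := opentracing.GlobalTracer() // replace with a real tracer, e.g. Jaeger
	ch, err := tchannel.NewChannel("my-service", &tchannel.ChannelOptions{
		Tracer: tracer, // assumed field name based on this PR's description
	})
	if err != nil {
		log.Fatalf("failed to create channel: %v", err)
	}
	defer ch.Close()
}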

TODO:

  • avoid duplicating trace IDs in the headers and in the frame
  • support thrift
  • support raw
  • support disabling tracing by instructing the Tracer not to sample spans
  • generate random IDs for the native Span if the tracer is not Zipkin-compatible

@HelloGrayson changed the title from "Integrate opentracing" to "WIP - Integrate opentracing" on Jun 24, 2016
@yurishkuro (Contributor Author)

@prashantv @uber/rpc

@prashantv (Contributor)

@yurishkuro Looking over the changes now, some high level feedback:

  • I think inbound.go should initially create the span and set the span in the context. I think the subpackages are responsible for possibly adding baggage to the span from the headers.
  • Spans seem to be mutable; does this mean that anyone who accesses the context could modify tags/baggage?
  • It seems like the tracingDisabled bit uses Flags in opentracing, but I don't see anything about Flags in the opentracing godoc: https://godoc.org/github.com/opentracing/opentracing-go#Span

I think the outbound side looks good, just a little unsure about inbound. If the TChannel library is used without any of the json or thrift packages, will the context still be created with the opentracing span and be propagated correctly? I would expect everything except tags/baggage to work in that case.

@yurishkuro (Contributor Author)

> I think inbound.go should initially create the span and set the span in the context.

The problem with that is it would tie the solution to Zipkin-style tracing systems, or at least those that support an injection carrier matching tchannel.Span (which Jaeger does). In the current implementation, the outgoing span is injected into app headers using the standard OT API, not a Zipkin-specific one, but if the tracer happens to support the Zipkin style, then the trace IDs are also added to the frame tracing fields as a bonus. So tracing will still work with non-Zipkin tracers.

On the inbound side, if we start the span before app headers are known, we'd have to start it from the frame tracing fields, i.e. only in a Zipkin-specific way. Whereas the span created in json/handler will work with any tracer.

One dirty way would be to start a span in inbound.go with the Zipkin-specific API, and then, if app headers are available, start a new one and replace the first. In Jaeger, starting and not finishing a span is not a problem (it just gets garbage collected), but that may not be the case with other tracers.
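
(For context, injecting a span into application headers via the standard, tracer-agnostic OpenTracing API looks roughly like this. This is a sketch using opentracing-go 1.x signatures, which postdate this PR:)

import (
	opentracing "github.com/opentracing/opentracing-go"
)

// injectIntoHeaders writes the span's context (trace identity plus baggage)
// into a plain string map, e.g. TChannel application headers.
func injectIntoHeaders(tracer opentracing.Tracer, span opentracing.Span, headers map[string]string) error {
	return tracer.Inject(
		span.Context(),                      // what to propagate
		opentracing.TextMap,                 // tracer-agnostic format
		opentracing.TextMapCarrier(headers), // carrier backed by the header map
	)
}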

> Spans seem to be mutable; does this mean that anyone who accesses the context could modify tags/baggage?

Yes, and that is in fact the intention: user code may add its own tags/events/baggage to the span.

> It seems like the tracingDisabled bit uses Flags in opentracing, but I don't see anything about Flags in the opentracing godoc: https://godoc.org/github.com/opentracing/opentracing-go#Span

Correct, there are no "flags" in OpenTracing, so they are exposed to tchannel.Span via the above-mentioned Zipkin-specific API. There's also no explicit API to disable tracing, but there is a way to force sampling on or off, which is what I used.
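
(The force-sampling hook is the standard sampling.priority tag; a sketch of how a caller might disable sampling for a span using opentracing-go's ext package:)

import (
	opentracing "github.com/opentracing/opentracing-go"
	"github.com/opentracing/opentracing-go/ext"
)

// disableSampling hints to the tracer that this span (and its children)
// should not be sampled. Tracers are free to ignore the hint.
func disableSampling(span opentracing.Span) {
	ext.SamplingPriority.Set(span, 0)
}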

@yurishkuro (Contributor Author) commented Jun 28, 2016

@prashantv I think this current form does what you want. If the tracer supports Zipkin-style span context, the span is created directly in inbound.go, and once the headers are read, the baggage is added to that span (currently by creating a new span and discarding it, but with the upcoming PR #82 in opentracing-go I will be able to do that without starting a span). If the tracer does not support the Zipkin style, the span is only created in json/handler.go via the standard OpenTracing API.

I've extended the propagation test to use three tracers: the default no-op tracer, the Jaeger tracer, and basictracer-go (which does not support the Zipkin style).

@prashantv (Contributor)

@yurishkuro Nice! Thanks for making the changes, I definitely like this approach more.

  • Should we export Span directly on the response? Can we make it a method? (There are some users of TChannel that have built their own interface to match the calls, and that's only possible because we use methods for exporting any data.)
  • I want to make sure that disabling a trace works across all tracers.
  • If the Jaeger tracer is used, the tracing information is propagated via TChannel's fixed fields -- is this information duplicated in the headers, or is it only in the fixed fields?
  • A little worried about Span being nil in a few places; is it safer to use an empty span in those cases? Want to make sure that we don't have nil-dereference panics, and instead return some zero value.

I'm assuming that all opentracing headers are added with a prefix even in TChannel's headers, so they won't clash, right?

Should we put the code to extract/inject tracing into some internal package that can be used by Thrift + JSON?

@yurishkuro (Contributor Author)

> I want to make sure that disabling a trace works across all tracers.

We can do it in two different ways. What I've done currently is disable the sampling of the current trace (or rather, the span and its children). OT tracers are not obligated to respect sampling hints. Another approach would be to completely disable all calls to the OpenTracing API if disableTracing is requested.

> If the Jaeger tracer is used, the tracing information is propagated via TChannel's fixed fields -- is this information duplicated in the headers, or is it only in the fixed fields?

It is duplicated. The headers are read/set via the standard OpenTracing API. If we really want to avoid duplication, I can hack it via yet another special Format, like BaggageOnlyTextMapFormat, which would only be known to Jaeger and TChannel; if the tracer says that format is not supported, we fall back to full span-context headers.

> A little worried about Span being nil in a few places; is it safer to use an empty span in those cases? Want to make sure that we don't have nil-dereference panics, and instead return some zero value.

I can use a no-op opentracing span by default, but I will need another flag to distinguish the default from an actual span, because json/handler.go branches depending on whether the span was created in inbound.go.
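
(A sketch of the nil-guard idea using opentracing-go's built-in NoopTracer; spanOrNoop is a hypothetical helper, not code from this PR:)

import (
	opentracing "github.com/opentracing/opentracing-go"
)

var noopTracer opentracing.NoopTracer

// spanOrNoop returns the given span, or a no-op span that is safe to call
// methods on, so callers never need a nil check.
func spanOrNoop(span opentracing.Span) opentracing.Span {
	if span != nil {
		return span
	}
	return noopTracer.StartSpan("noop")
}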

> I'm assuming that all opentracing headers are added with a prefix even in TChannel's headers, so they won't clash, right?

Normally the headers returned by a Tracer are prefixed with tracer-specific strings; e.g. Jaeger's headers are either uber-trace-id or uberctx-***. Given that TChannel has special handling of the headers, it's possible to further prefix those, but I am not sure it's necessary.

@yurishkuro changed the title from "WIP - Integrate opentracing" to "Integrate opentracing" on Jul 2, 2016
@yurishkuro (Contributor Author)

@prashantv @breerly this is ready for review

	@[ ! -s lint.log ]
else
	@echo "Skipping linters on" $(GO_VERSION)
endif

todos:

suggest moving this to a different PR

"golang.org/x/net/context"

"github.com/opentracing/opentracing-go/ext"
"github.com/uber/tchannel-go/typed"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Import groups are off.

@yurishkuro changed the title from "Integrate opentracing" to "Integrate OpenTracing" on Jul 7, 2016
@@ -76,9 +76,7 @@ func getTChannelParams(ctx context.Context) *tchannelCtxParams {

// NewContext returns a new root context used to make TChannel requests.
func NewContext(timeout time.Duration) (context.Context, context.CancelFunc) {
	return NewContextBuilder(timeout).
		setSpan(NewRootSpan()).
Contributor:

When you make a new context, is there any way to know what the trace ID will be for the call you're about to make?

Contributor Author:

First, trace ID is not a concept that's exposed in the OpenTracing API, so strictly speaking, no.

In practice, if there is already an OpenTracing span in the current context, then a child span will be created when the outbound call is made (but not when the context is created). If the tracer is Zipkin-compatible, then technically it's possible to extract trace ID from the current span, and internally the outbound call will do that and store that trace ID in the frame's Trace fields.
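
(For reference, the standard way an outbound call can derive a child span from the span already stored in the context; a generic opentracing-go sketch, not this PR's exact code:)

import (
	"context"

	opentracing "github.com/opentracing/opentracing-go"
)

// startOutboundSpan starts a client-side span for an outbound call,
// parented to whatever span is already stored in ctx (if any).
func startOutboundSpan(ctx context.Context, tracer opentracing.Tracer, operation string) opentracing.Span {
	var opts []opentracing.StartSpanOption
	if parent := opentracing.SpanFromContext(ctx); parent != nil {
		opts = append(opts, opentracing.ChildOf(parent.Context()))
	}
	return tracer.StartSpan(operation, opts...)
}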

Contributor:

One use case we have is yab, which makes a call from an outbound perspective but then outputs the trace ID (so you can find the trace later). It uses CurrentSpan(ctx) after creating a context, so I wonder if every new context should include a new root span. While yab can be updated, I don't want to break any other uses of this pattern.

Contributor Author:

The main issue I had with Context doing anything related to spans is that spans can only be created by the Tracer. Passing the tracer to Context methods seemed very wrong.

I think we should revisit the use case of yab. What you describe sounds like a debugging feature, which is much better handled by configuring the Tracer with a logging reporter (alone or in addition to a real span reporter). For example, in Jaeger this is achieved by setting logSpans: true in the config (or programmatically).

Contributor:

While it is a debugging feature, the TChannel library should allow these use cases. E.g. yab should have control over how the trace ID is outputted. Passing tracer does seem pretty wrong, would setting a parent span (that the user creates) be a better solution?

There also seem to be internal users of CurrentSpan(ctx); I'm not sure how they're using it or whether they'll be affected.

Contributor Author:

NB: if returning nil from CurrentSpan is a concern, we could return a pointer to a default/empty Span{} with 0 IDs.

Contributor:

Sorry, I should have been clearer. Returning nil isn't the issue; it's more that there are existing internal callers of CurrentCall that might depend on the trace ID being set as they're about to make an outbound call.

I'll sync up over email about what the existing callers are, and we can figure out whether they need it to work or not.

Contributor:

In YARPC, we should guarantee that there is a span associated with a response context. We do need tracing to expose an arbitrary string for applications like the tooling and logging use cases @prashantv mentions -- something you can use to associate and search for the trace. It does not have to conform to the Zipkin scheme.

@yurishkuro (Contributor Author) commented Jul 12, 2016

@kriskowal OpenTracing cannot impose any specific restrictions on how a "trace identity" is represented in a string, because it's an implementation detail of the tracer, and they vary greatly. However, it is reasonable to require that span.String() return something that can be used to locate the trace in the respective UI. It's just that this "something" will be implementation-specific. For example, Jaeger spans return a string like this: 73754768284bbd0f:73754768284bbd0f:0:0. Note that the string will be unique for each span, but the first segment is a common trace ID (implementation detail).
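
(As an illustration of the yab-style use case, assuming the tracer's span implements fmt.Stringer, as Jaeger's does; this is not required by OpenTracing:)

import (
	"context"
	"fmt"

	opentracing "github.com/opentracing/opentracing-go"
)

// printTraceIdentity prints an implementation-specific string that can be
// used to look up the trace in the tracer's UI; e.g. a Jaeger span prints
// something like "73754768284bbd0f:73754768284bbd0f:0:0".
func printTraceIdentity(ctx context.Context) {
	if span := opentracing.SpanFromContext(ctx); span != nil {
		fmt.Printf("trace identity: %v\n", span)
	}
}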

Contributor:

It would be satisfying for spans to be Stringers without any requirements for the form of the string. This would reinforce the need for YARPC to return a context. We need a mechanism for YARPC to communicate back the span it created on your behalf. cc @bombela

@yurishkuro (Contributor Author)

@prashantv any other comments?

@kriskowal (Contributor)

lgtm. please await stamp from @prashantv.

// Tracer returns the OpenTracing Tracer for this channel.
// If no tracer was provided in the configuration, returns opentracing.GlobalTracer().
func (ccc channelConnectionCommon) Tracer() opentracing.Tracer {
	if ccc.tracer != nil {
		return ccc.tracer
	}
	return opentracing.GlobalTracer()
}
Contributor:

shouldn't we only need to do this check once when creating the channel rather than every time Tracer is called?

Contributor Author:

I did it as late binding, in case the global tracer is set after the channel is created

Contributor:

I don't understand -- you can't change ccc.tracer after the channel is created, right?

Contributor Author:

Correct, but opentracing.GlobalTracer() can change (by default it returns a no-op tracer, not a real one).

Contributor:

Ah, makes sense. Can you update the comment to call that out, so we don't accidentally "optimize" this and break it? (Ideally a unit test too.)
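
(A test along those lines might look like this; newChannelWithoutTracer is a hypothetical helper for a channel constructed with no explicit tracer:)

import (
	"testing"

	opentracing "github.com/opentracing/opentracing-go"
	"github.com/opentracing/opentracing-go/mocktracer"
)

func TestTracerLateBinding(t *testing.T) {
	origTracer := opentracing.GlobalTracer()
	defer opentracing.InitGlobalTracer(origTracer) // restore global state

	ch := newChannelWithoutTracer(t) // hypothetical helper

	// Before a real global tracer is registered, the channel should fall
	// back to the default no-op global tracer.
	if _, ok := ch.Tracer().(opentracing.NoopTracer); !ok {
		t.Fatal("expected no-op tracer before InitGlobalTracer")
	}

	// After the global tracer changes, the channel should pick it up.
	mock := mocktracer.New()
	opentracing.InitGlobalTracer(mock)
	if ch.Tracer() != mock {
		t.Fatal("expected channel to late-bind to the new global tracer")
	}
}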

}

// PropagationTestSuite is a collection of test cases for a certain encoding
type PropagationTestSuite struct {
Contributor:

this test is great, nice work!

@prashantv (Contributor)

Thanks for the benchmark numbers, good to see no regressions. Can you rebase off the latest dev as well?

@yurishkuro force-pushed the integrate-opentracing branch from ec8fb3b to 5cc3ee0 on July 22, 2016 00:01
@yurishkuro (Contributor Author)

rebased off dev

@yurishkuro force-pushed the integrate-opentracing branch from a1863bc to 7cb6e16 on July 22, 2016 14:51
return true
})
}
carrier.RemoveTracingKeys()
Contributor Author:

@prashantv I've added the namespacing feature for tracing keys by using a custom tracingHeadersCarrier, which uses a prefix for all tracing keys. The RemoveTracingKeys() call restores the application headers to their pre-injection state.

The prefixed and unprefixed keys are cached.
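
(A rough sketch of what such a prefixing carrier can look like; the field and type names here are hypothetical stand-ins, and the actual tracingHeadersCarrier in this PR also caches the key mappings and supports removal:)

import (
	"strings"
)

// prefixedCarrier is a simplified stand-in for tracingHeadersCarrier:
// it namespaces every tracing key before storing it in the shared
// application-header map, so tracing keys can't clash with app keys.
type prefixedCarrier struct {
	headers map[string]string
	prefix  string // e.g. "$tracing$"
}

// Set implements opentracing.TextMapWriter.
func (c prefixedCarrier) Set(key, val string) {
	c.headers[c.prefix+key] = val
}

// ForeachKey implements opentracing.TextMapReader, yielding only the
// tracing keys with the prefix stripped.
func (c prefixedCarrier) ForeachKey(handler func(key, val string) error) error {
	for k, v := range c.headers {
		if strings.HasPrefix(k, c.prefix) {
			if err := handler(strings.TrimPrefix(k, c.prefix), v); err != nil {
				return err
			}
		}
	}
	return nil
}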

@yurishkuro force-pushed the integrate-opentracing branch from 5dc127f to 2b55b7d on July 30, 2016 22:12
@yurishkuro force-pushed the integrate-opentracing branch from 2b55b7d to fcca900 on July 30, 2016 22:35
// tracingKeyPrefix is used to prefix all keys used by the OpenTracing Tracer to represent
// its trace context and baggage. The prefixing is done in order to distinguish tracing
// headers from the actual application headers and to hide the former from the user code.
const tracingKeyPrefix = "$tracing$"
Contributor Author:

@prashantv ok to use this prefix?

Contributor:

Yep, fine with me. @kriskowal @abhinav, are you OK with this as well?

Contributor:

Sure. @kriskowal?

Contributor:

🆗

@prashantv (Contributor)

Code lgtm, there are a few minor comments left to address -- can you address those and then we can merge.

defer func() {
	// restore it on exit
	opentracing.InitGlobalTracer(origTracer)
}()
Contributor:

you can simplify this to:

defer opentracing.InitGlobalTracer(origTracer)

@prashantv merged commit 114c2b7 into dev on Aug 5, 2016
@prashantv deleted the integrate-opentracing branch on August 5, 2016 23:09
@prashantv mentioned this pull request on Aug 25, 2016