-
Notifications
You must be signed in to change notification settings - Fork 1.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add distributed tracing via Zipkin. #354
Conversation
/retest |
@mdemirhan General question: how does Zipkin overlap with Jaeger? Will we install both? |
@grantr Both Zipkin and Jaeger implement OpenTracing (http://opentracing.io/documentation/pages/supported-tracers). However; as far as I understand; OpenTracing doesn't really define the wire format or the backend contract and these different implementations aren't necessarily compatible. That being said, Jaeger is supposed to be Zipkin compatible and we should be able to use Jaeger instead of Zipkin if necessary. Istio supports both Zipkin and Jaeger and we can replace Zipkin with Jaeger if there are features in Jaeger that you prefer over Zipkin. Let me know and I will be happy to take a look. |
@@ -55,32 +61,106 @@ func main() { | |||
flag.Parse() | |||
glog.Info("Telemetry sample started.") | |||
|
|||
// Zipkin setup | |||
reporter := httpreporter.NewReporter("http://zipkin.istio-system.svc.cluster.local:9411/api/v2/spans") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm curious about the choice to run zipkin
in istio-system
instead of either the monitoring
or ela-system
namespaces?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Istio has a hard-coded URL to zipkin and it assumes that zipkin is installed in istio-system with a service named zipkin. I will open an issue to them and ask them to provide us an override. In the meantime, I will add comments in the code and YAML mentioning that this is a workaround for the istio issue.
|
||
// Add a random delay to simulate some actual work here | ||
time.Sleep(time.Duration(rand.Intn(1000)) * time.Millisecond) | ||
func rootHandler(client *zipkinhttp.Client) http.HandlerFunc { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These changes seem complicated. I don't think it's immediately obvious which parts of the code are necessary setup vs. demonstrating the tracing capabilities. Can you add detailed comments explaining the various parts?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure thing. I will add extra comments explaining what each section is doing.
True, the individual tracer implementation defines trace propagation. I have no experience with either (though I do have experience with LightStep) but even so, Jaeger seems like a better choice than Zipkin:
|
@grantr All good points. The reason I chose Zipkin over Jaeger was my impression on its higher usage. I couldn't really find a good reference that compares both to make a good informed decision. I will look into Jaeger more and see if we can replace Zipkin. Though, I would like to check this in now and do that investigation next week. Does that sound reasonable? |
Definitely, I didn't mean to stop this PR; was just curious. |
/retest |
I just have a clarifying question about this. It seems to me like we're instrumenting the application to talk directly to a particular tool, and I had been thinking that we should be able to get some of these traces by preserving the headers and not actually requiring the application to have to choose which of these tools is being used. |
That being said, it seems like copying headers manually might actually be a lot more straightforward and I could change the code to just do that. The main disadvantage of that would be the loss of some data (for example, instead of the service name, you will see istio-proxy everywhere). I will give that a shot and see if that looks better. Thanks for bringing this up. |
@vaikas-google Digging deeper into this, we will need this type of instrumentation for any service that is not behind istio proxy and that is not local within our cluster. In a service mesh, if you are calling only local services behind an istio-proxy, this instrumentation is not needed. If you are going to call an external service, or if you are going to call an internal service that is not behind istio-proxy, then this instrumentation is needed. I will add a note to the beginning of the file and to the documentation stating that. Let me know if that sounds reasonable. @mattmoor Let me know if the updated documentation is clear and if I have your approval to check this in. |
# Conflicts: # Gopkg.lock
That's unfortunate especially in the case of functions that are lightweight in code and having them require to manually instrument the code. Thanks for doing the investigation! |
@vaikas-google Agreed. I will look into this further and see if anything can be done to make the situation better. I will capture this investigation as a Github issue and hopefully get to it within the next couple of days. |
/retest |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, these comments are super helpful to understand what's going on. thanks for so much detail.
I have a handful more comments.
@@ -62,6 +78,18 @@ func main() { | |||
glog.Info("Telemetry sample started.") | |||
|
|||
// Zipkin setup | |||
// If your service only calls services in the local cluster and if those services are | |||
// behind istio-proxy (all revisions are behind istio-proxy), then Zipkin setup below |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: "... Elafros Revisions ..." may be clearer
// requests and copy them to your outgoing requests. | ||
// x-request-id, x-b3-traceid, x-b3-spanid, x-b3-parentspanid, x-b3-sampled, x-b3-flags, x-ot-span-context | ||
// | ||
// For richer instrumentation, or to instrument calls that are mad to services not behind |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
s/mad/made/g
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Still need this.
// x-request-id, x-b3-traceid, x-b3-spanid, x-b3-parentspanid, x-b3-sampled, x-b3-flags, x-ot-span-context | ||
// | ||
// For richer instrumentation, or to instrument calls that are mad to services not behind | ||
// istio-proxy, you can instrument the code using Zipkin's Go client library: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The trailing colon is weird here.
defer func(start time.Time) { | ||
// Counters only support incrementing. Increment the count by one | ||
// to capture the single call. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are there any built-in metrics that you get by simply registering the middleware?
One of the nice things about our internal systems is well-defined built-in metrics that just work. :(
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The go client library adds builtin metrics for golang (memstats, thread, go routines, ...etc) but nothing else. I will create an issue and see if we can simplify this further and add a library that can add some common metrics by default.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Forgot to mention: There are set of metrics around request time, count & result sent by envoy automatically (whether you instrument your code or not). Those should be good for most use cases. This example here was to show more of a custom metric and how to emit it.
@@ -147,15 +191,23 @@ func rootHandler(client *zipkinhttp.Client) http.HandlerFunc { | |||
} | |||
} | |||
|
|||
// Makes an http call with distributed tracing enabled. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it would be a more idiomatic example if this was done by creating an http.Roundtripper
that instruments outbound requests.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added #374 to track this.
/retest |
// requests and copy them to your outgoing requests. | ||
// x-request-id, x-b3-traceid, x-b3-spanid, x-b3-parentspanid, x-b3-sampled, x-b3-flags, x-ot-span-context | ||
// | ||
// For richer instrumentation, or to instrument calls that are mad to services not behind |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Still need this.
/retest |
No description provided.