Add Distributed Tracing using OpenTelemetry #12460

dashpole · 2020-11-06T22:04:23Z

Prior issues/PRs related to distributed tracing: #11166, #9242, #5425.

In the kubernetes instrumentation sig, we are adding working to add Distributed Tracing support using OpenTelemetry to the kube-apiserver: kubernetes/enhancements/keps/sig-instrumentation/647-apiserver-tracing. We plan to have Alpha (disabled by default) support for this feature in kubernetes v1.22. Progress can be tracked in kubernetes/enhancements#647. This will generate spans for incoming and outgoing requests, including requests to etcd. It will also propagate the w3c trace context and w3c baggage along with outgoing requests.

This presents an opportunity to make Etcd easier to debug when using kubernetes. Since distributed tracing provides a unified view of a single request across multiple components, a trace including the API Server and Etcd would show both the API call to the APIServer and the resulting Etcd call, providing better context on the call to Etcd. Based on #5425 (comment), there is already interest in distributed tracing, so I won't elaborate further.

Previous attempts at distributed tracing were looking into OpenTracing, OpenCensus, and jaeger formats and libraries. Thankfully, these have since unified behind the OpenTelemetry project, which makes a potential evaluation considerably easier. OpenTelemetry will be GA soon, so now seems like the right time to open a discussion around adding distributed tracing to Etcd.

ccing people who have been involved with or interested in this topic.
@logicalhan @tedsuo @ehashman @brancz @bg451 @bhs @gyuho @jingyih @serathius

logicalhan · 2020-11-06T23:03:23Z

/cc @ptabor @wenjiaswe

hexfusion · 2020-11-06T23:48:28Z

/cc @ironcladlou @retroflexer @marun

jingyih · 2020-11-09T15:29:48Z

Today in etcd, tracing is implemented in a similar fashion as in kubernetes (https://github.com/kubernetes/utils/tree/master/trace).

/cc original author for etcd tracing: @YoyinZyc

stale · 2021-03-09T18:43:43Z

This issue has been automatically marked as stale because it has not had recent activity. It will be closed after 21 days if no further activity occurs. Thank you for your contributions.

lilic · 2021-04-22T09:04:22Z

Hey, @dashpole 👋 I am always on board when it comes to adding more observability signals into any project and tracing is something that is very useful! With that being said, I have two concerns with adding OpenTelemetry:

The possible resource usage increase, was this resolved already? What would be the resource or performance addition if OpenTel tracing is added?
OpenTelemetry format is not GA yet according to the go client lib https://github.com/open-telemetry/opentelemetry-go#project-status. I know a couple of pieces went GA, do we know when the format goes GA?

Warning: this project is currently in a pre-GA phase. Backwards incompatible changes may be introduced in subsequent minor version releases as we work to track the evolving OpenTelemetry specification and user feedback.

Thank you!

dashpole · 2021-04-22T15:34:20Z

For latency of creating a span, the numbers are currently within 0.01 milliseconds:

$ go test ./... -bench .
goos: linux
goarch: amd64
pkg: go.opentelemetry.io/otel/sdk/trace
cpu: Intel(R) Xeon(R) CPU E5-1650 v3 @ 3.50GHz
BenchmarkAttributesMapToKeyValue-6     	  237327	      5009 ns/op
BenchmarkStartEndSpan/AlwaysSample-6   	  883227	      1757 ns/op	     848 B/op	       9 allocs/op
BenchmarkStartEndSpan/NeverSample-6    	  738933	      1484 ns/op	     752 B/op	       8 allocs/op
BenchmarkSpanWithAttributes_4/AlwaysSample-6         	  429614	      2916 ns/op	    1584 B/op	      17 allocs/op
BenchmarkSpanWithAttributes_4/NeverSample-6          	  794530	      1721 ns/op	     944 B/op	       9 allocs/op
BenchmarkSpanWithAttributes_8/AlwaysSample-6         	  296242	      4191 ns/op	    2112 B/op	      23 allocs/op
BenchmarkSpanWithAttributes_8/NeverSample-6          	  773558	      1877 ns/op	    1136 B/op	       9 allocs/op
BenchmarkSpanWithAttributes_all/AlwaysSample-6       	  333826	      3792 ns/op	    1936 B/op	      21 allocs/op
BenchmarkSpanWithAttributes_all/NeverSample-6        	  689450	      1785 ns/op	    1072 B/op	       9 allocs/op
BenchmarkSpanWithAttributes_all_2x/AlwaysSample-6    	  183164	      6435 ns/op	    3236 B/op	      32 allocs/op
BenchmarkSpanWithAttributes_all_2x/NeverSample-6     	  612986	      2073 ns/op	    1392 B/op	       9 allocs/op
BenchmarkTraceID_DotString-6                         	 8888558	       145.3 ns/op
BenchmarkSpanID_DotString-6                          	11427596	       111.4 ns/op
PASS
ok  	go.opentelemetry.io/otel/sdk/trace	17.503s

I'll ask about memory usage and get back. ~~I recall there being a small, fixed-size buffer, but i'll confirm.~~

The first RC is expected to be cut in 2-3 weeks, and 1.0 for the go client is a prerequisite for beta in kubernetes. I'd expect a 1.0 in 2 months or so, to be conservative. But it's OSS, so anything can happen.

lilic · 2021-04-27T14:44:21Z

@dashpole Thank you for that! I think the value of having one trace context propagated from apiserver to etcd would be amazing for insight!

But code freeze is mid-May for etcd #12330 (comment), so I am wondering if it's possible we even will be able to make that deadline, given the stability of the OpenTel 1.0 go client to be 2 weeks or so. Happy to help out drive this along!

logicalhan · 2021-04-27T14:48:37Z

@dashpole Thank you for that! I think the value of having one trace context propagated from apiserver to etcd would be amazing for insight!

I suspect a lot of requests will be served directly from the watch-cache, so we probably shouldn't be expecting all requests to trace back to etcd. Paginated lists will, watches won't and I can't remember if we are serving strongly consistent reads from the cache (there was a KEP proposing it at one point).

logicalhan · 2021-04-27T14:52:18Z

I think we should definitely still do it, I'm just trying to temper expectations is all..

dashpole · 2021-04-27T14:53:45Z

Following-up from the earlier comment, there isn't a fixed-size buffer. It looks like this is a case where we will just have to try it to find out how much cpu/memory it takes...

The timing would probably be too tight to make it into the next etcd release.

lilic · 2021-04-27T14:58:03Z

@logicalhan makes sense thank you for that yes!

Following-up from the earlier comment, there isn't a fixed-size buffer. It looks like this is a case where we will just have to try it to find out how much cpu/memory it takes...

@dashpole Thanks! Will try to play around a bit and do a POC locally to see where the cpu/memory is at now, I know it was fairly high in the past, but lets see the current state and I will report back.

stale · 2021-08-09T01:14:26Z

This issue has been automatically marked as stale because it has not had recent activity. It will be closed after 21 days if no further activity occurs. Thank you for your contributions.

serathius · 2022-03-10T16:19:52Z

Reopening to track graduation of distributed tracing flags as proposed in #13775

houseme · 2022-03-16T02:30:43Z

Can I upgrade to the latest version? open-telemetry/opentelemetry-go#2676

baryluk · 2022-04-22T13:49:48Z

Hi. We had some long apply times in our etcd cluster used by OKD, and enabled tracing to help with debugging this.

I was able to setup it with otelcol and forwarding to tempo and visualizing it in Grafana. I was not able to use jaeger because they do not provide OTEL enabled binaries. (it can be done with otelcol and jaeger exporer plugin from contrib for otelcol, but there are no precompiled binaries for that eitehr).

We checked few traces, and there are few missing elements for it to be really usable.

For RPCs received and sent. Only grpc service and method names are included, but no details about request and response. Only status and uncompressed_sizes info.

Here are some events as logged by otelcol with logging exporter:

Span #282
    Trace ID       : 32a380edb1ccd49e3b4fa8f3fbd329d2
    Parent ID      : 
    ID             : 5d6d8aa231ff85aa
    Name           : etcdserverpb.KV/Range
    Kind           : SPAN_KIND_SERVER
    Start time     : 2022-04-22 13:45:36.11019019 +0000 UTC
    End time       : 2022-04-22 13:45:36.122102025 +0000 UTC
    Status code    : STATUS_CODE_OK
    Status message : 
Attributes:
     -> rpc.system: STRING(grpc)
     -> rpc.service: STRING(etcdserverpb.KV)
     -> rpc.method: STRING(Range)
     -> net.peer.ip: STRING(10.10.0.183)
     -> net.peer.port: STRING(55550)
     -> rpc.grpc.status_code: INT(0)
Events:
SpanEvent #0
     -> Name: message
     -> Timestamp: 2022-04-22 13:45:36.110201481 +0000 UTC
     -> DroppedAttributesCount: 0
     -> Attributes:
         -> message.type: STRING(RECEIVED)
         -> message.id: INT(1)
         -> message.uncompressed_size: INT(56)
SpanEvent #1
     -> Name: message
     -> Timestamp: 2022-04-22 13:45:36.122097256 +0000 UTC
     -> DroppedAttributesCount: 0
     -> Attributes:
         -> message.type: STRING(SENT)
         -> message.id: INT(1)
         -> message.uncompressed_size: INT(19289955)

I believe it would be good to include more grpc data that in spans. For large requests, truncate some big fields if needed (i.e. body content on create / get response / put request, or list of all keys when doing a list response), but other fields should be included (most notably the key in get / put / watch requests).

I checked https://pkg.go.dev/go.opentelemetry.io/contrib/instrumentation/google.golang.org/grpc/otelgrpc and there is no transparent way of doing this automatically as far as I can see, but maybe adding few manual attributes (i.e. key / key range) in etcd start of the method implementation, could do a trick.

It also looks that only grpc public api server is intercepted and instrumented, but grpc clients, and the internal (between replicas) grpc server is not instrumented, again reducing usability, especially for things like request forwarding from non-master to master (leader), and proposals sending (i.e. for Put and Txn) to other cluster members.

jmhbnz · 2024-06-20T18:19:49Z

Discussed during sig-etcd triage meeting. This is potentially a candidate for graduation via the new design for etcd server feature flags in future.

We could still add an e2e test, assigned to @vivekpatani
/assign @vivekpatani

k8s-ci-robot · 2024-06-20T18:19:52Z

@jmhbnz: GitHub didn't allow me to assign the following users: vivekpatani.

Note that only etcd-io members with read permissions, repo collaborators and people who have commented on this issue/PR can be assigned. Additionally, issues/PRs can only have 10 assignees at the same time.
For more information please see the contributor guide

In response to this:

Discussed during sig-etcd triage meeting. This is potentially a candidate for graduation via the new design for etcd server feature flags in future.

We could still add an e2e test, assigned to @vivekpatani
/assign @vivekpatani

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

stale bot added the stale label Mar 9, 2021

ptabor added priority/important-soon Must be staffed and worked on either currently, or very soon, ideally in time for the next release. and removed stale labels Mar 9, 2021

lilic mentioned this issue May 5, 2021

Add initial Tracing with OpenTelemetry #12919

Merged

stale bot added the stale label Aug 9, 2021

stale bot closed this as completed Aug 30, 2021

serathius reopened this Mar 10, 2022

stale bot removed the stale label Mar 10, 2022

serathius added the stage/tracked label Mar 10, 2022

serathius added this to the etcd-v3.6 milestone Mar 10, 2022

hanyuancheung mentioned this issue Mar 16, 2022

Update OpenTelemetry Tracing version status #13808

Closed

ahrtr mentioned this issue Mar 17, 2022

Add integration test for etcd tracing #13814

Closed

This was referenced Aug 4, 2022

[v3.5] Change the default sampling rate from 100% to 0% #14310

Closed

[v3.5] Update OpenTelemetry to v1.0+ #14311

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add Distributed Tracing using OpenTelemetry #12460

Add Distributed Tracing using OpenTelemetry #12460

dashpole commented Nov 6, 2020 •

edited

Loading

logicalhan commented Nov 6, 2020

hexfusion commented Nov 6, 2020

jingyih commented Nov 9, 2020

stale bot commented Mar 9, 2021

lilic commented Apr 22, 2021

dashpole commented Apr 22, 2021 •

edited

Loading

lilic commented Apr 27, 2021 •

edited

Loading

logicalhan commented Apr 27, 2021

logicalhan commented Apr 27, 2021

dashpole commented Apr 27, 2021

lilic commented Apr 27, 2021

stale bot commented Aug 9, 2021

serathius commented Mar 10, 2022

houseme commented Mar 16, 2022

baryluk commented Apr 22, 2022 •

edited

Loading

jmhbnz commented Jun 20, 2024

k8s-ci-robot commented Jun 20, 2024

Add Distributed Tracing using OpenTelemetry #12460

Add Distributed Tracing using OpenTelemetry #12460

Comments

dashpole commented Nov 6, 2020 • edited Loading

logicalhan commented Nov 6, 2020

hexfusion commented Nov 6, 2020

jingyih commented Nov 9, 2020

stale bot commented Mar 9, 2021

lilic commented Apr 22, 2021

dashpole commented Apr 22, 2021 • edited Loading

lilic commented Apr 27, 2021 • edited Loading

logicalhan commented Apr 27, 2021

logicalhan commented Apr 27, 2021

dashpole commented Apr 27, 2021

lilic commented Apr 27, 2021

stale bot commented Aug 9, 2021

serathius commented Mar 10, 2022

houseme commented Mar 16, 2022

baryluk commented Apr 22, 2022 • edited Loading

jmhbnz commented Jun 20, 2024

k8s-ci-robot commented Jun 20, 2024

dashpole commented Nov 6, 2020 •

edited

Loading

dashpole commented Apr 22, 2021 •

edited

Loading

lilic commented Apr 27, 2021 •

edited

Loading

baryluk commented Apr 22, 2022 •

edited

Loading