From 1d5b48cce9b788d9efefabeccb05fe76ca82d8d9 Mon Sep 17 00:00:00 2001 From: yifeizhuang Date: Thu, 20 Jul 2023 14:00:08 -0700 Subject: [PATCH 01/30] start --- A66-open-telemetry-tracing.md | 385 ++++++++++++++++++++++++++++++++++ 1 file changed, 385 insertions(+) create mode 100644 A66-open-telemetry-tracing.md diff --git a/A66-open-telemetry-tracing.md b/A66-open-telemetry-tracing.md new file mode 100644 index 000000000..9fb72533a --- /dev/null +++ b/A66-open-telemetry-tracing.md @@ -0,0 +1,385 @@ +A66: Open Telemetry Tracing +---- +* Author(s): @zivy +* Approver: @ejona86 @markdroth @dfawley @fengli +* Status: {Draft, In Review, Ready for Implementation, Implemented} +* Implemented in: +* Last updated: 2023-07 +* Discussion at: + +## Abstract + +This proposal adds support for OpenTelemetry tracing and migration +paths away from OpenCensus tracing. Discussions include: +* Context propagation between a gRPC client and gRPC server during migration. +* Migrate a gRPC involved software binary that has both OpenTelemetry and +OpenCensus dependency. +* The API surface to enable and configure OpenTelemetry tracing. + +Note that stats and logging are out of scope. + +## Background + +This work aligns with the community consensus to switch to OpenTelemetry as the +next generation OpenCensus. The latter is no longer maintained after July 31, 2023. + +Currently, gRPC supports OpenCensus based tracing in its grpc-census module, or +alike. gRPC tracing is built with gRPC's core stream tracer and interceptor +infrastructures. A gRPC client when intercepting the call creates a child span +with the parent span from the current context, and further creates attempt spans +upon stream creation for each attempt. [gRFC A45][A45] describes cross language +design considerations about creating a +child span for each individual call attempt. A gRPC server uses span ID from the +incoming request header as a parent span to maintain the parent child span +relationship with the gRPC client. 
To propagate span context over the wire, gRPC +uses metadata (header name: `grpc-trace-bin`) and OpenCensus's binary format for +(de)serialization. The header name is unique from other census library +propagators to differentiate with the application’s tracing instrumentation. + +The following tracing information during an RPC lifecycle is captured: + +At the client, on parent span: +* When the call is started, annotate name resolution completed if the RPC had +name resolution delay. +* When the uncompressed size of some outbound data is revealed, annotate seq no., +type(Received) and wire message size. +* When the call is closed, end the parent span with RPC status. + +On attempt span: +* When span is created, add attributes on: previous-rpc-attempts and transparent-retry. +* When the stream is created on transport, annotate delayed load balancer pick +complete, if any. +* When an outbound message has been sent, add a message event to capture seq no., +type(SENT) and wire size. +* When an inbound message has been received from the transport, add a message +event to capture seq no., type(Received) and uncompressed message size. +* When the stream is closed, end the child span with RPC status. + +At the server: +* When an outbound message has been sent, add a message event to capture seq no., +type(SENT) and size. +* When an inbound message has been read from the transport, add a message event +to capture seq no., type(Received) and size. +* When the uncompressed size of some inbound data is revealed, annotate seq no., +type(Received) and wire message size. +* When the stream is closed, end the span with RPC status. + +### gRPC Census API +The APIs to enable gRPC tracing are different between languages, e.g. in +grpc-java it is zero-configured: as long as the grpc-census dependency exists in +the classpath, the tracing is automatically generated. In C++, there is an API +for the users to call to enable tracing. In Go, it is exposed via stream tracers. 
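For reference, the `grpc-trace-bin` value described above carries a span context in the OpenCensus binary format. The stdlib-only Java sketch below shows that layout as we understand it. It starts with a version byte, followed by the 16-byte trace ID, the 8-byte span ID and a 1-byte trace-options field, each preceded by a field ID, for 29 bytes in total. `TraceContextBin` is a hypothetical helper, not a gRPC, OpenCensus or OpenTelemetry class. The real parsers may accept variations (for example, other version bytes) that this sketch rejects.

```java
import java.util.Arrays;

// Minimal sketch of the OpenCensus binary trace-context layout used as the
// grpc-trace-bin header value. TraceContextBin is a hypothetical helper, not a
// gRPC, OpenCensus or OpenTelemetry class.
//
// Layout (29 bytes):
//   [0]   version (0)
//   [1]   field id 0, followed by [2..17]  the 16-byte trace ID
//   [18]  field id 1, followed by [19..26] the 8-byte span ID
//   [27]  field id 2, followed by [28]     the 1-byte trace options
final class TraceContextBin {
  static final int TRACE_ID_LEN = 16;
  static final int SPAN_ID_LEN = 8;
  static final int TOTAL_LEN = 1 + (1 + TRACE_ID_LEN) + (1 + SPAN_ID_LEN) + (1 + 1);

  final byte[] traceId;
  final byte[] spanId;
  final byte options;

  TraceContextBin(byte[] traceId, byte[] spanId, byte options) {
    if (traceId.length != TRACE_ID_LEN || spanId.length != SPAN_ID_LEN) {
      throw new IllegalArgumentException("trace ID must be 16 bytes and span ID 8 bytes");
    }
    this.traceId = traceId.clone();
    this.spanId = spanId.clone();
    this.options = options;
  }

  // Serializes the context into the 29-byte wire form.
  byte[] toBytes() {
    byte[] out = new byte[TOTAL_LEN];
    out[0] = 0;                                  // version
    out[1] = 0;                                  // field id: trace ID
    System.arraycopy(traceId, 0, out, 2, TRACE_ID_LEN);
    out[18] = 1;                                 // field id: span ID
    System.arraycopy(spanId, 0, out, 19, SPAN_ID_LEN);
    out[27] = 2;                                 // field id: trace options
    out[28] = options;
    return out;
  }

  // Parses only the full 29-byte, version-0 form; anything else is rejected.
  static TraceContextBin fromBytes(byte[] in) {
    if (in.length != TOTAL_LEN || in[0] != 0 || in[1] != 0 || in[18] != 1 || in[27] != 2) {
      throw new IllegalArgumentException("unsupported grpc-trace-bin value");
    }
    return new TraceContextBin(
        Arrays.copyOfRange(in, 2, 18), Arrays.copyOfRange(in, 19, 27), in[28]);
  }
}
```

OpenTelemetry's span context has the same shape (a 16-byte trace ID, an 8-byte span ID and one byte of trace flags). Either library can therefore produce the same 29 bytes, and the cross-library compatibility in this proposal depends on that.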
+ +### gRPC GCP Observability +Following the census tracing instrumentation, gRPC supports exporting tracing +information to GCP Stackdriver, see [user guide][grpc-observability-public-doc]. +We distinguish and exclude GCP observability from this design: GCP observability +is about exporting data while this design is about instrumentation. Migrating +GCP observability to OpenTelemetry is a future project that depends on this work. + +### Related Proposals and Documents: +* [gRFC L29: C++ API Changes for OpenCensus Integration][L29] +* [gRFC A45: Exposing OpenCensus Metrics and Tracing for gRPC retry][A45] +* [Microservices observability overview] [grpc-observability-public-doc] +* [gRFC A66: OpenTelemetry Metrics][A66] + +## Proposal + +gRPC users depending on the grpc-census module have non-trivial migration paths +to OpenTelemetry. Consider the following use cases: +1. Migrate the application binary where both OpenCensus and OpenTelemetry may exist +in the dependency tree. This can be the application’s own tracing code, or gRPC +OpenCensus, or other dependencies that involve OpenCensus and/or OpenTelemetry. +2. Compatibility between a gRPC client and server as two distributed components, +where during migration one will use OpenCensus and the other will use OpenTelemetry. + +All related APIs here are experimental until metric [gRPC A66][A66] is design and implementation complete. + +### Grpc OpenTelemetry module +We will add a separate module grpc-open-telemetry that integrates OpenTelemetry, +similar to the grpc-census module. Languages will keep using the same gRPC +infrastructures, e.g. interceptor and stream tracer to implement the feature. +Providing both modules is to allow users who already depend on grpc-census to +continue using it for the newer grpc version and offer grace time for the migration. 
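The background above describes a division of labor between two pieces. An interceptor-level call span is parented to the current span, and the stream tracer creates one child attempt span per attempt. The toy model below pictures that split. It is a hypothetical, stdlib-only sketch. `ToySpan` and `ToyCallTracer` are not gRPC or OpenTelemetry classes, and the span names are placeholders. It models only the parent/child shape and the `previous-rpc-attempts` and `transparent-retry` attributes from the Background section.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical toy model, not a gRPC or OpenTelemetry API: it only records the
// parent/child shape of the spans and the attempt attributes.
final class ToySpan {
  final String name;
  final ToySpan parent;                     // null for a root span
  final Map<String, Object> attributes = new LinkedHashMap<>();
  String endStatus;                         // null until end() is called

  ToySpan(String name, ToySpan parent) {
    this.name = name;
    this.parent = parent;
  }

  void end(String status) {
    endStatus = status;
  }
}

// One instance per RPC. The constructor plays the role of the client
// interceptor; newAttempt() plays the role of the stream tracer factory,
// which runs once for every attempt (including retries).
final class ToyCallTracer {
  final String fullMethodName;
  final ToySpan callSpan;
  final List<ToySpan> attemptSpans = new ArrayList<>();

  ToyCallTracer(String fullMethodName, ToySpan currentSpan) {
    this.fullMethodName = fullMethodName;
    // The call span's parent is whatever span is current when the call starts.
    this.callSpan = new ToySpan("call " + fullMethodName, currentSpan);
  }

  ToySpan newAttempt(boolean transparentRetry) {
    ToySpan attempt = new ToySpan("attempt " + fullMethodName, callSpan);
    attempt.attributes.put("previous-rpc-attempts", attemptSpans.size());
    attempt.attributes.put("transparent-retry", transparentRetry);
    attemptSpans.add(attempt);
    return attempt;
  }

  void callClosed(String status) {
    callSpan.end(status);
  }
}
```

Suppose a call's first attempt fails and a transparent retry follows. The model then holds two attempt spans under one call span, and the second one records `previous-rpc-attempts = 1` and `transparent-retry = true`.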
+ + +### Propagator Wireformat +While gRPC OpenCensus directly interacts with metadata API, gRPC OpenTelemetry +will use the standardized propagators API for context propagation, for the +following benefits: +1. Fully integration with OpenTelemetry APIs that is easier for users to reason about. +2. Make it possible to plugin other propagators that the community supports. + +As of today, OpenTelemetry propagator API only supports `TextMapPropagator`, +that is to send string key/value pairs between the client and server, which is +different from the binary header that gRPC currently uses. The future roadmap +to support binary propagators at Open Telemetry is unclear. So, gRPC will use +propagator API in TextMap format with optimization path (Go and Java) to work +around the binary propagator API. Once the OpenTelemetry binary propagator API +is available in the future, we can continuously integrate with those API with +little effort. In fact, textmap propagator has negligible performance impact for +C++, so a binary propagator API is not critical. + +gRPC will expose a custom `grpcTraceBinPropagator` that implements `TextMapPropagator`. +This grpc-provided propagator still uses the `grpc-trace-bin` header for context +propagation. The OpenCensus spanContext and OpenTelemetry spanContext transmitted +in binary over the wire are identical, therefore a gRPC OpenCensus client can +speak with a gRPC OpenTelemetry server and vice versa. Users can provide a +single composite propagator that combines one or multiple `TextMapPropagator` +for their client and server separately. This way, users can define their own +migration path for context propagators in distributed components. Configuring +gRPC OpenTelemetry with this propagator when dealing with cross-cutting concerns +is straightforward and recommended. + +#### Propagator API in Java/Go +The pseudocode below shows `grpcTraceBinPropagator` and the corresponding +gRPC setter/getters with an optimization path. 
+ +```Java +public class GrpcTraceBinTextMapPropagator implements TextMapPropagator { + @Override + public <C> void inject(Context context, @Nullable C carrier, TextMapSetter<C> setter) { + SpanContext spanContext = Span.fromContext(context).getSpanContext(); + byte[] value = BinaryFormat.toBytes(spanContext); + if (setter instanceof GrpcTraceBinSetter) { + // Fast path in Java and Go that passes bytes directly through API boundaries using + // the overloaded set(Metadata, String, byte[]) method added by gRPC. + ((GrpcTraceBinSetter) setter).set((Metadata) carrier, "grpc-trace-bin", value); + } else { + // Slow path for C++. gRPC C++ does not have type checking, so we encode bytes to + // String to comply with the TextMapSetter API. This code path is also used in the + // situation where GrpcTraceBinTextMapPropagator is used with a TextMapSetter + // externally. + setter.set(carrier, "grpc-trace-bin", Base64.getEncoder().encodeToString(value)); + } + } + + @Override + public <C> Context extract(Context context, @Nullable C c, TextMapGetter<C> textMapGetter) { + byte[] bytes; + if (textMapGetter instanceof GrpcTraceBinGetter) { //Fast path for Java/Go + bytes = ((GrpcTraceBinGetter) textMapGetter).getBinary((Metadata) c, "grpc-trace-bin"); + } else { + // Slow path for C++. gRPC C++ does not have type checking, so we decode String + // from TextMapGetter API to bytes. This code path applies to the situation + // where GrpcTraceBinTextMapPropagator is used with a TextMapGetter externally.
+ String contextString = textMapGetter.get(c, "grpc-trace-bin"); + bytes = contextString == null ? null : Base64.getDecoder().decode(contextString); + } + if (bytes == null) { + return context; // No grpc-trace-bin header: keep the incoming context unchanged. + } + SpanContext spanContext = BinaryFormat.parseBytes(bytes); + return context.with(Span.wrap(spanContext)); + } +} + +``` + +The `grpcTraceBinPropagator` should be compatible with any Getter/Setter, but +internally in gRPC, in Java and Go we implement a special gRPC Getter/Setter +that use an optimization path to work around the lack of binary propagator API +and avoid base64 (de)encoding when passing data between API interfaces. This special +gRPC Getter/Setter will also be responsible for handling other propagators, +see pseudocode below. + +```Java +@Internal +class GrpcCommonSetter implements TextMapSetter<Metadata>, GrpcBinarySetter { + // Fast path for Java and Go. Overload set() method to accept bytes value to avoid + // base64 encoding/decoding between API boundaries. + @Override + public void set(Metadata header, String key, byte[] value) { + assert key.endsWith("-bin"); + header.put(Metadata.Key.of(key, BINARY_BYTE_MARSHALLER), value); + } + + @Override + public void set(Metadata header, String key, String value) { + if (key.endsWith("-bin")) { + // Slower path in C++. It shows the decoding part of the just encoded String at + // GrpcTraceBinTextMapPropagator.inject(). + // This can also be used to propagate any other binary header. + header.put(Metadata.Key.of(key, BINARY_BYTE_MARSHALLER), Base64.getDecoder().decode(value)); + } else { + // Used by other TextMap propagators, e.g. w3c. + header.put(Metadata.Key.of(key, ASCII_STRING_MARSHALLER), value); + } + } +} + +class GrpcCommonGetter implements TextMapGetter<Metadata> { + public String get(@Nullable Metadata carrier, String key) { + if (key.endsWith("-bin")) { + // Slow path for C++: encode the bytes to String. Later we decode to bytes + // in GrpcTraceBinTextMapPropagator.extract().
+ byte[] value = carrier.get(Metadata.Key.of(key, BINARY_BYTE_MARSHALLER)); + return value == null ? null : Base64.getEncoder().encodeToString(value); + } else { + // Used by other TextMap propagators, e.g. w3c. + return carrier.get(Metadata.Key.of(key, ASCII_STRING_MARSHALLER)); + } + } + + // TextMapGetter also requires keys(). + @Override + public Iterable<String> keys(Metadata carrier) { + return carrier.keys(); + } + + // Add a new method to optimize the TextMap propagator. + public byte[] getBinary(@Nullable Metadata carrier, String key) { + assert key.endsWith("-bin"); + return carrier.get(Metadata.Key.of(key, BINARY_BYTE_MARSHALLER)); + } +} + +// This interface will be implemented by GrpcCommonSetter as an optimization path +// to avoid base64 encoding between TextMap APIs due to the lack of an +// OpenTelemetry binary propagator API. +interface GrpcBinarySetter { + void set(Metadata header, String key, byte[] value); +} + +``` + + +The `GrpcCommonSetter` adds an overloaded `set()` method that directly takes `byte[]` +(Java and Go) to avoid extra encoding/decoding. The normal `set()` method +handles both binary (`-bin`) headers (e.g. `grpc-trace-bin` for C++) and +ASCII headers, which are used by other text map propagators that gRPC +OpenTelemetry supports (e.g. W3C). +The `GrpcCommonGetter` adds a new method, `getBinary()`, for the optimized path +in Java and Go. The normal `get()` method handles both the binary header +(`grpc-trace-bin` for C++) and ASCII headers for propagators like W3C. + +#### Propagation API in C++ +C++ will also support the propagator API. But due to language restrictions, +C++ cannot take the optimization path that works around the lack of a binary +propagator API. However, C++ can be configured to interact with metadata +directly, like the current gRPC OpenCensus, instead of using the propagator API, so +that it avoids the performance cost of string/binary encoding. +The two APIs C++ will support for context propagation are: +* If the propagator is configured, take a slower path in the pseudocode described +above.
Supporting the propagator API keeps the API uniform across languages. + +* If explicitly configured, gRPC will directly use `Metadata.get()` and `Metadata.put()` +APIs on the `grpc-trace-bin` header. No TextMapPropagator API or TextMapSetter/Getter +will be involved. This is a faster path and mitigates performance concerns due +to base64 encoding. + +TODO: add pseudocode here in C++ for `grpcTraceBinPropagator`, `GrpcCommonSetter` +and `GrpcCommonGetter`. + +The `GrpcCommonSetter.set()` and `GrpcCommonGetter.get()` methods in C++ +should handle both binary (`-bin`) headers +(e.g. `grpc-trace-bin`) and ASCII headers, which can be used by other text map +propagators that gRPC OpenTelemetry supports (e.g. W3C). + + +### gRPC OpenTelemetry API +We provide APIs to allow users to enable and configure OpenTelemetry, including +the propagators. The OpenTelemetry API will coexist with the OpenCensus API until +the latter is dropped. Only one `grpc-trace-bin` header will be sent for a single RPC. + +The APIs are different among languages due to different underlying infrastructures. +In Java, it will be part of global interceptors, so that the interceptors are +managed in a more sustainable and user-friendly way. As a prerequisite, the stream +tracer factory API will be stabilized. `OpenTelemetryModule` will be created with +an `OpenTelemetry` API instance passed in for the necessary configuration. +Users can also rely on the SDK autoconfigure extension, which configures the SDK object +through environment variables or Java system properties, and then pass the resulting +SDK object to gRPC. + +```Java +// Construct OpenTelemetry to be passed to gRPC OT module for trace/meter +// configurations. `exporter` is a SpanExporter created elsewhere. +SdkTracerProvider sdkTracerProvider = SdkTracerProvider + .builder() + .addSpanProcessor( + BatchSpanProcessor.builder(exporter).build()) + .build(); + +OpenTelemetry openTelemetry = OpenTelemetrySdk.builder() + .setTracerProvider(sdkTracerProvider) + .setMeterProvider(...)
+ .setPropagators( + ContextPropagators.create(GrpcTraceBinTextMapPropagator.getInstance())) + .build(); + +// Alternatively, use auto configuration: +// OpenTelemetry openTelemetry = +// AutoConfiguredOpenTelemetrySdk.initialize().getOpenTelemetrySdk(); + +// A no-argument OpenTelemetryModule.getInstance() would use GlobalOpenTelemetry instead. +OpenTelemetryModule otModule = OpenTelemetryModule.getInstance(openTelemetry); + +GlobalInterceptors.setInterceptors( + Arrays.asList(otModule.getClientTracingInterceptor()), + Arrays.asList(otModule.getServerTracerFactory())); + +``` + +In C++, it will be a method that mirrors the OpenCensus API, e.g.: + +TODO: update C++ API: + +```C++ +// Enable OpenTelemetry based tracing. Similar to +// RegisterOpenCensusPlugin(). TracerProvider is configured via the SDK separately. +void RegisterOpenTelemetryTracingPlugin(); + +``` + +In Go, the OpenTelemetry stream tracers and interceptors will be provided for users to install. + +TODO: add Go API + +### Migrate to OpenTelemetry: cross-process networking concerns +When users first introduce gRPC OpenTelemetry, there is a time window in which the +gRPC client and server use a mix of OpenTelemetry and OpenCensus plugins. During +that window, `grpcTraceBinPropagator` makes the migration easy because compatibility +is guaranteed. Using the `grpc-trace-bin` header for migration is encouraged because +of the following advantages: +* A simplified migration path, with no multi-phase migration deployments. +* The binary header is more efficient. + +A binary formatter implementation for OpenTelemetry is needed, which can be +similar to the OpenCensus implementation. Go already has community support for that. + +OpenTelemetry and its extension packages support multiple text map propagators, +e.g. W3C trace context or b3. The API supports specifying multiple propagators: +either public standard ones or custom propagators that implement the OpenTelemetry +propagator API interface.
The API composes the propagators, and gRPC puts all +the propagator data onto the wire through metadata. This allows users to migrate +a group of applications from an old propagator to a new propagator. An example +migration path can be: +1. Set the server to accept both the old and new propagators. +2. Add the desired new propagators to the client. +3. Make the server accept only the new propagators and complete the migration. + +### Migrate to OpenTelemetry: in binary +The OpenCensus [shim](https://github.com/open-telemetry/opentelemetry-java/tree/main/opencensus-shim) +(currently available in Java, Go, Python) allows binaries that have a mix of +OpenTelemetry and OpenCensus dependencies to export trace spans from both with +the correct parent-child relationship. This is the recommended approach to migrate +to OpenTelemetry in one binary gradually. Note that the in-binary migration and +cross-cutting concerns migration can be done in parallel. + +The shim packages that bridge the two libraries work as follows, using the +following example migration scenario: + +|-- Application - Configured OpenCensus --------------------------------- |\ +|-- gRPC -> Using OpenCensus to generate Trace A --------- |\ +|-- Application -> Using OpenCensus to generate a sub Trace B-- | + +The application may use a bridge package in the outermost layer first:\ +|-- Application - Configured Otel w/ OpenCensus Shim ------------------- |\ +|-- gRPC -> Using OpenCensus to generate Trace A --------- |\ +|-- Application -> Using OpenCensus to generate a sub Trace B-- | + +Then the application changes the instrumentation to OpenTelemetry:\ +|-- Application - Configured Otel w/ OpenCensus Shim ---------------------- |\ +|-- gRPC -> Using Otel to generate Trace A --------- |\ +|-- Application -> Using Otel to generate a sub Trace B-- | + +Finally, they switch to grpc-open-telemetry and finish the migration.\ +|-- Application - Configured Otel standalone ----------------------------- |\ +|-- gRPC ->
Using Otel to generate Trace A --------- |\ +|-- Application -> Using Otel to generate a sub Trace B-- | + +## Rationale + +[L29]: L29-cpp-opencensus-filter.md +[A45]: A45-retry-stats.md +[grpc-observability-public-doc]: https://cloud.google.com/stackdriver/docs/solutions/grpc +[A66]: A66-otel-stats.md \ No newline at end of file From 67e34e8a5b43944ff15f2b3c9187bf9cf8181a1d Mon Sep 17 00:00:00 2001 From: yifeizhuang Date: Mon, 21 Aug 2023 17:09:12 -0700 Subject: [PATCH 02/30] rename to A67 --- A66-open-telemetry-tracing.md => A67-open-telemetry-tracing.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) rename A66-open-telemetry-tracing.md => A67-open-telemetry-tracing.md (99%) diff --git a/A66-open-telemetry-tracing.md b/A67-open-telemetry-tracing.md similarity index 99% rename from A66-open-telemetry-tracing.md rename to A67-open-telemetry-tracing.md index 9fb72533a..9d6860492 100644 --- a/A66-open-telemetry-tracing.md +++ b/A67-open-telemetry-tracing.md @@ -1,4 +1,4 @@ -A66: Open Telemetry Tracing +A67: Open Telemetry Tracing ---- * Author(s): @zivy * Approver: @ejona86 @markdroth @dfawley @fengli From 7f668a77304544df0a954a074687c18969f0809b Mon Sep 17 00:00:00 2001 From: yifeizhuang Date: Mon, 21 Aug 2023 17:15:14 -0700 Subject: [PATCH 03/30] rename A72 --- A67-open-telemetry-tracing.md => A72-open-telemetry-tracing.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) rename A67-open-telemetry-tracing.md => A72-open-telemetry-tracing.md (99%) diff --git a/A67-open-telemetry-tracing.md b/A72-open-telemetry-tracing.md similarity index 99% rename from A67-open-telemetry-tracing.md rename to A72-open-telemetry-tracing.md index 9d6860492..318b70a98 100644 --- a/A67-open-telemetry-tracing.md +++ b/A72-open-telemetry-tracing.md @@ -1,4 +1,4 @@ -A67: Open Telemetry Tracing +A72: Open Telemetry Tracing ---- * Author(s): @zivy * Approver: @ejona86 @markdroth @dfawley @fengli From 00416419cc04ba3b513b1bb77826ad335fb07de8 Mon Sep 17 00:00:00 2001 From: yifeizhuang 
Date: Thu, 24 Aug 2023 15:33:10 -0700 Subject: [PATCH 04/30] minor fixes --- A72-open-telemetry-tracing.md | 301 +++++++++++++++++++--------------- 1 file changed, 172 insertions(+), 129 deletions(-) diff --git a/A72-open-telemetry-tracing.md b/A72-open-telemetry-tracing.md index 318b70a98..21344eefb 100644 --- a/A72-open-telemetry-tracing.md +++ b/A72-open-telemetry-tracing.md @@ -1,17 +1,18 @@ -A72: Open Telemetry Tracing +A72: OpenTelemetry Tracing ---- -* Author(s): @zivy -* Approver: @ejona86 @markdroth @dfawley @fengli -* Status: {Draft, In Review, Ready for Implementation, Implemented} +* Author(s): [Yifei Zhuang](https://github.com/YifeiZhuang) +* Approver: [Eric Anderson](https://github.com/ejona86), [Mark + Roth](https://github.com/markdroth), [Doug Fawley](https://github.com/dfawley), +[Feng Li](https://github.com/fengli79) +* Status: Ready for Implementation * Implemented in: * Last updated: 2023-07 * Discussion at: ## Abstract - -This proposal adds support for OpenTelemetry tracing and migration +This proposal adds support for OpenTelemetry tracing and suggests migration paths away from OpenCensus tracing. Discussions include: -* Context propagation between a gRPC client and gRPC server during migration. +* Context propagation between a gRPC client and server during migration. * Migrate a gRPC involved software binary that has both OpenTelemetry and OpenCensus dependency. * The API surface to enable and configure OpenTelemetry tracing. @@ -19,22 +20,21 @@ OpenCensus dependency. Note that stats and logging are out of scope. ## Background - This work aligns with the community consensus to switch to OpenTelemetry as the next generation OpenCensus. The latter is no longer maintained after July 31, 2023. -Currently, gRPC supports OpenCensus based tracing in its grpc-census module, or -alike. gRPC tracing is built with gRPC's core stream tracer and interceptor +Currently, gRPC supports OpenCensus based tracing in its grpc-census plugin, or +alike. 
gRPC tracing is built on its core stream tracer and interceptor infrastructures. A gRPC client when intercepting the call creates a child span with the parent span from the current context, and further creates attempt spans upon stream creation for each attempt. [gRFC A45][A45] describes cross language -design considerations about creating a -child span for each individual call attempt. A gRPC server uses span ID from the -incoming request header as a parent span to maintain the parent child span -relationship with the gRPC client. To propagate span context over the wire, gRPC -uses metadata (header name: `grpc-trace-bin`) and OpenCensus's binary format for -(de)serialization. The header name is unique from other census library -propagators to differentiate with the application’s tracing instrumentation. +design considerations about creating a child span for each individual call attempt. +A gRPC server uses the span ID from the incoming request header as the parent span to +maintain the parent-child span relationship with the gRPC client. To propagate +span context over the wire, gRPC uses metadata (header name: `grpc-trace-bin`) +and OpenCensus's binary format for (de)serialization. The header name is distinct +from other census library propagators to differentiate it from the application’s +tracing instrumentation. The following tracing information during an RPC lifecycle is captured: @@ -46,14 +46,16 @@ type(Received) and wire message size. * When the call is closed, end the parent span with RPC status. On attempt span: -* When span is created, add attributes on: previous-rpc-attempts and transparent-retry. +* When the span is created, add the attribute `previous-rpc-attempts`, which captures the +number of preceding attempts for the RPC, and the attribute `transparent-retry`, which +shows whether the stream is a transparent retry. * When the stream is created on transport, annotate delayed load balancer pick complete, if any.
* When an outbound message has been sent, add a message event to capture seq no., type(SENT) and wire size. * When an inbound message has been received from the transport, add a message event to capture seq no., type(Received) and uncompressed message size. -* When the stream is closed, end the child span with RPC status. +* When the stream is closed, end the attempt span with RPC status. At the server: * When an outbound message has been sent, add a message event to capture seq no., @@ -66,85 +68,110 @@ type(Received) and wire message size. ### gRPC Census API The APIs to enable gRPC tracing are different between languages, e.g. in -grpc-java it is zero-configured: as long as the grpc-census dependency exists in -the classpath, the tracing is automatically generated. In C++, there is an API -for the users to call to enable tracing. In Go, it is exposed via stream tracers. +grpc-java it is zero-configuration: as long as the grpc-census dependency exists in +the classpath, the traces are automatically generated. In C++, there is an API +for users to call to enable tracing. In Go, it is exposed via stream tracers. ### gRPC GCP Observability -Following the census tracing instrumentation, gRPC supports exporting tracing -information to GCP Stackdriver, see [user guide][grpc-observability-public-doc]. -We distinguish and exclude GCP observability from this design: GCP observability -is about exporting data while this design is about instrumentation. Migrating -GCP observability to OpenTelemetry is a future project that depends on this work. +Following the census tracing instrumentation, gRPC supports exporting traces +to GCP Stackdriver for visualization and analysis, see [user guide][grpc-observability-public-doc]. +We distinguish and exclude gRPC GCP observability from this design: gRPC GCP +observability is about exporting data while this design is about instrumentation. 
+Migrating gRPC GCP observability to OpenTelemetry is a future project that depends +on this work. ### Related Proposals and Documents: * [gRFC L29: C++ API Changes for OpenCensus Integration][L29] * [gRFC A45: Exposing OpenCensus Metrics and Tracing for gRPC retry][A45] -* [Microservices observability overview] [grpc-observability-public-doc] +* [Microservices observability overview][grpc-observability-public-doc] * [gRFC A66: OpenTelemetry Metrics][A66] ## Proposal - -gRPC users depending on the grpc-census module have non-trivial migration paths +gRPC users depending on the grpc-census plugin have non-trivial migration paths to OpenTelemetry. Consider the following use cases: -1. Migrate the application binary where both OpenCensus and OpenTelemetry may exist +1. Migrate an application binary where both OpenCensus and OpenTelemetry may exist in the dependency tree. This can be the application’s own tracing code, or gRPC OpenCensus, or other dependencies that involve OpenCensus and/or OpenTelemetry. 2. Compatibility between a gRPC client and server as two distributed components, where during migration one will use OpenCensus and the other will use OpenTelemetry. -All related APIs here are experimental until metric [gRPC A66][A66] is design and implementation complete. +All related APIs here are experimental until the design and implementation of +OpenTelemetry metrics ([gRFC A66][A66]) are complete. -### Grpc OpenTelemetry module -We will add a separate module grpc-open-telemetry that integrates OpenTelemetry, -similar to the grpc-census module. Languages will keep using the same gRPC -infrastructures, e.g. interceptor and stream tracer to implement the feature. -Providing both modules is to allow users who already depend on grpc-census to +### New functions in the OpenTelemetry plugin +We will add tracing functions to the grpc-open-telemetry plugin, alongside OpenTelemetry +metrics ([gRFC A66][A66]). +Languages will keep using the same gRPC infrastructures, e.g.
interceptor and +stream tracer to implement the feature. +We keep the grpc-census plugin to allow users who already depend on grpc-census to continue using it for the newer grpc version and offer grace time for the migration. - - -### Propagator Wireformat -While gRPC OpenCensus directly interacts with metadata API, gRPC OpenTelemetry +The new functions will produce the same tracing information as the Census plugin +does. Due to API differences between OpenCensus and OpenTelemetry, however, the +trace information differs slightly. +The OpenCensus `MessageEvent` fields map to OpenTelemetry event attributes: + +| OpenCensus Trace Message Event Fields | OpenTelemetry Event Attribute Key | +|---------------------------------------|-----------------------------------| +| Type | `message.event.type` | +| Message Id | `message.message.id` | +| Uncompressed message size | `message.event.size.uncompressed` | +| Compressed message size | `message.event.size.compressed` | + +The OpenCensus span annotation description maps to the OpenTelemetry event name, and the +annotation attribute keys are mapped as follows: + +| OpenCensus Trace Annotation Attribute Key | OpenTelemetry Event Attribute Key | +|-------------------------------------------|-----------------------------------| +| `id` | `message.message.id` | +| `type` | `message.event.type` | + +OpenTelemetry has no equivalent of the OpenCensus span end options. + +### Propagator Wire format +While gRPC OpenCensus directly interacts with the metadata API, gRPC OpenTelemetry will use the standardized propagators API for context propagation, for the following benefits: 1. Fully integration with OpenTelemetry APIs that is easier for users to reason about. 2. Make it possible to plugin other propagators that the community supports. +3. Flexible API that allows clean and simple migration paths to a different propagator.
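The stdlib-only sketch below illustrates the third benefit. It composes a W3C Trace Context propagator (`traceparent: 00-<trace-id>-<span-id>-<flags>`) with a text-form `grpc-trace-bin` propagator. That second propagator base64-encodes the 29-byte binary value, as the TextMap slow path does. Every type here is a hypothetical stand-in for OpenTelemetry's `TextMapPropagator` machinery. The rule that extraction returns the first context found is a choice made for this sketch, not a description of OpenTelemetry's composite propagator.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Base64;
import java.util.List;
import java.util.Map;

// Hypothetical, stdlib-only stand-ins for OpenTelemetry propagators;
// none of these types are real gRPC or OpenTelemetry classes.
final class Ctx {
  final byte[] traceId; // 16 bytes
  final byte[] spanId;  // 8 bytes
  final boolean sampled;

  Ctx(byte[] traceId, byte[] spanId, boolean sampled) {
    this.traceId = traceId;
    this.spanId = spanId;
    this.sampled = sampled;
  }
}

interface ToyPropagator {
  void inject(Ctx ctx, Map<String, String> carrier);
  Ctx extract(Map<String, String> carrier); // null if the header is absent or has another layout
}

// W3C Trace Context: traceparent = "00-<32 hex trace ID>-<16 hex span ID>-<2 hex flags>".
final class W3cToy implements ToyPropagator {
  public void inject(Ctx c, Map<String, String> carrier) {
    carrier.put("traceparent",
        "00-" + hex(c.traceId) + "-" + hex(c.spanId) + (c.sampled ? "-01" : "-00"));
  }

  public Ctx extract(Map<String, String> carrier) {
    String v = carrier.get("traceparent");
    String[] p = v == null ? new String[0] : v.split("-");
    if (p.length != 4 || !p[0].equals("00") || p[1].length() != 32
        || p[2].length() != 16 || p[3].length() != 2) {
      return null;
    }
    return new Ctx(unhex(p[1]), unhex(p[2]), (Integer.parseInt(p[3], 16) & 1) == 1);
  }

  static String hex(byte[] bytes) {
    StringBuilder sb = new StringBuilder();
    for (byte b : bytes) sb.append(String.format("%02x", b & 0xff));
    return sb.toString();
  }

  static byte[] unhex(String s) {
    byte[] out = new byte[s.length() / 2];
    for (int i = 0; i < out.length; i++) {
      out[i] = (byte) Integer.parseInt(s.substring(2 * i, 2 * i + 2), 16);
    }
    return out;
  }
}

// grpc-trace-bin on the TextMap slow path: the 29-byte binary value, base64-encoded.
final class GrpcTraceBinToy implements ToyPropagator {
  public void inject(Ctx c, Map<String, String> carrier) {
    byte[] b = new byte[29];               // b[0] = version 0, b[1] = field id 0
    System.arraycopy(c.traceId, 0, b, 2, 16);
    b[18] = 1;                             // field id 1: span ID
    System.arraycopy(c.spanId, 0, b, 19, 8);
    b[27] = 2;                             // field id 2: trace options
    b[28] = (byte) (c.sampled ? 1 : 0);
    carrier.put("grpc-trace-bin", Base64.getEncoder().encodeToString(b));
  }

  public Ctx extract(Map<String, String> carrier) {
    String v = carrier.get("grpc-trace-bin");
    byte[] b = v == null ? new byte[0] : Base64.getDecoder().decode(v);
    if (b.length != 29 || b[0] != 0 || b[1] != 0 || b[18] != 1 || b[27] != 2) return null;
    return new Ctx(Arrays.copyOfRange(b, 2, 18), Arrays.copyOfRange(b, 19, 27), (b[28] & 1) == 1);
  }
}

// Injects with every propagator; extracts with the first one that finds a context.
final class CompositeToy implements ToyPropagator {
  private final List<ToyPropagator> parts;

  CompositeToy(List<ToyPropagator> parts) { this.parts = new ArrayList<>(parts); }

  public void inject(Ctx c, Map<String, String> carrier) {
    for (ToyPropagator p : parts) p.inject(c, carrier);
  }

  public Ctx extract(Map<String, String> carrier) {
    for (ToyPropagator p : parts) {
      Ctx c = p.extract(carrier);
      if (c != null) return c;
    }
    return null;
  }
}
```

A client that injects through the composite sends both headers. A server configured with either propagator alone still recovers the same trace and span IDs, which lets the client and server be migrated one at a time.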
As of today, OpenTelemetry propagator API only supports `TextMapPropagator`, that is to send string key/value pairs between the client and server, which is different from the binary header that gRPC currently uses. The future roadmap -to support binary propagators at Open Telemetry is unclear. So, gRPC will use -propagator API in TextMap format with optimization path (Go and Java) to work -around the binary propagator API. Once the OpenTelemetry binary propagator API -is available in the future, we can continuously integrate with those API with -little effort. In fact, textmap propagator has negligible performance impact for -C++, so a binary propagator API is not critical. - -gRPC will expose a custom `grpcTraceBinPropagator` that implements `TextMapPropagator`. +to support binary propagators in OpenTelemetry is unclear. So, gRPC will use the +propagator API in TextMap format, with an optimization path (Go and Java) to work +around the lack of a binary propagator API. In fact, internal micro-benchmarks show +no visible performance impact from the TextMap propagator in C++, the language most +sensitive to performance. Therefore, gRPC will not favor binary +propagators over TextMap propagators. + +gRPC will expose a custom `grpcTraceBinPropagator` that implements `TextMapPropagator`. This grpc-provided propagator still uses the `grpc-trace-bin` header for context propagation. The OpenCensus spanContext and OpenTelemetry spanContext transmitted -in binary over the wire are identical, therefore a gRPC OpenCensus client can +in the binary header over the wire are identical; therefore, a gRPC OpenCensus client can speak with a gRPC OpenTelemetry server and vice versa. Users can provide a single composite propagator that combines one or multiple `TextMapPropagator` for their client and server separately. This way, users can define their own -migration path for context propagators in distributed components.
Configuring -gRPC OpenTelemetry with this propagator when dealing with cross-cutting concerns -is straightforward and recommended. +migration path for context propagators in distributed components, see detailed +discussion in the later session. Configuring gRPC OpenTelemetry with this +propagator when dealing with cross-cutting concerns during migration is +straightforward and recommended. In the long term, community +standardized propagators, e.g. W3C is more encouraged than `grpcTraceBinPropagator`. #### Propagator API in Java/Go -The pseudocode below shows `grpcTraceBinPropagator` and the corresponding -gRPC setter/getters with an optimization path. +The pseudocode below demonstrates `grpcTraceBinPropagator` and the corresponding +gRPC Getter/Setter with an optimization path. ```Java -public class GrpcTraceBinTextMapPropagator implements TextMapPropagator { +public class GrpcTraceBinPropagator implements TextMapPropagator { @Override public void inject(Context context, @Nullable C carrier, TextMapSetter setter) { SpanContext spanContext = Span.fromContext(context).getSpanContext(); byte[] value = BinaryFormat.toBytes(spanContext); - if (setter instanceof GrpcTraceBinSetter) { + if (setter instanceof GrpcCommonSetter) { // Fast path in Java and Go that passes bytes directly through API boundaries using // the overloaded set(Metadata, String, byte[]) method added by gRPC. - ((GrpcTraceBinSetter) setter).set((Metadata) carrier, "grpc-trace-bin", value); + ((GrpcCommonSetter) setter).set((Metadata) carrier, "grpc-trace-bin", value); } else { // Slow path for C++. gRPC C++ does not have type checking, so we encode bytes to // String to comply with the TextMapSetter API. 
This code path is also used in the @@ -157,8 +184,8 @@ public class GrpcTraceBinTextMapPropagator implements TextMapPropagator { @Override public Context extract(Context context, @Nullable C c, TextMapGetter textMapGetter) { byte[] bytes; - if (textMapGetter instanceof GrpcTraceBinGetter) { //Fast path for Java/Go - bytes = ((GrpcTraceBinGetter) textMapGetter).getBinary((Metadata) c, "grpc-trace-bin"); + if (textMapGetter instanceof GrpcCommonGetter) { //Fast path for Java/Go + bytes = ((GrpcCommonGetter) textMapGetter).getBinary((Metadata) c, "grpc-trace-bin"); } else { // Slow path for C++. gRPC C++ does not have type checking, so we decode String // from TextMapGetter API to bytes. This code path applies to the situation @@ -175,21 +202,24 @@ public class GrpcTraceBinTextMapPropagator implements TextMapPropagator { The `grpcTraceBinPropagator` should be compatible with any Getter/Setter, but internally in gRPC, in Java and Go we implement a special gRPC Getter/Setter -that use an optimization path to work around the lack of binary propagator API -and avoid base64 (de)encoding when passing data between API interfaces. This special -gRPC Getter/Setter will also be responsible for handling other propagators, -see pseudocode below. +that uses an optimization path to work around the lack of binary propagator API +and thus avoid base64 (de)encoding when passing data between API interfaces. +This special gRPC Getter/Setter will also be responsible for handling other +propagators that users will configure with gRPC OpenTelemetry (e.g. w3c), +see the pseudocode below. ```Java @Internal class GrpcCommonSetter implements TextMapSetter, GrpcBinarySetter { // Fast path for Java and Go. Overload set() method to accept bytes value to avoid // base64 encoding/decoding between API boundaries. 
+ @Override void set(Metadata header, String key, byte[] value) { assert key.endsWith("-bin"); header.put(Metadata.Key.of(key, BINARY_BYTE_MARSHALLER), value); } + @Override void set(Metadata header, String key, String value) { if (key.endsWith("-bin")) { // Slower path in C++. It shows the decoding part of the just encoded String at @@ -204,10 +234,11 @@ class GrpcCommonSetter implements TextMapSetter, GrpcBinarySetter { + @Override public String get(@Nullable Metadata carrier, String key) { if (key.endsWith("-bin")) { - // Slow path for C++: encode the bytes to String. Later we decode to bytes - // in GrpcTraceBinTextMapPropagator.extract(). + // Slow path for C++: return string encoded from bytes. Later we decode to + // bytes in GrpcTraceBinTextMapPropagator.extract(). byte[] value = carrier.get(Metadata.Key.of(key, BINARY_BYTE_MARSHALLER)); return Base64.getEncoder().encodeToString(value); } else { @@ -216,7 +247,8 @@ class GrpcCommonGetter implements TextMapGetter { } } - // Add a new method to optimize the TextMap propagator. + // Add a new method to optimize the TextMap propagator to avoid base64 encoding. + @Override public byte[] getBinary(@Nullable Metadata carrier, String key) { assert key.endsWith("bin"); return carrier.get(Metadata.Key.of(key, BINARY_BYTE_MARSHALLER)); @@ -232,28 +264,29 @@ private interface GrpcBinarySetter { ``` - The `GrpcCommonSetter` adds an overloaded `set()` method to directly take `bytes[]` -(Java and Go) to avoid extra encoding/decoding. For the normal `set()` method it -should handle both binary (`-bin`) header ( e.g. for C++ `grpc-trace-bin`) and -ASCII header which can be used by other text map propagators that gRPC -OpenTelemetry supports (e.g. w3c). -The `GrpcTraceBinGetter` adds new method `getBinary()` for the optimized path -in Java and Go. The normal get() method handles both binary header -(for C++ `grpc-trace-bin`) and ASCII header propagators, like w3c. 
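The fast and slow paths can be modeled in a minimal, stdlib-only Java sketch. `SketchMetadata`, `SketchTextSetter`, `SketchTextGetter`, `SketchCommonSetter` and `SketchCommonGetter` are hypothetical stand-ins for `io.grpc.Metadata`, OpenTelemetry's `TextMapSetter`/`TextMapGetter` and the gRPC-internal Getter/Setter described above. They are not the real APIs. The sketch assumes that `-bin` keys are stored as raw bytes and every other key as an ASCII string.

```java
import java.util.Base64;
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for io.grpc.Metadata: "-bin" headers hold raw bytes,
// every other header holds an ASCII string.
final class SketchMetadata {
  final Map<String, byte[]> binary = new HashMap<>();
  final Map<String, String> ascii = new HashMap<>();
}

// Hypothetical String-only interfaces standing in for OpenTelemetry's
// TextMapSetter / TextMapGetter.
interface SketchTextSetter {
  void set(SketchMetadata carrier, String key, String value);
}

interface SketchTextGetter {
  String get(SketchMetadata carrier, String key);
}

final class SketchCommonSetter implements SketchTextSetter {
  // Fast path (Java/Go): the propagator hands over bytes, which are stored as-is.
  void set(SketchMetadata carrier, String key, byte[] value) {
    if (!key.endsWith("-bin")) {
      throw new IllegalArgumentException("binary header must end in -bin: " + key);
    }
    carrier.binary.put(key, value);
  }

  // Slow path: a String-only propagator base64-encodes "-bin" values, so we
  // decode them back to bytes. Other keys (e.g. W3C "traceparent") stay ASCII.
  @Override
  public void set(SketchMetadata carrier, String key, String value) {
    if (key.endsWith("-bin")) {
      carrier.binary.put(key, Base64.getDecoder().decode(value));
    } else {
      carrier.ascii.put(key, value);
    }
  }
}

final class SketchCommonGetter implements SketchTextGetter {
  // Slow path: "-bin" values are base64-encoded so that a String-only
  // propagator can read them. That propagator must decode them again.
  @Override
  public String get(SketchMetadata carrier, String key) {
    if (key.endsWith("-bin")) {
      byte[] value = carrier.binary.get(key);
      return value == null ? null : Base64.getEncoder().encodeToString(value);
    }
    return carrier.ascii.get(key);
  }

  // Fast path (Java/Go): return the raw bytes with no base64 round trip.
  byte[] getBinary(SketchMetadata carrier, String key) {
    return carrier.binary.get(key);
  }
}
```

Writing through the `byte[]` overload and reading through `getBinary()` never touches base64. A String-only caller, such as the C++ path or an externally supplied setter, pays one encode and one decode per binary header.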
- -#### Propagation API in C++ -C++ will also support the propagator API. But due to the language restriction, +(Java and Go) to avoid extra base64 encoding. For the normal `set()` method it +should handle both binary (`-bin`) header and ASCII header from any TextMap +propagators that users may config. +The `GrpcCommonGetter` adds new method `getBinary()` for the optimized path for +the same reason in Java and Go. Similarly, the normal `get()` method handles both +binary header and TextMap propagators. + +#### Context Propagation APIs in C++ +C++ will also support the propagators API, because this imposes API +uniformity among languages. Due to the language restriction, C++ can not take the optimization path to workaround lacking the binary -propagator API. However, C++ can be configured to interact with metadata -directly, like the current gRPC OpenCensus, instead of using propagator API, so -that it avoids paying for the performance cost due to string/binary encoding. -The two APIs C++ will support for context propagation are: -* If the propagator is configured, take a slower path in the pseudocode described -above. Supporting propagator API puts us in the API uniformity among languages. - +propagator API. That means using propagators API with C++ needs base64 encoding +and therefore is slower compared with just using metadata API. However, C++ can be +configured to interact with metadata directly, like the current gRPC OpenCensus. +This is a faster way that avoids paying for the performance cost due to +string/binary encoding between the propagator and the getter/setter. +We use this strategy to balance between API simplicity and performance efficiency. +The two APIs C++ will support for the context propagation are: +* If `grpcTraceBinPropagator` is configured, take a slower path in the pseudocode +described above. * If explicitly configured, gRPC will directly use `Metadata.get()` and `Metadata.put()` -APIs on the “grpc-trace-bin” header. 
No TextMapPropagator API and TextMapSetter/Getter +APIs on the `grpc-trace-bin` header. No TextMapPropagator API and TextMapSetter/Getter will be involved. This is a faster path and mitigates performance concerns due to base64 encoding. @@ -262,27 +295,29 @@ and `GrpcCommonSetter`. The `GrpcCommonSetter.set()` and `GrpcCommonGetter.get()` method in C++ should handle both binary (`-bin`) header -(e.g. `grpc-trace-bin`) and ASCII header which can be used by other text map -propagators that gRPC OpenTelemetry supports (e.g. w3c). +(e.g. `grpc-trace-bin`) and ASCII header from other text map +propagators that users configure into gRPC OpenTelemetry, e.g. w3c. - -### Grpc OpenTelemetry API -We provide APIs to allow users to enable and configure OpenTelemetry, including -the propagators. The OpenTelemetry API will coexist with the OpenCensus API until -the latter is dropped. Only one "grpc-trace-bin" header will be sent for a single RPC. +### Grpc OpenTelemetry Tracing API +This section talks about enabling and configuring OpenTelemetry tracing. +The OpenTelemetry API will coexist with the +OpenCensus API until the latter is dropped. Only one "grpc-trace-bin" header +will be sent for a single RPC. The APIs are different among languages due to different underlying infrastructures. + +#### Java In Java, it will be part of global interceptors, so that the interceptors are managed in a more sustainable way and user-friendly. As a prerequisite, the stream tracer factory API will be stabilized. OpenTelemetryModule will be created with -an OpenTelemetryAPI parameter passing in for necessary configurations. +an OpenTelemetryAPI instance passing in for necessary configurations. Users can also rely on SDK autoconfig extension that configure the sdk object through environmental variables or Java system properties, then obtain the sdk object passed in to gRPC. 
```Java -// Construct OpenTelemetry to be passed to gRPC OT module for trace/meter -// configurations +// Construct OpenTelemetry to be passed to gRPC OpenTelemetry module for +// trace and metrics configurations. SdkTracerProvider sdkTracerProvider = SdkTracerProvider .builder() .addSpanProcessor( @@ -309,9 +344,10 @@ GlobalInterceptors.setInterceptors( ``` -In C++, it will be a method that mirrors OpenCensus API, e.g. +#### C++ +In C++, it will be a method that mirrors OpenCensus API. -TODO: update C++ API: +TODO: update C++ API. ```C++ // Enable OpenTelemetry based tracing. Similar to @@ -320,64 +356,71 @@ void RegisterOpenTelemetryTracingPlugin(); ``` +#### Go In Go, the OpenTelemetry stream tracers and interceptors will be provided for users to install. -TODO: add Go API +TODO: add Go API. -### Migrate to Open Telemetry: cross-process networking concerns +### Migrate to OpenTelemetry: cross-process networking concerns When clients first introduce gRPC OpenTelemetry, for the time window when the gRPC client and server have mixed plugins of OpenTelemetry and OpenCensus, -with grpcTraceBinPropagator users can do migration easily with the compatibility -guaranteed. It is encouraged to use `grpc-trace-bin` header for migration because -of the following advantages: +with `grpcTraceBinPropagator` users can do migration easily with the compatibility +guaranteed. It is encouraged to use `grpcTraceBinPropagator` that propagates +`grpc-trace-bin` header for migration because of the following advantages: * Simplified migration path, no migration phase deployments. * Binary header is more efficient. -A binary formatter implementation for OpenTelemetry is needed, which can be -similar to OpenCensus implementation. Go already has community support for that. +A binary formatter implementation for OpenTelemetry is needed in each language, +which can be similar to the OpenCensus implementation. Go already has community +support for that. 
+After migration period, users have the flexibility to switch to other propagators. OpenTelemetry and its extension packages support multiple text map propagators, -e.g. W3C trace context or b3. The API supports specifying multiple propagators: -either public standard ones or custom propagators that implement the OpenTelemetry -propagator API interface. The API composites the propagators and gRPC puts all -the propagator data into the wire through metadata. This allows users to migrate -a group of applications with an old propagator to a new propagator. An example -migration path can be: -1. Set server to accept both old and new propagators. -2. Add desired new propagators to the client. +e.g. W3C trace context or b3. The gRPC OpenTelemetry API allows specifying +multiple propagators: either public standard ones or custom propagators that +implement the OpenTelemetry propagators API interface. The API composites the +propagators and gRPC puts all the propagator data into the wire through metadata. +This allows users to easily migrate a group of applications with an old propagator to +a new propagator. An example migration path can be: +1. Configure server to accept both old and new propagators. +2. Configure the client with the desired new propagators and to drop the old propagator. 3. Make the server only accept the new propagators and complete the migration. ### Migrate to OpenTelemetry: in binary The OpenCensus [shim](https://github.com/open-telemetry/opentelemetry-java/tree/main/opencensus-shim) (currently available in Java, Go, Python) allows binaries that have a mix of -OpenTelemetry and OpenCensus dependencies to export trace spans from both with -the correct parent-child relationship. This is the recommended approach to migrate +OpenTelemetry and OpenCensus dependencies to export trace spans from both frameworks, +and keep the correct parent-child relationship. This is the recommended approach to migrate to OpenTelemetry in one binary gradually. 
Note that the in-binary migration and cross-cutting concerns migration can be done in parallel. The shim packages that bridge two libraries works as follows, considering the following migration scenarios example: -|-- Application - Configured OpenCensus --------------------------------- |\ -|-- gRPC -> Using OpenCensus to generate Trace A --------- |\ -|-- Application -> Using OpenCensus to generate a sub Trace B-- | +`|-- Application - Configured OpenCensus ------------------------------- |`\ +`|-- gRPC -> Using OpenCensus to generate Trace A -------------------- |`\ +`|-- Application -> Using OpenCensus to generate a sub Trace B--------- |` The application may use a bridge package in the outermost layer first:\ -|-- Application - Configured Otel w/ OpenCensus Shim ------------------- |\ -|-- gRPC -> Using OpenCensus to generate Trace A --------- |\ -|-- Application -> Using OpenCensus to generate a sub Trace B-- | +`|-- Application - Configured Otel w/ OpenCensus Shim ------------------- |`\ +`|-- gRPC -> Using OpenCensus to generate Trace A --------------------- |`\ +`|-- Application -> Using OpenCensus to generate a sub Trace B---------- |` -Then the application change the instrumentation to Open Telemetry:\ -|-- Application - Configured Otel w/ OpenCensus Shim ---------------------- |\ -|-- gRPC -> Using Otel to generate Trace A --------- |\ -|-- Application -> Using Otel to generate a sub Trace B-- | +Then the application changes the instrumentation to OpenTelemetry:\ +`|-- Application - Configured Otel w/ OpenCensus Shim ---------------------- |`\ +`|-- gRPC -> Using Otel to generate Trace A ------------------------------ |`\ +`|-- Application -> Using Otel to generate a sub Trace B------------------- |` Finally, they switch to grpc-open-telemetry and finish the migration.\ -|-- Application - Configured Otel standalone ----------------------------- |\ -|-- gRPC -> Using Otel to generate Trace A --------- |\ -|-- Application -> Using Otel to generate a sub 
Trace B-- | +`|-- Application - Configured Otel standalone ----------------------------- |`\ +`|-- gRPC -> Using Otel to generate Trace A ----------------------------- |`\ +`|-- Application -> Using Otel to generate a sub Trace B------------------ |` + +## Rational +N/A -## Rationale +## Implementation +Will be implemented in Java, C++, Go and Python. [L29]: L29-cpp-opencensus-filter.md [A45]: A45-retry-stats.md From 7a6e596bd5697d60245b10dd59a8f9b735a1fe4a Mon Sep 17 00:00:00 2001 From: yifeizhuang Date: Wed, 11 Oct 2023 12:30:28 -0700 Subject: [PATCH 05/30] add email thread discussion, and text changes from ejona --- A72-open-telemetry-tracing.md | 102 ++++++++++++++++++---------------- 1 file changed, 54 insertions(+), 48 deletions(-) diff --git a/A72-open-telemetry-tracing.md b/A72-open-telemetry-tracing.md index 21344eefb..41af2c5bf 100644 --- a/A72-open-telemetry-tracing.md +++ b/A72-open-telemetry-tracing.md @@ -1,13 +1,13 @@ A72: OpenTelemetry Tracing ---- * Author(s): [Yifei Zhuang](https://github.com/YifeiZhuang) -* Approver: [Eric Anderson](https://github.com/ejona86), [Mark - Roth](https://github.com/markdroth), [Doug Fawley](https://github.com/dfawley), +* Approver: [Eric Anderson](https://github.com/ejona86) +* Reviewers: [Mark Roth](https://github.com/markdroth), [Doug Fawley](https://github.com/dfawley), [Feng Li](https://github.com/fengli79) * Status: Ready for Implementation * Implemented in: * Last updated: 2023-07 -* Discussion at: +* Discussion at: https://groups.google.com/g/grpc-io/c/e_ByaRmtJak ## Abstract This proposal adds support for OpenTelemetry tracing and suggests migration @@ -98,7 +98,7 @@ where during migration one will use OpenCensus and the other will use OpenTeleme All related APIs here are experimental until OpenTelemetry metrics [gRPC A66][A66] is design and implementation complete. 
-### New function in OpenTelemetry plugin +### New Function In OpenTelemetry Plugin We will add tracing functions in grpc-open-telemetry plugin, along with OpenTelemetry metrics [gRPC A66][A66]. Languages will keep using the same gRPC infrastructures, e.g. interceptor and @@ -127,11 +127,11 @@ keys are mapped to: And OpenTelemetry no longer has span end options as OpenCensus does. -### Propagator Wire format +### Propagator Wire Format While gRPC OpenCensus directly interacts with the metadata API, gRPC OpenTelemetry will use the standardized propagators API for context propagation, for the following benefits: -1. Fully integration with OpenTelemetry APIs that is easier for users to reason about. +1. Full integration with OpenTelemetry APIs that is easier for users to reason about. 2. Make it possible to plugin other propagators that the community supports. 3. Flexible API that allows clean and simple migration paths to a different propagator. @@ -140,15 +140,15 @@ that is to send string key/value pairs between the client and server, which is different from the binary header that gRPC currently uses. The future roadmap to support binary propagators at OpenTelemetry is unclear. So, gRPC will use propagator API in TextMap format with an optimization path (Go and Java) to work -around the binary propagator API. In fact, text map propagator does not show -visible performance impact for C++, which is most sensitive to performance, -based on internal micro benchmarking. Therefore, gRPC will not favor binary -propagators over TextMap propagators. +around the lack of binary propagator API to support `grpc-trace-bin`. In fact, +TextMap propagator does not show visible performance impact for C++, which is +the most sensitive language to performance, based on internal micro benchmarking. +Therefore, gRPC will only support `grpc-trace-bin` and TextMap propagators. gRPC will expose a custom `grpcTraceBinPropagator` that implements `TextMapPropagator`. 
This grpc-provided propagator still uses the `grpc-trace-bin` header for context -propagation. The OpenCensus spanContext and OpenTelemetry spanContext transmitted -in binary header over the wire are identical, therefore a gRPC OpenCensus client can +propagation. When using `grpc-trace-bin` the OpenCensus spanContext and +OpenTelemetry spanContext are identical, therefore a gRPC OpenCensus client can speak with a gRPC OpenTelemetry server and vice versa. Users can provide a single composite propagator that combines one or multiple `TextMapPropagator` for their client and server separately. This way, users can define their own @@ -173,10 +173,10 @@ public class GrpcTraceBinPropagator implements TextMapPropagator { // the overloaded set(Metadata, String, byte[]) method added by gRPC. ((GrpcCommonSetter) setter).set((Metadata) carrier, "grpc-trace-bin", value); } else { - // Slow path for C++. gRPC C++ does not have type checking, so we encode bytes to - // String to comply with the TextMapSetter API. This code path is also used in the - // situation where GrpcTraceBinTextMapPropagator is used with a TextMapSetter - // externally. + // Slow path for C++. gRPC C++ does not have runtime type inspection, so we + // encode bytes to String to comply with the TextMapSetter API. This code + // path is also used in the situation where GrpcTraceBinTextMapPropagator + // is used with a TextMapSetter externally. setter.set(carrier, "grpc-trace-bin", Base64.getEncoder().encodeToString(value)); } } @@ -277,8 +277,9 @@ C++ will also support the propagators API, because this imposes API uniformity among languages. Due to the language restriction, C++ can not take the optimization path to workaround lacking the binary propagator API. That means using propagators API with C++ needs base64 encoding -and therefore is slower compared with just using metadata API. However, C++ can be -configured to interact with metadata directly, like the current gRPC OpenCensus. 
+and therefore is slower compared with just using metadata API. However, C++ will +have an API that enables adding `grpc-trace-bin` to the metadata directly, without +using the propagators API. This is a faster way that avoids paying for the performance cost due to string/binary encoding between the propagator and the getter/setter. We use this strategy to balance between API simplicity and performance efficiency. @@ -295,14 +296,14 @@ and `GrpcCommonSetter`. The `GrpcCommonSetter.set()` and `GrpcCommonGetter.get()` method in C++ should handle both binary (`-bin`) header -(e.g. `grpc-trace-bin`) and ASCII header from other text map +(e.g. `grpc-trace-bin`) and ASCII header from other TextMap propagators that users configure into gRPC OpenTelemetry, e.g. w3c. -### Grpc OpenTelemetry Tracing API +### gRPC OpenTelemetry Tracing API This section talks about enabling and configuring OpenTelemetry tracing. -The OpenTelemetry API will coexist with the -OpenCensus API until the latter is dropped. Only one "grpc-trace-bin" header -will be sent for a single RPC. +The OpenTelemetry API will coexist with the OpenCensus API. Only one +`grpc-trace-bin` header will be sent for a single RPC as long as only one of +OpenTelemetry or OpenCensus is enabled for the channel. The APIs are different among languages due to different underlying infrastructures. @@ -361,16 +362,13 @@ In Go, the OpenTelemetry stream tracers and interceptors will be provided for us TODO: add Go API. -### Migrate to OpenTelemetry: cross-process networking concerns -When clients first introduce gRPC OpenTelemetry, for the time window when the +### Migrate to OpenTelemetry: Cross-process Networking Concerns +When users first introduce gRPC OpenTelemetry, for the time window when the gRPC client and server have mixed plugins of OpenTelemetry and OpenCensus, -with `grpcTraceBinPropagator` users can do migration easily with the compatibility -guaranteed. 
It is encouraged to use `grpcTraceBinPropagator` that propagates -`grpc-trace-bin` header for migration because of the following advantages: -* Simplified migration path, no migration phase deployments. -* Binary header is more efficient. +it is encouraged to use `grpcTraceBinPropagator` that propagates +`grpc-trace-bin` header for the migration. Using the same header greatly simplifies rollout. -A binary formatter implementation for OpenTelemetry is needed in each language, +A `grpc-trace-bin` formatter implementation for OpenTelemetry is needed in each language, which can be similar to the OpenCensus implementation. Go already has community support for that. @@ -386,7 +384,7 @@ a new propagator. An example migration path can be: 2. Configure the client with the desired new propagators and to drop the old propagator. 3. Make the server only accept the new propagators and complete the migration. -### Migrate to OpenTelemetry: in binary +### Migrate to OpenTelemetry: In Binary The OpenCensus [shim](https://github.com/open-telemetry/opentelemetry-java/tree/main/opencensus-shim) (currently available in Java, Go, Python) allows binaries that have a mix of OpenTelemetry and OpenCensus dependencies to export trace spans from both frameworks, @@ -397,26 +395,34 @@ cross-cutting concerns migration can be done in parallel. 
The shim packages that bridge two libraries works as follows, considering the following migration scenarios example: -`|-- Application - Configured OpenCensus ------------------------------- |`\ -`|-- gRPC -> Using OpenCensus to generate Trace A -------------------- |`\ -`|-- Application -> Using OpenCensus to generate a sub Trace B--------- |` +```agsl +|-- Application - Configured OpenCensus ------------------------------- | +|-- gRPC -> Using OpenCensus to generate Trace A -------------------- | +|-- Application -> Using OpenCensus to generate a sub Trace B--------- | +``` -The application may use a bridge package in the outermost layer first:\ -`|-- Application - Configured Otel w/ OpenCensus Shim ------------------- |`\ -`|-- gRPC -> Using OpenCensus to generate Trace A --------------------- |`\ -`|-- Application -> Using OpenCensus to generate a sub Trace B---------- |` +The application may use a bridge package in the outermost layer first: +```agsl +|-- Application - Configured Otel w/ OpenCensus Shim ------------------- | +|-- gRPC -> Using OpenCensus to generate Trace A --------------------- | +|-- Application -> Using OpenCensus to generate a sub Trace B---------- | +``` -Then the application changes the instrumentation to OpenTelemetry:\ -`|-- Application - Configured Otel w/ OpenCensus Shim ---------------------- |`\ -`|-- gRPC -> Using Otel to generate Trace A ------------------------------ |`\ -`|-- Application -> Using Otel to generate a sub Trace B------------------- |` +Then the application changes the instrumentation to OpenTelemetry: +```agsl +|-- Application - Configured Otel w/ OpenCensus Shim ---------------------- | +|-- gRPC -> Using OpenCensus to generate Trace A -------------------------| +|-- Application -> Using Otel to generate a sub Trace B------------------- | +``` -Finally, they switch to grpc-open-telemetry and finish the migration.\ -`|-- Application - Configured Otel standalone ----------------------------- |`\ -`|-- gRPC -> Using 
Otel to generate Trace A ----------------------------- |`\ -`|-- Application -> Using Otel to generate a sub Trace B------------------ |` +Finally, they switch to grpc-open-telemetry and finish the migration. +```agsl +|-- Application - Configured Otel standalone ----------------------------- | +|-- gRPC -> Using Otel to generate Trace A ----------------------------- | +|-- Application -> Using Otel to generate a sub Trace B------------------ | +``` -## Rational +## Rationale N/A ## Implementation From 6bc027e7c0a05f01b5758c59a3d22fe86f01f399 Mon Sep 17 00:00:00 2001 From: yifeizhuang Date: Tue, 19 Dec 2023 15:06:01 -0800 Subject: [PATCH 06/30] Apply suggestions from code review ejona's language suggestion Co-authored-by: Eric Anderson --- A72-open-telemetry-tracing.md | 10 +++++----- 1 file changed, 5 insertions(+), 5 deletions(-) diff --git a/A72-open-telemetry-tracing.md b/A72-open-telemetry-tracing.md index 21344eefb..bc33649b6 100644 --- a/A72-open-telemetry-tracing.md +++ b/A72-open-telemetry-tracing.md @@ -154,7 +154,7 @@ single composite propagator that combines one or multiple `TextMapPropagator` for their client and server separately. This way, users can define their own migration path for context propagators in distributed components, see detailed discussion in the later session. Configuring gRPC OpenTelemetry with this -propagator when dealing with cross-cutting concerns during migration is +propagator when dealing with cross-process concerns during migration is straightforward and recommended. In the long term, community standardized propagators, e.g. W3C is more encouraged than `grpcTraceBinPropagator`. @@ -264,16 +264,16 @@ private interface GrpcBinarySetter { ``` -The `GrpcCommonSetter` adds an overloaded `set()` method to directly take `bytes[]` +The `GrpcCommonSetter` adds an overloaded `set()` method to directly take `byte[]` (Java and Go) to avoid extra base64 encoding. 
For the normal `set()` method it should handle both binary (`-bin`) header and ASCII header from any TextMap -propagators that users may config. +propagators that users may configure. The `GrpcCommonGetter` adds new method `getBinary()` for the optimized path for the same reason in Java and Go. Similarly, the normal `get()` method handles both binary header and TextMap propagators. #### Context Propagation APIs in C++ -C++ will also support the propagators API, because this imposes API +C++ will also support the propagators API, because this provides API uniformity among languages. Due to the language restriction, C++ can not take the optimization path to workaround lacking the binary propagator API. That means using propagators API with C++ needs base64 encoding @@ -392,7 +392,7 @@ The OpenCensus [shim](https://github.com/open-telemetry/opentelemetry-java/tree/ OpenTelemetry and OpenCensus dependencies to export trace spans from both frameworks, and keep the correct parent-child relationship. This is the recommended approach to migrate to OpenTelemetry in one binary gradually. Note that the in-binary migration and -cross-cutting concerns migration can be done in parallel. +cross-process migration can be done in parallel. The shim packages that bridge two libraries works as follows, considering the following migration scenarios example: From 5dbbc13a27c89da314dde44985a130d27043ef6b Mon Sep 17 00:00:00 2001 From: yifeizhuang Date: Thu, 4 Jan 2024 15:42:09 -0800 Subject: [PATCH 07/30] minor fix --- A72-open-telemetry-tracing.md | 119 +++++++++++++++++----------------- 1 file changed, 60 insertions(+), 59 deletions(-) diff --git a/A72-open-telemetry-tracing.md b/A72-open-telemetry-tracing.md index a1db84408..381d28a70 100644 --- a/A72-open-telemetry-tracing.md +++ b/A72-open-telemetry-tracing.md @@ -102,28 +102,28 @@ is design and implementation complete. We will add tracing functions in grpc-open-telemetry plugin, along with OpenTelemetry metrics [gRPC A66][A66]. 
Languages will keep using the same gRPC infrastructures, e.g. interceptor and -stream tracer to implement the feature. -We keep the grpc-census plugin to allow users who already depend on grpc-census to -continue using it for the newer grpc version and offer grace time for the migration. +stream tracer to implement the feature. +We keep the grpc-census plugin to allow users who already depend on grpc-census to +continue using it for newer grpc versions. In the new function we will produce the same tracing information as we produce for Census. But due to API differences between OpenCensus and OpenTelemetry, the -trace information has slight differences. +trace information is represented slightly differently. The OpenCensus `MessageEvent` fields maps to OpenTelemetry event attributes: -| OpenCensus Trace Message Event Fields | OpenTelemetry Event Attribute Key | -|---------------------------------------|-----------------------------------| -| Type | `message.event.type` | -| Message Id | `message.message.id` | -| Uncompressed message size | `message.event.size.uncompressed` | -| Compressed message size | `message.event.size.compressed` | +| OpenCensus Trace Message Event Fields | OpenTelemetry Trace Event Attribute Key | +|---------------------------------------|-----------------------------------------| +| Type | `message.event.type` | +| Message Id | `message.message.id` | +| Uncompressed message size | `message.event.size.uncompressed` | +| Compressed message size | `message.event.size.compressed` | -OpenCensus span annotation description maps to OpenTelemetry event name, attributes -keys are mapped to: +OpenCensus span annotation description maps to OpenTelemetry event name, and +annotation attribute keys are mapped to event attribute keys: -| OpenCensus Trace Annotation Attribute Key | OpenTelemetry Event Attribute Key | -|-------------------------------------------|-----------------------------------| -| `id` | `message.event.type` | -| `type` | `message.message.id` | +| OpenCensus Trace Annotation Attribute Key | OpenTelemetry Trace Event Attribute Key | +|-------------------------------------------|-----------------------------------------| +| `id` | `message.message.id` | +| `type` |
`message.event.type` | And OpenTelemetry no longer has span end options as OpenCensus does. @@ -145,21 +145,21 @@ TextMap propagator does not show visible performance impact for C++, which is the most sensitive language to performance, based on internal micro benchmarking. Therefore, gRPC will only support `grpc-trace-bin` and TextMap propagators. -gRPC will expose a custom `grpcTraceBinPropagator` that implements `TextMapPropagator`. +gRPC will expose a custom `GrpcTraceBinPropagator` that implements `TextMapPropagator`. This grpc-provided propagator still uses the `grpc-trace-bin` header for context propagation. When using `grpc-trace-bin` the OpenCensus spanContext and -OpenTelemetry spanContext are identical, therefore a gRPC OpenCensus client can +OpenTelemetry spanContext are identical; therefore, a gRPC OpenCensus client can speak with a gRPC OpenTelemetry server and vice versa. Users can provide a single composite propagator that combines one or multiple `TextMapPropagator` for their client and server separately. This way, users can define their own migration path for context propagators in distributed components, see detailed discussion in the later session. Configuring gRPC OpenTelemetry with this -propagator when dealing with cross-process concerns during migration is +propagator when dealing with cross-process concerns during migration is straightforward and recommended. In the long term, community -standardized propagators, e.g. W3C is more encouraged than `grpcTraceBinPropagator`. +standardized propagators, e.g. W3C, are preferred over `GrpcTraceBinPropagator`.
#### Propagator API in Java/Go -The pseudocode below demonstrates `grpcTraceBinPropagator` and the corresponding +The pseudocode below demonstrates `GrpcTraceBinPropagator` and the corresponding gRPC Getter/Setter with an optimization path. ```Java @@ -175,7 +175,7 @@ public class GrpcTraceBinPropagator implements TextMapPropagator { } else { // Slow path for C++. gRPC C++ does not have runtime type inspection, so we // encode bytes to String to comply with the TextMapSetter API. This code - // path is also used in the situation where GrpcTraceBinTextMapPropagator + // path is also used in the situation where GrpcTraceBinPropagator // is used with a TextMapSetter externally. setter.set(carrier, "grpc-trace-bin", Base64.getEncoder().encodeToString(value)); } @@ -187,9 +187,10 @@ public class GrpcTraceBinPropagator implements TextMapPropagator { if (textMapGetter instanceof GrpcCommonGetter) { //Fast path for Java/Go bytes = ((GrpcCommonGetter) textMapGetter).getBinary((Metadata) c, "grpc-trace-bin"); } else { - // Slow path for C++. gRPC C++ does not have type checking, so we decode String - // from TextMapGetter API to bytes. This code path applies to the situation - // where GrpcTraceBinTextMapPropagator is used with a TextMapGetter externally. + // Slow path for C++. gRPC C++ does not have runtime type inspection, so + // we decode String from TextMapGetter API to bytes. This code path applies + // to the situation where GrpcTraceBinPropagator is used with a TextMapGetter + // externally. 
String contextString = textMapGetter.get(c, "grpc-trace-bin"); bytes = Base64.getDecoder().decode(contextString); } @@ -200,13 +201,13 @@ public class GrpcTraceBinPropagator implements TextMapPropagator { ``` -The `grpcTraceBinPropagator` should be compatible with any Getter/Setter, but -internally in gRPC, in Java and Go we implement a special gRPC Getter/Setter -that uses an optimization path to work around the lack of binary propagator API -and thus avoid base64 (de)encoding when passing data between API interfaces. -This special gRPC Getter/Setter will also be responsible for handling other -propagators that users will configure with gRPC OpenTelemetry (e.g. w3c), -see the pseudocode below. +The `GrpcTraceBinPropagator` should be compatible with any Getter/Setter, but +internally in gRPC, in Java and Go we implement a special gRPC Getter/Setter +that uses an optimization path to work around the lack of binary propagator API +and thus avoid base64 (de)encoding when passing data between API interfaces. +This special gRPC Getter/Setter will also be responsible for handling other +propagators that users will configure with gRPC OpenTelemetry (e.g. W3C), +see the pseudocode below. ```Java @Internal @@ -222,12 +223,12 @@ class GrpcCommonSetter implements TextMapSetter, GrpcBinarySetter { public String get(@Nullable Metadata carrier, String key) { if (key.endsWith("-bin")) { // Slow path for C++: return string encoded from bytes. Later we decode to - // bytes in GrpcTraceBinTextMapPropagator.extract(). + // bytes in GrpcTraceBinPropagator.extract(). byte[] value = carrier.get(Metadata.Key.of(key, BINARY_BYTE_MARSHALLER)); return Base64.getEncoder().encodeToString(value); } else { - // Used by other TextMap propagators, e.g. w3c. + // Used by other TextMap propagators, e.g. W3C. 
return carrier.get(Metadata.Key.of(key, ASCII_STRING_MARSHALLER)); } } @@ -255,11 +256,11 @@ class GrpcCommonGetter implements TextMapGetter { } } -// This interface will be implemented by gRPCCommonSetter/Getter as an optimization path -// to avoid base64 encoding between TextMap APIs due to lack of -// OpenTelemetry binary propagator API. +// This interface will be implemented by GrpcCommonSetter/Getter as an optimization +// path to avoid base64 encoding between TextMap APIs due to the lack of +// OpenTelemetry binary propagator API. Not for C++. private interface GrpcBinarySetter { - void set(Metadata header, String key, byte[] value); + void set(Metadata header, String key, byte[] value); } ``` @@ -268,9 +269,9 @@ The `GrpcCommonSetter` adds an overloaded `set()` method to directly take `byte[ (Java and Go) to avoid extra base64 encoding. For the normal `set()` method it should handle both binary (`-bin`) header and ASCII header from any TextMap propagators that users may configure. -The `GrpcCommonGetter` adds new method `getBinary()` for the optimized path for -the same reason in Java and Go. Similarly, the normal `get()` method handles both -binary header and TextMap propagators. +The `GrpcCommonGetter` in Java and Go adds a new method `getBinary()` for the +optimized path for the same reason. Similarly, the normal `get()` method handles +both binary headers and TextMap propagators. #### Context Propagation APIs in C++ C++ will also support the propagators API, because this provides API @@ -284,20 +285,20 @@ This is a faster way that avoids paying for the performance cost due to string/binary encoding between the propagator and the getter/setter. We use this strategy to balance between API simplicity and performance efficiency.
The two APIs C++ will support for the context propagation are: -* If `grpcTraceBinPropagator` is configured, take a slower path in the pseudocode +* If `GrpcTraceBinPropagator` is configured, take a slower path in the pseudocode described above. * If explicitly configured, gRPC will directly use `Metadata.get()` and `Metadata.put()` APIs on the `grpc-trace-bin` header. No TextMapPropagator API and TextMapSetter/Getter will be involved. This is a faster path and mitigates performance concerns due to base64 encoding. -TODO: add pseudocode here in C++ for `grpcTraceBinPropagator`,`GrpcCommonSetter` +TODO: add pseudocode here in C++ for `GrpcTraceBinPropagator`,`GrpcCommonSetter` and `GrpcCommonSetter`. The `GrpcCommonSetter.set()` and `GrpcCommonGetter.get()` method in C++ should handle both binary (`-bin`) header (e.g. `grpc-trace-bin`) and ASCII header from other TextMap -propagators that users configure into gRPC OpenTelemetry, e.g. w3c. +propagators that users configure into gRPC OpenTelemetry, e.g. W3C. ### gRPC OpenTelemetry Tracing API This section talks about enabling and configuring OpenTelemetry tracing. @@ -312,7 +313,7 @@ In Java, it will be part of global interceptors, so that the interceptors are managed in a more sustainable way and user-friendly. As a prerequisite, the stream tracer factory API will be stabilized. OpenTelemetryModule will be created with an OpenTelemetryAPI instance passing in for necessary configurations. -Users can also rely on SDK autoconfig extension that configure the sdk object +Users can also rely on SDK autoconfig extension that configures the sdk object through environmental variables or Java system properties, then obtain the sdk object passed in to gRPC. @@ -329,7 +330,7 @@ OpenTelemetry openTelemetry = OpenTelemetrySdk.builder() .setTracerProvider(sdkTracerProvider) .setMeterProvider(...) 
.setPropagators( - ContextPropagators.create(GrpcTraceBinTextMapPropagator.getInstance())) + ContextPropagators.create(GrpcTraceBinPropagator.getInstance())) .build(); // Alternatively, use auto configuration: @@ -365,7 +366,7 @@ TODO: add Go API. ### Migrate to OpenTelemetry: Cross-process Networking Concerns When users first introduce gRPC OpenTelemetry, for the time window when the gRPC client and server have mixed plugins of OpenTelemetry and OpenCensus, -it is encouraged to use `grpcTraceBinPropagator` that propagates +it is encouraged to use `GrpcTraceBinPropagator` that propagates `grpc-trace-bin` header for the migration. Using the same header greatly simplifies rollout. A `grpc-trace-bin` formatter implementation for OpenTelemetry is needed in each language, @@ -373,10 +374,10 @@ which can be similar to the OpenCensus implementation. Go already has community support for that. After migration period, users have the flexibility to switch to other propagators. -OpenTelemetry and its extension packages support multiple text map propagators, -e.g. W3C trace context or b3. The gRPC OpenTelemetry API allows specifying +OpenTelemetry and its extension packages support multiple text map propagators. +The gRPC OpenTelemetry API allows specifying multiple propagators: either public standard ones or custom propagators that -implement the OpenTelemetry propagators API interface. The API composites the +implement the OpenTelemetry propagator API interface. The API composites the propagators and gRPC puts all the propagator data into the wire through metadata. This allows users to easily migrate a group of applications with an old propagator to a new propagator. An example migration path can be: @@ -387,12 +388,12 @@ a new propagator. 
An example migration path can be: ### Migrate to OpenTelemetry: In Binary The OpenCensus [shim](https://github.com/open-telemetry/opentelemetry-java/tree/main/opencensus-shim) (currently available in Java, Go, Python) allows binaries that have a mix of -OpenTelemetry and OpenCensus dependencies to export trace spans from both frameworks, -and keep the correct parent-child relationship. This is the recommended approach to migrate -to OpenTelemetry in one binary gradually. Note that the in-binary migration and -cross-process migration can be done in parallel. +OpenTelemetry and OpenCensus dependencies to export trace spans from both +frameworks, and keep the correct parent-child relationship. This is the +recommended approach to migrate to OpenTelemetry in one binary gradually. +Note that the in-binary migration and cross-process migration can be done in parallel. -The shim packages that bridge two libraries works as follows, considering the +The shim package that bridges two libraries works as follows, considering the following migration scenarios example: ```agsl From 7b48df4075f1325dc96358116f6dcb4c5b41f156 Mon Sep 17 00:00:00 2001 From: yifeizhuang Date: Mon, 8 Jan 2024 18:02:49 -0800 Subject: [PATCH 08/30] Meeting AI 01/08/2024 1. [done] Add that OpenTelemetry module should document that things/configurations will change as OT changes. 2. [done] Move C++ fast path to rationale. Capture two alternatives about the fast path that may have in the future. 3. [done] Update C++ API about adding a new method in OpenTelemetryPluginBuilder, and reference to the current code and the gRFC. 4. 
[todo] complete the C++ GrpcTraceBinPropagator and GrpcTextMapCarrier --- A72-open-telemetry-tracing.md | 100 ++++++++++++++++++++++++---------- 1 file changed, 70 insertions(+), 30 deletions(-) diff --git a/A72-open-telemetry-tracing.md b/A72-open-telemetry-tracing.md index 381d28a70..a6634b43b 100644 --- a/A72-open-telemetry-tracing.md +++ b/A72-open-telemetry-tracing.md @@ -143,7 +143,7 @@ propagator API in TextMap format with an optimization path (Go and Java) to work around the lack of binary propagator API to support `grpc-trace-bin`. In fact, TextMap propagator does not show visible performance impact for C++, which is the most sensitive language to performance, based on internal micro benchmarking. -Therefore, gRPC will only support `grpc-trace-bin` and TextMap propagators. +Therefore, gRPC will only support propagating `grpc-trace-bin` in TextMap propagator. gRPC will expose a custom `GrpcTraceBinPropagator` that implements `TextMapPropagator`. This grpc-provided propagator still uses the `grpc-trace-bin` header for context @@ -274,28 +274,44 @@ optimized path for the same reason. Similarly, the normal `get()` method handles both binary headers and TextMap propagators. #### Context Propagation APIs in C++ -C++ will also support the propagators API, because this provides API -uniformity among languages. Due to the language restriction, -C++ can not take the optimization path to workaround lacking the binary -propagator API. That means using propagators API with C++ needs base64 encoding -and therefore is slower compared with just using metadata API. However, C++ will -have an API that enables adding `grpc-trace-bin` to the metadata directly, without -using the propagators API. -This is a faster way that avoids paying for the performance cost due to -string/binary encoding between the propagator and the getter/setter. -We use this strategy to balance between API simplicity and performance efficiency. 
-The two APIs C++ will support for the context propagation are: -* If `GrpcTraceBinPropagator` is configured, take a slower path in the pseudocode -described above. -* If explicitly configured, gRPC will directly use `Metadata.get()` and `Metadata.put()` -APIs on the `grpc-trace-bin` header. No TextMapPropagator API and TextMapSetter/Getter -will be involved. This is a faster path and mitigates performance concerns due -to base64 encoding. +C++ will also support propagator APIs to provides API +uniformity among languages, as well as getting all the benefits of propagator APIs +mentioned before. Due to the language restriction, C++ can not take the optimization +path to workaround lacking the binary propagator API. That means using propagators +API with C++ needs base64 encoding and therefore is slower compared with just +using metadata API. + +TODO: add pseudocode here in C++ for `GrpcTraceBinPropagator`,`GrpcTextMapCarrier`. -TODO: add pseudocode here in C++ for `GrpcTraceBinPropagator`,`GrpcCommonSetter` -and `GrpcCommonSetter`. +```C++ +class GrpcTraceBinPropagator : public TextMapPropagator { + public: + void Inject(TextMapCarrier &carrier, const context::Context &context) { + // Slow path for C++. gRPC C++ does not have runtime type inspection, so we + // encode bytes to String to comply with the TextMapSetter API. This code + // path is also used in the situation where GrpcTraceBinPropagator + // is used with a TextMapSetter externally. 
+ // TODO: add implementation + } + context::Context Extract(const TextMapCarrier &carrier, context::Context &context) { + // TODO: add implementaiton + } +} +``` -The `GrpcCommonSetter.set()` and `GrpcCommonGetter.get()` method in C++ +```C++ +class GrpcTextMapCarrier : public TextMapCarrier { + public: + nostd::string_view Get(nostd::string_view key) { + // TODO: add implementation + } + void Set(nostd::string_view key, nostd::string_view value) { + // TODO: add implementation + } +} +``` + +The `GrpcTextMapCarrier.set()` and `GrpcTextMapCarrier.get()` method in C++ should handle both binary (`-bin`) header (e.g. `grpc-trace-bin`) and ASCII header from other TextMap propagators that users configure into gRPC OpenTelemetry, e.g. W3C. @@ -337,7 +353,8 @@ OpenTelemetry openTelemetry = OpenTelemetrySdk.builder() // OpenTelemetry openTelemetry = // AutoConfiguredOpenTelemetrySdk.getOpenTelemetrySdk(). -// OpenTelemetryModule.getInstance() will be using GlobalOpenTelemetry +// Create a module that hosts tracing infrastructures. Should document that +// the module implementation may change as OpenTelemetry evolves. OpenTelemetryModule otModule = OpenTelemetryModule.getInstance(openTelemetry); GlobalInterceptors.setInterceptors( @@ -347,15 +364,22 @@ GlobalInterceptors.setInterceptors( ``` #### C++ -In C++, it will be a method that mirrors OpenCensus API. - -TODO: update C++ API. +In C++, we will add a new method in `OpenTelemetryPluginBuilder`, see [gRPC A66][A66]. ```C++ -// Enable OpenTelemetry based tracing. Similar to -// RegisterOpenCensusPlugin(). TracerProvider is configured via sdk separately. -void RegisterOpenTelemetryTracingPlugin(); - +class OpenTelemetryPluginBuilder { + public: + OpenTelemetryPluginBuilder(); + // If `SetMeterProvider()` is not called, no metrics are collected. + OpenTelemetryPluginBuilder& SetMeterProvider( + std::shared_ptr meter_provider); ++ // Set one or multiple propagators for span context propagation, e.g. 
++ // GrpcTraceBinPropagator or community standard ones like W3C, etc. ++ OpenTelemetryPluginBuilder& SetPropagator( ++ std::shared_ptr ++ new grpc::GrpcTraceBinPropagator()); +... +} ``` #### Go @@ -424,7 +448,23 @@ Finally, they switch to grpc-open-telemetry and finish the migration. ``` ## Rationale -N/A +C++ will not have the optimization path in its `GrpcTraceBinPropagator` API. We +considered having an API that enables adding `grpc-trace-bin` to the metadata +directly, without using the propagators API. This would be a faster way that +avoids paying for the performance cost due to +string/binary encoding between the propagator and the getter/setter. +The two APIs C++ would then support for the context propagation are: +* If `GrpcTraceBinPropagator` is configured, take a slower path in the pseudocode + described above. +* If explicitly configured, gRPC would directly use `Metadata.get()` and `Metadata.put()` + APIs on the `grpc-trace-bin` header. Neither the TextMapPropagator API nor TextMapSetter/Getter + would be involved. This is a faster path and mitigates performance concerns due + to base64 encoding. + +Alternatively, we could enable the fast path within C++ `GrpcTraceBinPropagator` +instead of explicitly configuring it on the OpenTelemetry plugin. However, the initial +implementation will not have any fast path support. We leave it for the future, when +users have use cases or performance concerns that require it. ## Implementation Will be implemented in Java, C++, Go and Python.
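The rationale above weighs the cost of the TextMap slow path. That path has to turn the binary `grpc-trace-bin` value into a string and back again. The following is a minimal Java sketch of that round trip, under stated assumptions:

* The class `GrpcTraceBinCodec` and all its methods are hypothetical. They are not gRPC or OpenTelemetry APIs.
* The header uses the OpenCensus binary layout that the C++ sketch in this proposal also writes: a version byte of 0, then field 0 with the 16-byte trace ID, field 1 with the 8-byte span ID, and field 2 with one trace-options byte. That is 29 bytes in total.

```java
import java.util.Arrays;
import java.util.Base64;

/**
 * Hypothetical helper, not a gRPC or OpenTelemetry API. It sketches the
 * 29-byte grpc-trace-bin layout and the base64 step that the TextMap slow
 * path adds, because TextMapSetter/TextMapGetter only carry strings.
 */
final class GrpcTraceBinCodec {
  static final int HEADER_LEN = 29;

  private GrpcTraceBinCodec() {}

  /** Builds the binary header from a 16-byte trace ID, 8-byte span ID and options byte. */
  static byte[] toBytes(byte[] traceId, byte[] spanId, byte traceOptions) {
    if (traceId.length != 16 || spanId.length != 8) {
      throw new IllegalArgumentException("trace ID must be 16 bytes, span ID 8 bytes");
    }
    byte[] header = new byte[HEADER_LEN];
    header[0] = 0; // format version
    header[1] = 0; // field ID 0: trace ID follows
    System.arraycopy(traceId, 0, header, 2, 16);
    header[18] = 1; // field ID 1: span ID follows
    System.arraycopy(spanId, 0, header, 19, 8);
    header[27] = 2; // field ID 2: trace options follow
    header[28] = traceOptions;
    return header;
  }

  /** Slow-path inject: base64-encode the bytes so they fit a string-valued carrier. */
  static String toTextMapValue(byte[] header) {
    return Base64.getEncoder().encodeToString(header);
  }

  /** Slow-path extract: decode and validate; returns null for malformed input. */
  static byte[] fromTextMapValue(String value) {
    if (value == null) {
      return null;
    }
    byte[] header;
    try {
      header = Base64.getDecoder().decode(value);
    } catch (IllegalArgumentException e) {
      return null; // not valid base64
    }
    if (header.length != HEADER_LEN || header[0] != 0 || header[1] != 0
        || header[18] != 1 || header[27] != 2) {
      return null; // wrong size or unexpected version/field IDs
    }
    return header;
  }

  /** Copies the trace ID out of a header that passed validation. */
  static byte[] traceId(byte[] header) {
    return Arrays.copyOfRange(header, 2, 18);
  }
}
```

With this layout, every injection turns the 29-byte header into a 40-character base64 string, and every extraction reverses it. The Java/Go fast path described earlier skips both steps by handing the raw `byte[]` to the gRPC setter.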
From 8631cee6c91954b52ec60b954473c166dc6d4ba1 Mon Sep 17 00:00:00 2001 From: Yash Tibrewal Date: Wed, 17 Jan 2024 03:46:08 +0000 Subject: [PATCH 09/30] Fill up C++ sections --- A72-open-telemetry-tracing.md | 148 +++++++++++++++++++++++++--------- 1 file changed, 110 insertions(+), 38 deletions(-) diff --git a/A72-open-telemetry-tracing.md b/A72-open-telemetry-tracing.md index a6634b43b..78644402c 100644 --- a/A72-open-telemetry-tracing.md +++ b/A72-open-telemetry-tracing.md @@ -1,12 +1,12 @@ A72: OpenTelemetry Tracing ---- -* Author(s): [Yifei Zhuang](https://github.com/YifeiZhuang) +* Author(s): [Yifei Zhuang](https://github.com/YifeiZhuang), [Yash Tibrewal](https://github.com/yashykt) * Approver: [Eric Anderson](https://github.com/ejona86) * Reviewers: [Mark Roth](https://github.com/markdroth), [Doug Fawley](https://github.com/dfawley), [Feng Li](https://github.com/fengli79) * Status: Ready for Implementation * Implemented in: -* Last updated: 2023-07 +* Last updated: 2024-01 * Discussion at: https://groups.google.com/g/grpc-io/c/e_ByaRmtJak ## Abstract @@ -281,40 +281,113 @@ path to workaround lacking the binary propagator API. That means using propagato API with C++ needs base64 encoding and therefore is slower compared with just using metadata API. -TODO: add pseudocode here in C++ for `GrpcTraceBinPropagator`,`GrpcTextMapCarrier`. +```C++ +std::unique_ptr MakeGrpcTraceBinTextMapPropagator(); +``` + +The following shows a sketch on what the internal implementation details of this API would look within gRPC C++/Core. ```C++ -class GrpcTraceBinPropagator : public TextMapPropagator { - public: - void Inject(TextMapCarrier &carrier, const context::Context &context) { - // Slow path for C++. gRPC C++ does not have runtime type inspection, so we - // encode bytes to String to comply with the TextMapSetter API. 
This code - // path is also used in the situation where GrpcTraceBinPropagator + +namespace grpc { +namespace internal { + +class GrpcTraceBinTextMapPropagator + : public opentelemetry::context::TextMapPropagator { + public: + void Inject(opentelemetry::context::TextMapCarrier& carrier, + const opentelemetry::context::Context& context) { + // Slow path for C++. gRPC C++ does not have runtime type inspection, so we + // encode bytes to String to comply with the TextMapSetter API. This code + // path is also used in the situation where GrpcTraceBinTextMapPropagator // is used with a TextMapSetter externally. - // TODO: add implementation + auto span_context = opentelemetry::trace::GetSpan(context)->GetContext(); + if (!span_context.IsValid()) { + return; + } + auto header = SpanContextToGrpcTraceBinHeader(span_context); + carrier.Set( + "grpc-trace-bin", + absl::Base64Escape(absl::string_view( + reinterpret_cast<const char*>(header.data()), + header.size()))); } - context::Context Extract(const TextMapCarrier &carrier, context::Context &context) { - // TODO: add implementaiton - } -} -``` -```C++ -class GrpcTextMapCarrier : public TextMapCarrier { - public: + context::Context Extract(const context::propagation::TextMapCarrier& carrier, + opentelemetry::context::Context& context) { + return trace::SetSpan( + context, nostd::shared_ptr sp(new DefaultSpan( + GrpcTraceBinHeaderToSpanContext(absl::Base64Unescape( + carrier.Get("grpc-trace-bin")))))); + } + + private: + static constexpr int kGrpcTraceBinHeaderLen = 29; + + std::array SpanContextToGrpcTraceBinHeader( + const opentelemetry::trace::SpanContext& ctx) { + std::array header; + header[0] = 0; + header[1] = 0; + ctx.trace_id().CopyBytesTo(&header[2], 16); + header[18] = 1; + ctx.span_id().CopyBytesTo(&header[19], 8); + header[27] = 2; + header[28] = ctx.trace_flags().flags(); + return header; + } + + opentelemetry::trace::SpanContext GrpcTraceBinHeaderToSpanContext( + nostd::string_view header) { + if (header.size() != kGrpcTraceBinHeaderLen || header[0] != 0 ||
+ header[1] != 0 || header[18] != 1 || header[27] != 2) { + return SpanContext::GetInvalid(); + } + return SpanContext(TraceId(&header[2], 16), SpanId(&header[19], 8), + TraceFlags(header[28]), /*is_remote*/ true); + } +}; + +class GrpcTextMapCarrier : public opentelemetry::context::TextMapCarrier { + public: + GrpcTextMapCarrier(grpc_metadata_batch* metadata) : metadata_(metadata) {} + nostd::string_view Get(nostd::string_view key) { - // TODO: add implementation + if (key == "grpc-trace-bin") { + return absl::Base64Escape(metadata_->GetStringValue(key).value_or("")); + } else if (absl::EndsWith(key, "-bin")) { + // TODO() : Unclear how to handle this + } + return metadata_->GetStringValue(key); } + void Set(nostd::string_view key, nostd::string_view value) { - // TODO: add implementation + if (key == "grpc-trace-bin") { + metadata_->Set( + grpc_core::GrpcTraceBinMetadata(), + grpc_core::Slice::FromCopiedString(absl::Base64Unescape(value))); + } else if (absl::EndsWith(key, "-bin")) { + // TODO() : Unclear how to handle this + } else { + // A propagator other than GrpcTraceBinTextMapPropagator was used. + metadata_->Append(key, grpc_core::Slice::FromCopiedString(value)); + } } -} -``` -The `GrpcTextMapCarrier.set()` and `GrpcTextMapCarrier.get()` method in C++ -should handle both binary (`-bin`) header -(e.g. `grpc-trace-bin`) and ASCII header from other TextMap -propagators that users configure into gRPC OpenTelemetry, e.g. W3C. + private: + grpc_metadata_batch* metadata_; +}; + +} // namespace internal + +std::unique_ptr +MakeGrpcTraceBinTextMapPropagator() { + return std::make_unique(); +} + +} // namespace grpc + +``` ### gRPC OpenTelemetry Tracing API This section talks about enabling and configuring OpenTelemetry tracing. @@ -364,22 +437,21 @@ GlobalInterceptors.setInterceptors( ``` #### C++ -In C++, we will add a new method in `OpenTelemetryPluginBuilder`, see [gRPC A66][A66]. +The following new methods will be added in `OpenTelemetryPluginBuilder`. 
```C++ class OpenTelemetryPluginBuilder { public: - OpenTelemetryPluginBuilder(); - // If `SetMeterProvider()` is not called, no metrics are collected. - OpenTelemetryPluginBuilder& SetMeterProvider( - std::shared_ptr meter_provider); -+ // Set one or multiple propagators for span context propagation, e.g. -+ // GrpcTraceBinPropagator or community standard ones like W3C, etc. -+ OpenTelemetryPluginBuilder& SetPropagator( -+ std::shared_ptr -+ new grpc::GrpcTraceBinPropagator()); -... -} + // If `SetTracerProvider()` is not called, no traces are collected. + OpenTelemetryPluginBuilder& SetTracerProvider( + std::shared_ptr + tracer_provider); + // Set one or multiple text map propagators for span context propagation, e.g. + // GrpcTraceBinPropagator or community standard ones like W3C, etc. + OpenTelemetryPluginBuilder& SetTextMapPropagator( + std::unique_ptr + text_map_propagator); +}; ``` #### Go From 528dd82ff890923846816a4ecc38d2e5a282b5aa Mon Sep 17 00:00:00 2001 From: Yash Tibrewal Date: Thu, 18 Jan 2024 00:29:55 +0000 Subject: [PATCH 10/30] Reviewer comments --- A72-open-telemetry-tracing.md | 33 +++++++++++++++++---------------- 1 file changed, 17 insertions(+), 16 deletions(-) diff --git a/A72-open-telemetry-tracing.md b/A72-open-telemetry-tracing.md index 78644402c..8d58fe0df 100644 --- a/A72-open-telemetry-tracing.md +++ b/A72-open-telemetry-tracing.md @@ -274,15 +274,17 @@ optimized path for the same reason. Similarly, the normal `get()` method handles both binary headers and TextMap propagators. #### Context Propagation APIs in C++ -C++ will also support propagator APIs to provides API -uniformity among languages, as well as getting all the benefits of propagator APIs -mentioned before. Due to the language restriction, C++ can not take the optimization -path to workaround lacking the binary propagator API. That means using propagators -API with C++ needs base64 encoding and therefore is slower compared with just -using metadata API. 
+C++ will also support propagator APIs to provide API uniformity among +languages. Since gRPC C++ avoids RTTI, it cannot use the same optimization path +as Java/Go. This will result in an extra base64 encoding/decoding step to +satisfy the `TextMapPropagator` requirement that the key/value pair be a valid HTTP +field. There are possible optimizations C++ might pursue in the future, for +example, providing an explicit knob on `GrpcTraceBinTextMapPropagator` that +assumes that this propagator is being used with gRPC and can hence skirt `TextMapPropagator` compatibility requirements. ```C++ -std::unique_ptr MakeGrpcTraceBinTextMapPropagator(); +std::unique_ptr +MakeGrpcTraceBinTextMapPropagator(); ``` The following shows a sketch on what the internal implementation details of this API would look within gRPC C++/Core. @@ -297,10 +299,6 @@ class GrpcTraceBinTextMapPropagator public: void Inject(opentelemetry::context::TextMapCarrier& carrier, const opentelemetry::context::Context& context) { - // Slow path for C++. gRPC C++ does not have runtime type inspection, so we - // encode bytes to String to comply with the TextMapSetter API. This code - // path is also used in the situation where GrpcTraceBinTextMapPropagator - // is used with a TextMapSetter externally. auto span_context = opentelemetry::trace::GetSpan(context)->GetContext(); if (!span_context.IsValid()) { return; @@ -356,7 +354,9 @@ class GrpcTextMapCarrier : public opentelemetry::context::TextMapCarrier { if (key == "grpc-trace-bin") { return absl::Base64Escape(metadata_->GetStringValue(key).value_or("")); } else if (absl::EndsWith(key, "-bin")) { - // TODO() : Unclear how to handle this + // It's unclear how we want to handle other headers that end with "-bin". + // Return an empty string for now.
+ return ""; } return metadata_->GetStringValue(key); } @@ -367,7 +367,9 @@ class GrpcTextMapCarrier : public opentelemetry::context::TextMapCarrier { grpc_core::GrpcTraceBinMetadata(), grpc_core::Slice::FromCopiedString(absl::Base64Unescape(value))); } else if (absl::EndsWith(key, "-bin")) { - // TODO() : Unclear how to handle this + // It's unclear how we want to handle other headers that end with "-bin". + // Do nothing for now. + return; } else { // A propagator other than GrpcTraceBinTextMapPropagator was used. metadata_->Append(key, grpc_core::Slice::FromCopiedString(value)); @@ -444,10 +446,9 @@ class OpenTelemetryPluginBuilder { public: // If `SetTracerProvider()` is not called, no traces are collected. OpenTelemetryPluginBuilder& SetTracerProvider( - std::shared_ptr - tracer_provider); + std::shared_ptr tracer_provider); // Set one or multiple text map propagators for span context propagation, e.g. - // GrpcTraceBinPropagator or community standard ones like W3C, etc. + // GrpcTraceBinTextMapPropagator or community standard ones like W3C, etc. OpenTelemetryPluginBuilder& SetTextMapPropagator( std::unique_ptr text_map_propagator); From 517f340ccfd725d2d931a4177c9add1ef645062e Mon Sep 17 00:00:00 2001 From: yifeizhuang Date: Thu, 18 Jan 2024 16:33:48 -0800 Subject: [PATCH 11/30] trace info, custom binary header validation --- A72-open-telemetry-tracing.md | 76 +++++++++++++++++++---------------- 1 file changed, 41 insertions(+), 35 deletions(-) diff --git a/A72-open-telemetry-tracing.md b/A72-open-telemetry-tracing.md index 8d58fe0df..a5d679f6a 100644 --- a/A72-open-telemetry-tracing.md +++ b/A72-open-telemetry-tracing.md @@ -36,13 +36,11 @@ and OpenCensus's binary format for (de)serialization. The header name is unique from other census library propagators to differentiate with the application’s tracing instrumentation. 
-The following tracing information during an RPC lifecycle is captured: +The following tracing information during an RPC lifecycle should be captured: At the client, on parent span: * When the call is started, annotate name resolution completed if the RPC had name resolution delay. -* When the uncompressed size of some outbound data is revealed, annotate seq no., -type(Received) and wire message size. * When the call is closed, end the parent span with RPC status. On attempt span: @@ -51,35 +49,37 @@ number of preceding attempts for the RPC, and attribute `transparent-retry` that shows whether stream is a transparent retry. * When the stream is created on transport, annotate delayed load balancer pick complete, if any. -* When an outbound message has been sent, add a message event to capture seq no., -type(SENT) and wire size. -* When an inbound message has been received from the transport, add a message -event to capture seq no., type(Received) and uncompressed message size. +* When an outbound message has been sent, add message events to capture seq no., +type(SENT), uncompressed message size, and compressed message size if any +compression. The seq no. is a sequence of integer numbers starting from 0 +to identify sent messages within the stream. The size is the total attempt message +bytes without encryption, not including grpc or transport framing bytes. +* When an inbound message has been received from the transport, add message +events to capture seq no., type(Received), wire message size, and uncompressed +message size if any decompression. The seq no. is a sequence of integer numbers +starting from 0 to identify received messages within the stream. * When the stream is closed, end the attempt span with RPC status. At the server: -* When an outbound message has been sent, add a message event to capture seq no., -type(SENT) and size. -* When an inbound message has been read from the transport, add a message event -to capture seq no., type(Received) and size. 
-* When the uncompressed size of some inbound data is revealed, annotate seq no., -type(Received) and wire message size. +* When an outbound message has been sent, add message events to capture seq no., +type(SENT) and uncompressed message size, and compressed message size if any compression. +* When an inbound message has been read from the transport, add message events +to capture seq no., type(Received), wire message size, and uncompressed message size, +if any decompression. * When the stream is closed, end the span with RPC status. +Note that C++ is missing the seq no. information. And Java has issue of reporting +decompressed message size upon receiving messages, +as a workaround, on the client parent span and server span: +* When the uncompressed size of some outbound data is revealed, annotate seq no., +type(Received) and uncompressed message size. + ### gRPC Census API The APIs to enable gRPC tracing are different between languages, e.g. in grpc-java it is zero-configuration: as long as the grpc-census dependency exists in the classpath, the traces are automatically generated. In C++, there is an API for users to call to enable tracing. In Go, it is exposed via stream tracers. -### gRPC GCP Observability -Following the census tracing instrumentation, gRPC supports exporting traces -to GCP Stackdriver for visualization and analysis, see [user guide][grpc-observability-public-doc]. -We distinguish and exclude gRPC GCP observability from this design: gRPC GCP -observability is about exporting data while this design is about instrumentation. -Migrating gRPC GCP observability to OpenTelemetry is a future project that depends -on this work. 
- ### Related Proposals and Documents: * [gRFC L29: C++ API Changes for OpenCensus Integration][L29] * [gRFC A45: Exposing OpenCensus Metrics and Tracing for gRPC retry][A45] @@ -98,17 +98,19 @@ where during migration one will use OpenCensus and the other will use OpenTeleme All related APIs here are experimental until OpenTelemetry metrics [gRPC A66][A66] is design and implementation complete. -### New Function In OpenTelemetry Plugin +### Tracing Function In OpenTelemetry Plugin We will add tracing functions in grpc-open-telemetry plugin, along with OpenTelemetry metrics [gRPC A66][A66]. Languages will keep using the same gRPC infrastructures, e.g. interceptor and stream tracer to implement the feature. We keep the grpc-census plugin to allow users who already depend on grpc-census to continue using it for newer grpc versions. -In the new function we will produce the same tracing information as we produce +In the new tracing function we will produce the same tracing information as we produce for Census. But due to API differences between OpenCensus and OpenTelemetry, the trace information is represented slightly differently. 
-The OpenCensus `MessageEvent` fields maps to OpenTelemetry event attributes: +In the new tracing function, the client will add `Event`s (name: +`Outbound message sent` and `Inbound message read`) with corresponding attributes, +mapped from OpenCensus `MessageEvent` fields: | OpenCensus Trace Message Event Fields | OpenTelemetry Trace Event Attribute Key | |---------------------------------------|-----------------------------------------| @@ -122,8 +124,8 @@ annotation attributes keys are mapped to event attributes keys: | OpenCensus Trace Annotation Attribute Key | OpenTelemetry Trace Event Attribute Key | |-------------------------------------------|-----------------------------------------| -| `id` | `message.event.type` | -| `type` | `message.message.id` | +| `type` | `message.event.type` | +| `id` | `message.message.id` | And OpenTelemetry no longer has span end options as OpenCensus does. @@ -216,17 +218,19 @@ class GrpcCommonSetter implements TextMapSetter, GrpcBinarySetter, GrpcBinarySetter { @Override - public String get(@Nullable Metadata carrier, String key) { - if (key.endsWith("-bin")) { + public String get(@Nullable Metadata carrier, String key) { + if (key.equals("grpc-trace-bin")) { // Slow path for C++: return string encoded from bytes. Later we decode to // bytes in GrpcTraceBinPropagator.extract(). byte[] value = carrier.get(Metadata.Key.of(key, BINARY_BYTE_MARSHALLER)); return Base64.getEncoder().encodeToString(value); + } else if (key.endsWith("-bin")) { + logger.log(Level.ERROR, "Binary propagator other than GrpcTraceBinPropagator is not supported."); } else { // Used by other TextMap propagators, e.g. W3C. return carrier.get(Metadata.Key.of(key, ASCII_STRING_MARSHALLER)); @@ -251,7 +257,7 @@ class GrpcCommonGetter implements TextMapGetter { // Add a new method to optimize the TextMap propagator to avoid base64 encoding. 
@Override public byte[] getBinary(@Nullable Metadata carrier, String key) { - assert key.endsWith("bin"); + assert key.equals("grpc-trace-bin"); return carrier.get(Metadata.Key.of(key, BINARY_BYTE_MARSHALLER)); } } @@ -354,8 +360,9 @@ class GrpcTextMapCarrier : public opentelemetry::context::TextMapCarrier { if (key == "grpc-trace-bin") { return absl::Base64Escape(metadata_->GetStringValue(key).value_or("")); } else if (absl::EndsWith(key, "-bin")) { - // It's unclear how we want to handle other headers that end with "-bin". - // Return an empty string for now. + // Maybe ok to support a custom binary propagator. Needs based64 encoding + // validation if so. Not for now. + gpr_log(GPR_ERROR, "Binary propagator other than GrpcTraceBinPropagator is not supported."); return ""; } return metadata_->GetStringValue(key); @@ -367,8 +374,7 @@ class GrpcTextMapCarrier : public opentelemetry::context::TextMapCarrier { grpc_core::GrpcTraceBinMetadata(), grpc_core::Slice::FromCopiedString(absl::Base64Unescape(value))); } else if (absl::EndsWith(key, "-bin")) { - // It's unclear how we want to handle other headers that end with "-bin". - // Do nothing for now. + gpr_log(GPR_ERROR, "Binary propagator other than GrpcTraceBinPropagator is not supported."); return; } else { // A propagator other than GrpcTraceBinTextMapPropagator was used. From 8a0f8f7359b742fa83a12c0e92342f1ef904b4f1 Mon Sep 17 00:00:00 2001 From: yifeizhuang Date: Mon, 18 Mar 2024 17:36:11 -0700 Subject: [PATCH 12/30] re-structure the content --- A72-open-telemetry-tracing.md | 309 +++++++++++++++++----------------- 1 file changed, 159 insertions(+), 150 deletions(-) diff --git a/A72-open-telemetry-tracing.md b/A72-open-telemetry-tracing.md index a5d679f6a..ed6d83d12 100644 --- a/A72-open-telemetry-tracing.md +++ b/A72-open-telemetry-tracing.md @@ -12,10 +12,12 @@ A72: OpenTelemetry Tracing ## Abstract This proposal adds support for OpenTelemetry tracing and suggests migration paths away from OpenCensus tracing. 
Discussions include: -* Context propagation between a gRPC client and server during migration. -* Migrate a gRPC involved software binary that has both OpenTelemetry and -OpenCensus dependency. * The API surface to enable and configure OpenTelemetry tracing. +* Context propagation between a gRPC client and server. +* Migration path from gRPC OpenCensus to OpenTelemetry, considering: + 1) The cross-process concerns during migration. + 2) In-binary migration for a gRPC involved software that has both OpenTelemetry and + OpenCensus dependency. Note that stats and logging are out of scope. @@ -36,98 +38,125 @@ and OpenCensus's binary format for (de)serialization. The header name is unique from other census library propagators to differentiate with the application’s tracing instrumentation. -The following tracing information during an RPC lifecycle should be captured: +### Related Proposals and Documents: +* [gRFC L29: C++ API Changes for OpenCensus Integration][L29] +* [gRFC A45: Exposing OpenCensus Metrics and Tracing for gRPC retry][A45] +* [Microservices observability overview][grpc-observability-public-doc] +* [gRFC A66: OpenTelemetry Metrics][A66] + +## Proposal +### gRPC OpenTelemetry Tracing API +We will add tracing functions in grpc-open-telemetry plugin, along with OpenTelemetry +metrics [gRPC A66][A66]. Languages will keep using gRPC infrastructures, e.g. interceptor and +stream tracer to implement the feature, the same as Census. + +The OpenTelemetry API will coexist with the OpenCensus API. Only one +`grpc-trace-bin` header will be sent for a single RPC as long as only one of +OpenTelemetry or OpenCensus is enabled for the channel. The APIs to enable and +configure OpenTelemetry tracing are different among languages due to different +underlying infrastructures. + +#### Java +In Java, it will be part of global interceptors, so that the interceptors are +managed in a more sustainable way and user-friendly. 
As a prerequisite, the stream +tracer factory API will be stabilized. OpenTelemetryModule will be created with +an OpenTelemetryAPI instance passing in for necessary configurations. +Users can also rely on SDK autoconfig extension that configures the sdk object +through environmental variables or Java system properties, then obtain the sdk +object passed in to gRPC. + +```Java +// Construct OpenTelemetry to be passed to gRPC OpenTelemetry module for +// trace and metrics configurations. +SdkTracerProvider sdkTracerProvider = SdkTracerProvider + .builder() + .addSpanProcessor( + BatchSpanProcessor.builder(exporter).build()) + .build(); + +OpenTelemetry openTelemetry = OpenTelemetrySdk.builder() + .setTracerProvider(sdkTracerProvider) + .setMeterProvider(...) + .setPropagators( + ContextPropagators.create(GrpcTraceBinPropagator.getInstance())) + .build(); + +// Alternatively, use auto configuration: +// OpenTelemetry openTelemetry = +// AutoConfiguredOpenTelemetrySdk.getOpenTelemetrySdk(). + +// Create a module that hosts tracing infrastructures. Should document that +// the module implementation may change as OpenTelemetry evolves. +OpenTelemetryModule otModule = OpenTelemetryModule.getInstance(openTelemetry); + +GlobalInterceptors.setInterceptors( + Arrays.asList(otModule.getClientTracingInterceptor()), + Arrays.asList(otModule.getServerTracerFactory())); + +``` + +#### C++ +The following new methods will be added in `OpenTelemetryPluginBuilder`. + +```C++ +class OpenTelemetryPluginBuilder { + public: + // If `SetTracerProvider()` is not called, no traces are collected. + OpenTelemetryPluginBuilder& SetTracerProvider( + std::shared_ptr tracer_provider); + // Set one or multiple text map propagators for span context propagation, e.g. + // GrpcTraceBinTextMapPropagator or community standard ones like W3C, etc. 
+ OpenTelemetryPluginBuilder& SetTextMapPropagator( + std::unique_ptr + text_map_propagator); +}; +``` + +#### Go +In Go, the OpenTelemetry stream tracers and interceptors will be provided for users to install. + +TODO: add Go API. + +### Tracing Information +With the new OpenTelemetry plugin we will produce the same tracing information as we +produce for Census. The following tracing information during an RPC lifecycle +should be captured: At the client, on parent span: -* When the call is started, annotate name resolution completed if the RPC had -name resolution delay. +* When the call is started, annotate name resolution completed if the RPC had + name resolution delay. * When the call is closed, end the parent span with RPC status. On attempt span: -* When span is created, add attribute `previous-rpc-attempts` that captures the -number of preceding attempts for the RPC, and attribute `transparent-retry` that -shows whether stream is a transparent retry. -* When the stream is created on transport, annotate delayed load balancer pick -complete, if any. +* When span is created, add attribute `previous-rpc-attempts` that captures the + number of preceding attempts for the RPC, and attribute `transparent-retry` that + shows whether stream is a transparent retry. +* When the stream is created on transport, annotate delayed load balancer pick + complete, if any. * When an outbound message has been sent, add message events to capture seq no., -type(SENT), uncompressed message size, and compressed message size if any -compression. The seq no. is a sequence of integer numbers starting from 0 -to identify sent messages within the stream. The size is the total attempt message -bytes without encryption, not including grpc or transport framing bytes. -* When an inbound message has been received from the transport, add message -events to capture seq no., type(Received), wire message size, and uncompressed -message size if any decompression. The seq no. 
is a sequence of integer numbers -starting from 0 to identify received messages within the stream. + type(SENT), uncompressed message size, and compressed message size if any + compression. The seq no. is a sequence of integer numbers starting from 0 + to identify sent messages within the stream. The size is the total attempt message + bytes without encryption, not including grpc or transport framing bytes. +* When an inbound message has been received from the transport, add message + events to capture seq no., type(Received), wire message size, and uncompressed + message size if any decompression. The seq no. is a sequence of integer numbers + starting from 0 to identify received messages within the stream. * When the stream is closed, end the attempt span with RPC status. At the server: * When an outbound message has been sent, add message events to capture seq no., -type(SENT) and uncompressed message size, and compressed message size if any compression. + type(SENT) and uncompressed message size, and compressed message size if any compression. * When an inbound message has been read from the transport, add message events -to capture seq no., type(Received), wire message size, and uncompressed message size, -if any decompression. + to capture seq no., type(Received), wire message size, and uncompressed message size, + if any decompression. * When the stream is closed, end the span with RPC status. -Note that C++ is missing the seq no. information. And Java has issue of reporting -decompressed message size upon receiving messages, +Note that C++ is missing the seq no. information. And Java has issue of reporting +decompressed message size upon receiving messages, as a workaround, on the client parent span and server span: * When the uncompressed size of some outbound data is revealed, annotate seq no., -type(Received) and uncompressed message size. - -### gRPC Census API -The APIs to enable gRPC tracing are different between languages, e.g. 
in -grpc-java it is zero-configuration: as long as the grpc-census dependency exists in -the classpath, the traces are automatically generated. In C++, there is an API -for users to call to enable tracing. In Go, it is exposed via stream tracers. - -### Related Proposals and Documents: -* [gRFC L29: C++ API Changes for OpenCensus Integration][L29] -* [gRFC A45: Exposing OpenCensus Metrics and Tracing for gRPC retry][A45] -* [Microservices observability overview][grpc-observability-public-doc] -* [gRFC A66: OpenTelemetry Metrics][A66] - -## Proposal -gRPC users depending on the grpc-census plugin have non-trivial migration paths -to OpenTelemetry. Consider the following use cases: -1. Migrate an application binary where both OpenCensus and OpenTelemetry maybe exist -in the dependency tree. This can be the application’s own tracing code, or gRPC -OpenCensus, or other dependencies that involve OpenCensus and/or OpenTelemetry. -2. Compatibility between a gRPC client and server as two distributed components, -where during migration one will use OpenCensus and the other will use OpenTelemetry. - -All related APIs here are experimental until OpenTelemetry metrics [gRPC A66][A66] -is design and implementation complete. - -### Tracing Function In OpenTelemetry Plugin -We will add tracing functions in grpc-open-telemetry plugin, along with OpenTelemetry -metrics [gRPC A66][A66]. -Languages will keep using the same gRPC infrastructures, e.g. interceptor and -stream tracer to implement the feature. -We keep the grpc-census plugin to allow users who already depend on grpc-census to -continue using it for newer grpc versions. -In the new tracing function we will produce the same tracing information as we produce -for Census. But due to API differences between OpenCensus and OpenTelemetry, the -trace information is represented slightly differently. 
-In the new tracing function, the client will add `Event`s (name: -`Outbound message sent` and `Inbound message read`) with corresponding attributes, -mapped from OpenCensus `MessageEvent` fields: - -| OpenCensus Trace Message Event Fields | OpenTelemetry Trace Event Attribute Key | -|---------------------------------------|-----------------------------------------| -| Type | `message.event.type` | -| Message Id | `message.message.id` | -| Uncompressed message size | `message.event.size.uncompressed` | -| Compressed message size | `message.event.size.compressed` | - -OpenCensus span annotation description maps to OpenTelemetry event name, and -annotation attributes keys are mapped to event attributes keys: - -| OpenCensus Trace Annotation Attribute Key | OpenTelemetry Trace Event Attribute Key | -|-------------------------------------------|-----------------------------------------| -| `type` | `message.event.type` | -| `id` | `message.message.id` | - -And OpenTelemetry no longer has span end options as OpenCensus does. + type(Received) and uncompressed message size. ### Propagator Wire Format While gRPC OpenCensus directly interacts with the metadata API, gRPC OpenTelemetry @@ -137,6 +166,9 @@ following benefits: 2. Make it possible to plugin other propagators that the community supports. 3. Flexible API that allows clean and simple migration paths to a different propagator. +This will allow gRPC to keep using `grpc-trace-bin` header for context +propagation and also support other propagators. + As of today, OpenTelemetry propagator API only supports `TextMapPropagator`, that is to send string key/value pairs between the client and server, which is different from the binary header that gRPC currently uses. The future roadmap @@ -149,10 +181,11 @@ Therefore, gRPC will only support propagating `grpc-trace-bin` in TextMap propag gRPC will expose a custom `GrpcTraceBinPropagator` that implements `TextMapPropagator`. 
This grpc-provided propagator still uses the `grpc-trace-bin` header for context -propagation. When using `grpc-trace-bin` the OpenCensus spanContext and -OpenTelemetry spanContext are identical, therefore a gRPC OpenCensus client can -speak with a gRPC OpenTelemetry server and vice versa. Users can provide a -single composite propagator that combines one or multiple `TextMapPropagator` +propagation. A `grpc-trace-bin` formatter implementation for OpenTelemetry is +needed in each language, which can be similar to the OpenCensus implementation. +Go already has community support for that. + +Users can provide a single composite propagator that combines one or multiple `TextMapPropagator` for their client and server separately. This way, users can define their own migration path for context propagators in distributed components, see detailed discussion in the later session. Configuring gRPC OpenTelemetry with this @@ -397,84 +430,60 @@ MakeGrpcTraceBinTextMapPropagator() { ``` -### gRPC OpenTelemetry Tracing API -This section talks about enabling and configuring OpenTelemetry tracing. -The OpenTelemetry API will coexist with the OpenCensus API. Only one -`grpc-trace-bin` header will be sent for a single RPC as long as only one of -OpenTelemetry or OpenCensus is enabled for the channel. - -The APIs are different among languages due to different underlying infrastructures. - -#### Java -In Java, it will be part of global interceptors, so that the interceptors are -managed in a more sustainable way and user-friendly. As a prerequisite, the stream -tracer factory API will be stabilized. OpenTelemetryModule will be created with -an OpenTelemetryAPI instance passing in for necessary configurations. -Users can also rely on SDK autoconfig extension that configures the sdk object -through environmental variables or Java system properties, then obtain the sdk -object passed in to gRPC. 
- -```Java -// Construct OpenTelemetry to be passed to gRPC OpenTelemetry module for -// trace and metrics configurations. -SdkTracerProvider sdkTracerProvider = SdkTracerProvider - .builder() - .addSpanProcessor( - BatchSpanProcessor.builder(exporter).build()) - .build(); -OpenTelemetry openTelemetry = OpenTelemetrySdk.builder() - .setTracerProvider(sdkTracerProvider) - .setMeterProvider(...) - .setPropagators( - ContextPropagators.create(GrpcTraceBinPropagator.getInstance())) - .build(); - -// Alternatively, use auto configuration: -// OpenTelemetry openTelemetry = -// AutoConfiguredOpenTelemetrySdk.getOpenTelemetrySdk(). +## Migrate from OpenCensus to OpenTelemetry +### Tracing Information +gRPC is generating similar tracing information for OpenTelemetry compared with OpenCensus, +but due to API differences between those two libraries, the +trace information is represented slightly differently. +In the new OpenTelemetry plugin, the client will add `Event`s (name: +`Outbound message sent` and `Inbound message read`) with corresponding attributes, +mapped from OpenCensus `MessageEvent` fields: -// Create a module that hosts tracing infrastructures. Should document that -// the module implementation may change as OpenTelemetry evolves. 
-OpenTelemetryModule otModule = OpenTelemetryModule.getInstance(openTelemetry); +| OpenCensus Trace Message Event Fields | OpenTelemetry Trace Event Attribute Key | +|---------------------------------------|-----------------------------------------| +| Type | `message.event.type` | +| Message Id | `message.message.id` | +| Uncompressed message size | `message.event.size.uncompressed` | +| Compressed message size | `message.event.size.compressed` | -GlobalInterceptors.setInterceptors( - Arrays.asList(otModule.getClientTracingInterceptor()), - Arrays.asList(otModule.getServerTracerFactory())); +OpenCensus span annotation description maps to OpenTelemetry event name, and +annotation attributes keys are mapped to event attributes keys: -``` +| OpenCensus Trace Annotation Attribute Key | OpenTelemetry Trace Event Attribute Key | +|-------------------------------------------|-----------------------------------------| +| `type` | `message.event.type` | +| `id` | `message.message.id` | -#### C++ -The following new methods will be added in `OpenTelemetryPluginBuilder`. +And OpenTelemetry no longer has span end options as OpenCensus does. -```C++ -class OpenTelemetryPluginBuilder { - public: - // If `SetTracerProvider()` is not called, no traces are collected. - OpenTelemetryPluginBuilder& SetTracerProvider( - std::shared_ptr tracer_provider); - // Set one or multiple text map propagators for span context propagation, e.g. - // GrpcTraceBinTextMapPropagator or community standard ones like W3C, etc. - OpenTelemetryPluginBuilder& SetTextMapPropagator( - std::unique_ptr - text_map_propagator); -}; -``` +### gRPC Census API +The gRPC OpenCensus tracing APIs in grpc-census plugin are different between +languages, e.g. in grpc-java it is zero-configuration: as long as the grpc-census +dependency exists in the classpath, the traces are automatically generated. +In C++, there is an API for users to call to enable tracing. In Go, it is exposed via stream tracers. 
+We keep the grpc-census plugin to allow users who already depend on grpc-census to +continue using it for newer grpc versions. -#### Go -In Go, the OpenTelemetry stream tracers and interceptors will be provided for users to install. +gRPC users depending on the grpc-census plugin have non-trivial migration paths +to OpenTelemetry. Consider the following use cases: +1. Compatibility between a gRPC client and server as two distributed components, +where during migration one will use OpenCensus and the other will use OpenTelemetry. +2. Migrate an application binary where both OpenCensus and OpenTelemetry maybe exist + in the dependency tree. This can be the application’s own tracing code, or gRPC + OpenCensus, or other dependencies that involve OpenCensus and/or OpenTelemetry. -TODO: add Go API. +Here are the suggested solutions for both use cases. ### Migrate to OpenTelemetry: Cross-process Networking Concerns When users first introduce gRPC OpenTelemetry, for the time window when the gRPC client and server have mixed plugins of OpenTelemetry and OpenCensus, -it is encouraged to use `GrpcTraceBinPropagator` that propagates -`grpc-trace-bin` header for the migration. Using the same header greatly simplifies rollout. - -A `grpc-trace-bin` formatter implementation for OpenTelemetry is needed in each language, -which can be similar to the OpenCensus implementation. Go already has community -support for that. +it is encouraged to use `GrpcTraceBinPropagator`, as described in the previous section, +that propagates `grpc-trace-bin` header for the migration. +When using `grpc-trace-bin` the OpenCensus spanContext and +OpenTelemetry spanContext are identical, therefore a gRPC OpenCensus client can +speak with a gRPC OpenTelemetry server and vice versa. Using the same header +greatly simplifies rollout. After migration period, users have the flexibility to switch to other propagators. OpenTelemetry and its extension packages support multiple text map propagators. 
From 942b9b6262c529b4133255294d0d0d316f15edde Mon Sep 17 00:00:00 2001 From: yifeizhuang Date: Mon, 18 Mar 2024 17:52:26 -0700 Subject: [PATCH 13/30] separate c++ comment --- A72-open-telemetry-tracing.md | 18 ++++++++---------- 1 file changed, 8 insertions(+), 10 deletions(-) diff --git a/A72-open-telemetry-tracing.md b/A72-open-telemetry-tracing.md index ed6d83d12..6dc9554e8 100644 --- a/A72-open-telemetry-tracing.md +++ b/A72-open-telemetry-tracing.md @@ -208,10 +208,8 @@ public class GrpcTraceBinPropagator implements TextMapPropagator { // the overloaded set(Metadata, String, byte[]) method added by gRPC. ((GrpcCommonSetter) setter).set((Metadata) carrier, "grpc-trace-bin", value); } else { - // Slow path for C++. gRPC C++ does not have runtime type inspection, so we - // encode bytes to String to comply with the TextMapSetter API. This code - // path is also used in the situation where GrpcTraceBinPropagator - // is used with a TextMapSetter externally. + // Slow path. For the situation where GrpcTraceBinPropagator is used with + // a TextMapSetter externally. setter.set(carrier, "grpc-trace-bin", Base64.getEncoder().encodeToString(value)); } } @@ -222,9 +220,7 @@ public class GrpcTraceBinPropagator implements TextMapPropagator { if (textMapGetter instanceof GrpcCommonGetter) { //Fast path for Java/Go bytes = ((GrpcCommonGetter) textMapGetter).getBinary((Metadata) c, "grpc-trace-bin"); } else { - // Slow path for C++. gRPC C++ does not have runtime type inspection, so - // we decode String from TextMapGetter API to bytes. This code path applies - // to the situation where GrpcTraceBinPropagator is used with a TextMapGetter + // Slow path. For the situation where GrpcTraceBinPropagator is used with a TextMapGetter // externally. 
String contextString = textMapGetter.get(c, "grpc-trace-bin"); bytes = Base64.getDecoder().decode(contextString); @@ -258,8 +254,8 @@ class GrpcCommonSetter implements TextMapSetter, GrpcBinarySetter { @Override public String get(@Nullable Metadata carrier, String key) { if (key.equals("grpc-trace-bin")) { - // Slow path for C++: return string encoded from bytes. Later we decode to + // Slow path: return string encoded from bytes. Later we decode to // bytes in GrpcTraceBinPropagator.extract(). byte[] value = carrier.get(Metadata.Key.of(key, BINARY_BYTE_MARSHALLER)); return Base64.getEncoder().encodeToString(value); @@ -344,6 +340,8 @@ class GrpcTraceBinTextMapPropagator } carrier.Set( "grpc-trace-bin", + // gRPC C++ does not have runtime type inspection, so we + // encode bytes to String to comply with the TextMapSetter API. absl::Base64Escape( absl::string_view(SpanContextToGrpcTraceBinHeader(span_context)) .data()), From eea05711bcf7b730c3987b2a0d74f9a83ad2a3fa Mon Sep 17 00:00:00 2001 From: yifeizhuang Date: Mon, 18 Mar 2024 18:13:42 -0700 Subject: [PATCH 14/30] minor change --- A72-open-telemetry-tracing.md | 67 +++++++++++++++++------------------ 1 file changed, 33 insertions(+), 34 deletions(-) diff --git a/A72-open-telemetry-tracing.md b/A72-open-telemetry-tracing.md index 6dc9554e8..e00202f73 100644 --- a/A72-open-telemetry-tracing.md +++ b/A72-open-telemetry-tracing.md @@ -87,7 +87,7 @@ OpenTelemetry openTelemetry = OpenTelemetrySdk.builder() // Create a module that hosts tracing infrastructures. Should document that // the module implementation may change as OpenTelemetry evolves. -OpenTelemetryModule otModule = OpenTelemetryModule.getInstance(openTelemetry); +OpenTelemetryModule otModule = OpenTelemetryModule.newBuilder().sdk(openTelemetry).build(); GlobalInterceptors.setInterceptors( Arrays.asList(otModule.getClientTracingInterceptor()), @@ -167,9 +167,10 @@ following benefits: 3. 
Flexible API that allows clean and simple migration paths to a different propagator. This will allow gRPC to keep using `grpc-trace-bin` header for context -propagation and also support other propagators. +propagation and also support other propagators. gRPC will expose a custom +`GrpcTraceBinPropagator` that implements `TextMapPropagator`. However, there is a caveat: -As of today, OpenTelemetry propagator API only supports `TextMapPropagator`, +Currently, OpenTelemetry propagator API only supports `TextMapPropagator`, that is to send string key/value pairs between the client and server, which is different from the binary header that gRPC currently uses. The future roadmap to support binary propagators at OpenTelemetry is unclear. So, gRPC will use @@ -178,10 +179,7 @@ around the lack of binary propagator API to support `grpc-trace-bin`. In fact, TextMap propagator does not show visible performance impact for C++, which is the most sensitive language to performance, based on internal micro benchmarking. Therefore, gRPC will only support propagating `grpc-trace-bin` in TextMap propagator. - -gRPC will expose a custom `GrpcTraceBinPropagator` that implements `TextMapPropagator`. -This grpc-provided propagator still uses the `grpc-trace-bin` header for context -propagation. A `grpc-trace-bin` formatter implementation for OpenTelemetry is +A `grpc-trace-bin` formatter implementation for OpenTelemetry is needed in each language, which can be similar to the OpenCensus implementation. Go already has community support for that. @@ -318,7 +316,7 @@ example, providing an explicit knob on `GrpcTraceBinTextMapPropagator` that assumes that this propagator is being used with gRPC and can hence skirt `TextMapPropagator` compatibility requirements. 
```C++ -std::unique_ptr +std::unique_ptr MakeGrpcTraceBinTextMapPropagator(); ``` @@ -419,7 +417,7 @@ class GrpcTextMapCarrier : public opentelemetry::context::TextMapCarrier { } // namespace internal -std::unique_ptr +std::unique_ptr MakeGrpcTraceBinTextMapPropagator() { return std::make_unique(); } @@ -430,33 +428,9 @@ MakeGrpcTraceBinTextMapPropagator() { ## Migrate from OpenCensus to OpenTelemetry -### Tracing Information -gRPC is generating similar tracing information for OpenTelemetry compared with OpenCensus, -but due to API differences between those two libraries, the -trace information is represented slightly differently. -In the new OpenTelemetry plugin, the client will add `Event`s (name: -`Outbound message sent` and `Inbound message read`) with corresponding attributes, -mapped from OpenCensus `MessageEvent` fields: - -| OpenCensus Trace Message Event Fields | OpenTelemetry Trace Event Attribute Key | -|---------------------------------------|-----------------------------------------| -| Type | `message.event.type` | -| Message Id | `message.message.id` | -| Uncompressed message size | `message.event.size.uncompressed` | -| Compressed message size | `message.event.size.compressed` | - -OpenCensus span annotation description maps to OpenTelemetry event name, and -annotation attributes keys are mapped to event attributes keys: - -| OpenCensus Trace Annotation Attribute Key | OpenTelemetry Trace Event Attribute Key | -|-------------------------------------------|-----------------------------------------| -| `type` | `message.event.type` | -| `id` | `message.message.id` | - -And OpenTelemetry no longer has span end options as OpenCensus does. ### gRPC Census API -The gRPC OpenCensus tracing APIs in grpc-census plugin are different between +The existing gRPC OpenCensus tracing APIs in grpc-census plugin are different between languages, e.g. 
in grpc-java it is zero-configuration: as long as the grpc-census dependency exists in the classpath, the traces are automatically generated. In C++, there is an API for users to call to enable tracing. In Go, it is exposed via stream tracers. @@ -532,6 +506,31 @@ Finally, they switch to grpc-open-telemetry and finish the migration. |-- gRPC -> Using Otel to generate Trace A ----------------------------- | |-- Application -> Using Otel to generate a sub Trace B------------------ | ``` +### Tracing Information +gRPC is generating similar tracing information for OpenTelemetry compared with OpenCensus, +but due to API differences between those two libraries, the +trace information is represented slightly differently. +In the new OpenTelemetry plugin, the client will add `Event`s (name: +`Outbound message sent` and `Inbound message read`) with corresponding attributes, +mapped from OpenCensus `MessageEvent` fields: + +| OpenCensus Trace Message Event Fields | OpenTelemetry Trace Event Attribute Key | +|---------------------------------------|-----------------------------------------| +| Type | `message.event.type` | +| Message Id | `message.message.id` | +| Uncompressed message size | `message.event.size.uncompressed` | +| Compressed message size | `message.event.size.compressed` | + +OpenCensus span annotation description maps to OpenTelemetry event name, and +annotation attributes keys are mapped to event attributes keys: + +| OpenCensus Trace Annotation Attribute Key | OpenTelemetry Trace Event Attribute Key | +|-------------------------------------------|-----------------------------------------| +| `type` | `message.event.type` | +| `id` | `message.message.id` | + +And OpenTelemetry no longer has span end options as OpenCensus does. + ## Rationale C++ will not have the optimization path in its `GrpcTraceBinPropagator` API. 
We From a2c420ec79decd881ec44b667274f32edd17891f Mon Sep 17 00:00:00 2001 From: yifeizhuang Date: Tue, 19 Mar 2024 16:08:21 -0700 Subject: [PATCH 15/30] rephrase --- A72-open-telemetry-tracing.md | 34 ++++++++++++++++------------------ 1 file changed, 16 insertions(+), 18 deletions(-) diff --git a/A72-open-telemetry-tracing.md b/A72-open-telemetry-tracing.md index e00202f73..a6d7a0daf 100644 --- a/A72-open-telemetry-tracing.md +++ b/A72-open-telemetry-tracing.md @@ -47,7 +47,7 @@ tracing instrumentation. ## Proposal ### gRPC OpenTelemetry Tracing API We will add tracing functions in grpc-open-telemetry plugin, along with OpenTelemetry -metrics [gRPC A66][A66]. Languages will keep using gRPC infrastructures, e.g. interceptor and +metrics [gRFC A66][A66]. Languages will keep using gRPC infrastructures, e.g. interceptor and stream tracer to implement the feature, the same as Census. The OpenTelemetry API will coexist with the OpenCensus API. Only one @@ -60,10 +60,10 @@ underlying infrastructures. In Java, it will be part of global interceptors, so that the interceptors are managed in a more sustainable way and user-friendly. As a prerequisite, the stream tracer factory API will be stabilized. OpenTelemetryModule will be created with -an OpenTelemetryAPI instance passing in for necessary configurations. +an OpenTelemetry API instance passing in for necessary configurations. Users can also rely on SDK autoconfig extension that configures the sdk object -through environmental variables or Java system properties, then obtain the sdk -object passed in to gRPC. +through environmental variables or Java system properties, then pass the +obtained sdk object to gRPC. ```Java // Construct OpenTelemetry to be passed to gRPC OpenTelemetry module for @@ -125,7 +125,7 @@ should be captured: At the client, on parent span: * When the call is started, annotate name resolution completed if the RPC had name resolution delay. 
-* When the call is closed, end the parent span with RPC status. +* When the call is closed, set RPC status and end the parent span. On attempt span: * When span is created, add attribute `previous-rpc-attempts` that captures the @@ -142,7 +142,7 @@ On attempt span: events to capture seq no., type(Received), wire message size, and uncompressed message size if any decompression. The seq no. is a sequence of integer numbers starting from 0 to identify received messages within the stream. -* When the stream is closed, end the attempt span with RPC status. +* When the stream is closed, set RPC status and end the attempt span. At the server: * When an outbound message has been sent, add message events to capture seq no., @@ -150,7 +150,7 @@ At the server: * When an inbound message has been read from the transport, add message events to capture seq no., type(Received), wire message size, and uncompressed message size, if any decompression. -* When the stream is closed, end the span with RPC status. +* When the stream is closed, set the RPC status and end the span. Note that C++ is missing the seq no. information. And Java has issue of reporting decompressed message size upon receiving messages, @@ -308,12 +308,13 @@ both binary headers and TextMap propagators. #### Context Propagation APIs in C++ C++ will also support propagator APIs to provides API uniformity among -languages. Since gRPC C++ avoids RTTI, it can not use the same optimization path -as Java/Go. This will result in an extra base64 encoding/decoding step to -satisfy `TextMapPropagator` requirement that the key/value pair be a valid HTTP -field. There are possible optimizations C++ might pursue in the future, for -example, providing an explicit knob on `GrpcTraceBinTextMapPropagator` that -assumes that this propagator is being used with gRPC and can hence skirt `TextMapPropagator` compatibility requirements. +languages. 
Since gRPC C++ avoids Run-time type information (RTTI), it can not +use the same optimization path as Java/Go. This will result in an extra base64 +encoding/decoding step to satisfy `TextMapPropagator` requirement that the +key/value pair be a valid HTTP field. There are possible optimizations C++ might +pursue in the future, for example, providing an explicit knob on +`GrpcTraceBinTextMapPropagator` that assumes that this propagator is being used +with gRPC and can hence skirt `TextMapPropagator` compatibility requirements. ```C++ std::unique_ptr @@ -338,8 +339,7 @@ class GrpcTraceBinTextMapPropagator } carrier.Set( "grpc-trace-bin", - // gRPC C++ does not have runtime type inspection, so we - // encode bytes to String to comply with the TextMapSetter API. + // gRPC C++ does not have RTTI, so we encode bytes to String to comply with the TextMapSetter API. absl::Base64Escape( absl::string_view(SpanContextToGrpcTraceBinHeader(span_context)) .data()), @@ -474,7 +474,7 @@ The OpenCensus [shim](https://github.com/open-telemetry/opentelemetry-java/tree/ (currently available in Java, Go, Python) allows binaries that have a mix of OpenTelemetry and OpenCensus dependencies to export trace spans from both frameworks, and keep the correct parent-child relationship. This is the -recommended approach to migrate to OpenTelemetry in one binary gradually. +recommended approach to migrate to OpenTelemetry incrementally within a single binary. Note that the in-binary migration and cross-process migration can be done in parallel. The shim package that bridges two libraries works as follows, considering the @@ -529,8 +529,6 @@ annotation attributes keys are mapped to event attributes keys: | `type` | `message.event.type` | | `id` | `message.message.id` | -And OpenTelemetry no longer has span end options as OpenCensus does. - ## Rationale C++ will not have the optimization path in its `GrpcTraceBinPropagator` API. 
We From 50565d6a96e47d520dd4086cc875aaba40d33bf6 Mon Sep 17 00:00:00 2001 From: yifeizhuang Date: Wed, 20 Mar 2024 14:34:23 -0700 Subject: [PATCH 16/30] add go API. Minor fix comment and reference --- A72-open-telemetry-tracing.md | 47 ++++++++++++++++++++++++++++++----- 1 file changed, 41 insertions(+), 6 deletions(-) diff --git a/A72-open-telemetry-tracing.md b/A72-open-telemetry-tracing.md index a6d7a0daf..4b5fce68c 100644 --- a/A72-open-telemetry-tracing.md +++ b/A72-open-telemetry-tracing.md @@ -41,7 +41,6 @@ tracing instrumentation. ### Related Proposals and Documents: * [gRFC L29: C++ API Changes for OpenCensus Integration][L29] * [gRFC A45: Exposing OpenCensus Metrics and Tracing for gRPC retry][A45] -* [Microservices observability overview][grpc-observability-public-doc] * [gRFC A66: OpenTelemetry Metrics][A66] ## Proposal @@ -113,9 +112,47 @@ class OpenTelemetryPluginBuilder { ``` #### Go -In Go, the OpenTelemetry stream tracers and interceptors will be provided for users to install. -TODO: add Go API. +```go +import ( + "go.opentelemetry.io/otel/trace" +) + +// TraceOptions are the trace options for OpenTelemetry instrumentation. +type TraceOptions struct { + TraceProvider trace.TracerProvider +} + +// DialOption returns a dial option which enables OpenTelemetry instrumentation +// code for a grpc.ClientConn. +// +// Client applications interested in instrumenting their grpc.ClientConn should +// pass the dial option returned from this function as a dial option to +// grpc.Dial(). +// +// Using this option will always lead to instrumentation, however in order to +// use the data a SpanExporter must be registered with the TraceProvider in the +// TraceOptions. Client side has retries, so a Unary and Streaming Interceptor are +// registered to handle per RPC traces, and a Stats Handler is registered to handle +// per RPC attempt trace. These three components registered work together in +// conjunction, and do not work standalone.
It is not supported to use this +// alongside another stats handler dial option. +func DialOption(to TraceOptions) grpc.DialOption {} + +// ServerOption returns a server option which enables OpenTelemetry +// instrumentation code for a grpc.Server. +// +// Server applications interested in instrumenting their grpc.Server should pass +// the server option returned from this function as an argument to +// grpc.NewServer(). +// +// Using this option will always lead to instrumentation, however in order to +// use the data a SpanExporter must be registered with the TraceProvider option. +// Server side does not have retries, so a registered Stats Handler is the only +// option that is returned. It is not supported to use this alongside another +// stats handler server option. +func ServerOption(to TraceOptions) grpc.ServerOption {} +``` ### Tracing Information With the new OpenTelemetry plugin we will produce the same tracing information as we @@ -253,8 +290,7 @@ class GrpcCommonSetter implements TextMapSetter, GrpcBinarySetter Date: Wed, 20 Mar 2024 18:07:02 -0700 Subject: [PATCH 17/30] re-structure --- A72-open-telemetry-tracing.md | 329 ++++++++++++++++++++-------------- 1 file changed, 190 insertions(+), 139 deletions(-) diff --git a/A72-open-telemetry-tracing.md b/A72-open-telemetry-tracing.md index 4b5fce68c..e66551011 100644 --- a/A72-open-telemetry-tracing.md +++ b/A72-open-telemetry-tracing.md @@ -9,7 +9,7 @@ A72: OpenTelemetry Tracing * Last updated: 2024-01 * Discussion at: https://groups.google.com/g/grpc-io/c/e_ByaRmtJak -## Abstract +# Abstract This proposal adds support for OpenTelemetry tracing and suggests migration paths away from OpenCensus tracing. Discussions include: * The API surface to enable and configure OpenTelemetry tracing. @@ -21,7 +21,7 @@ paths away from OpenCensus tracing. Discussions include: Note that stats and logging are out of scope. 
-## Background +# Background This work aligns with the community consensus to switch to OpenTelemetry as the next generation OpenCensus. The latter is no longer maintained after July 31, 2023. @@ -43,19 +43,32 @@ tracing instrumentation. * [gRFC A45: Exposing OpenCensus Metrics and Tracing for gRPC retry][A45] * [gRFC A66: OpenTelemetry Metrics][A66] -## Proposal -### gRPC OpenTelemetry Tracing API +# Proposal +## gRPC OpenTelemetry Tracing API We will add tracing functions in grpc-open-telemetry plugin, along with OpenTelemetry -metrics [gRFC A66][A66]. Languages will keep using gRPC infrastructures, e.g. interceptor and -stream tracer to implement the feature, the same as Census. +metrics [gRFC A66][A66]. Internally, the tracing functionality will be implemented +using existing gRPC infrastructure such as interceptors and stream tracers. -The OpenTelemetry API will coexist with the OpenCensus API. Only one -`grpc-trace-bin` header will be sent for a single RPC as long as only one of -OpenTelemetry or OpenCensus is enabled for the channel. The APIs to enable and -configure OpenTelemetry tracing are different among languages due to different -underlying infrastructures. +#### Propagator Wire Format +gRPC OpenTelemetry will use the existing OpenTelemetry propagators API for context propagation +by encoding them in metadata, for the following benefits: +1. Full integration with OpenTelemetry APIs that is easier for users to reason about. +2. Make it possible to plugin other propagators that the community supports. +3. Flexible API that allows clean and simple migration paths to a different propagator. + +In order for the propagator to perform injecting and extracting spanContext value +from the carrier, which is the Metadata in gRPC, languages will +implement Getter and Setter corresponding to the propagator. +Currently, OpenTelemetry propagator API only supports `TextMapPropagator`, +that is to send string key/value pairs between the client and server. 
+Therefore, to implement Getter and Setter is to implement the TextMap carrier interface: +`TextMapCarrier` (For C++/Go), or `TextMapGetter`/`TextMapSetter` (For Java), see +pseudocode in later sections. -#### Java + The APIs to enable and configure OpenTelemetry tracing are different among + languages due to different underlying infrastructures. + +### Java In Java, it will be part of global interceptors, so that the interceptors are managed in a more sustainable way and user-friendly. As a prerequisite, the stream tracer factory API will be stabilized. OpenTelemetryModule will be created with @@ -64,37 +77,53 @@ Users can also rely on SDK autoconfig extension that configures the sdk object through environmental variables or Java system properties, then pass the obtained sdk object to gRPC. +The following methods will be added to OpenTelemetryModule. + ```Java -// Construct OpenTelemetry to be passed to gRPC OpenTelemetry module for -// trace and metrics configurations. -SdkTracerProvider sdkTracerProvider = SdkTracerProvider - .builder() - .addSpanProcessor( - BatchSpanProcessor.builder(exporter).build()) - .build(); - -OpenTelemetry openTelemetry = OpenTelemetrySdk.builder() - .setTracerProvider(sdkTracerProvider) - .setMeterProvider(...) - .setPropagators( - ContextPropagators.create(GrpcTraceBinPropagator.getInstance())) - .build(); - -// Alternatively, use auto configuration: -// OpenTelemetry openTelemetry = -// AutoConfiguredOpenTelemetrySdk.getOpenTelemetrySdk(). - -// Create a module that hosts tracing infrastructures. Should document that -// the module implementation may change as OpenTelemetry evolves. -OpenTelemetryModule otModule = OpenTelemetryModule.newBuilder().sdk(openTelemetry).build(); - -GlobalInterceptors.setInterceptors( - Arrays.asList(otModule.getClientTracingInterceptor()), - Arrays.asList(otModule.getServerTracerFactory())); +// A module that hosts OpenTelemetry tracing/metrics infrastructures. 
Module +// implementation may change as OpenTelemetry evolves. +public class OpenTelemetryModule { + /** + * OpenTelemetry instance is used to configure metrics and traces settings. + * Build a module by passing in an OpenTelemetry instance. Then get interceptor + * and ServerStreamTracerFactory from the module and install to the GlobalInterceptors. + * + * Sample: + * Construct a TraceProvider that will be used to provide traces during instrumentation. + * + * SdkTracerProvider sdkTracerProvider = SdkTracerProvider.builder() + * .addSpanProcessor( + * BatchSpanProcessor.builder(exporter).build()) + * .build(); + * + * Construct OpenTelemetry to be passed to gRPC OpenTelemetry module for + * traces and metrics configurations. + * + * OpenTelemetry openTelemetry = OpenTelemetrySdk.builder() + * .setTracerProvider(sdkTracerProvider) + * .setMeterProvider(...) + * .setPropagators(ContextPropagators.create(W3CTraceContextPropagator.getInstance())) + * .build(); + * OpenTelemetryModule otModule = OpenTelemetryModule.newBuilder().sdk(openTelemetry).build(); + * + * + * Add interceptors and StreamTracerFactory obtained from the module to GlobalInterceptors. + * + * GlobalInterceptors.setInterceptors( + * Arrays.asList(otModule.getClientTracingInterceptor()), + * Arrays.asList(otModule.getServerTracerFactory())); + */ + + // Return the client interceptor to be installed by users to generate trace information. + public ClientInterceptor getClientTracingInterceptor(); + + // Return the ServerStreamTracerFactory to be installed by users to generate trace information. + public ServerStreamTracerFactory getServerTracerFactory(); +} ``` -#### C++ +### C++ The following new methods will be added in `OpenTelemetryPluginBuilder`. ```C++ @@ -104,15 +133,15 @@ class OpenTelemetryPluginBuilder { OpenTelemetryPluginBuilder& SetTracerProvider( std::shared_ptr tracer_provider); // Set one or multiple text map propagators for span context propagation, e.g. 
- // GrpcTraceBinTextMapPropagator or community standard ones like W3C, etc. + // the community standard ones like W3C, etc. OpenTelemetryPluginBuilder& SetTextMapPropagator( std::unique_ptr text_map_propagator); }; ``` -#### Go - +### Go +The following methods will be added to the opentelemetry package. ```go import ( "go.opentelemetry.io/otel/trace" @@ -154,82 +183,136 @@ func DialOption(to TraceOptions) grpc.DialOption {} func ServerOption(to TraceOptions) grpc.ServerOption {} ``` -### Tracing Information -With the new OpenTelemetry plugin we will produce the same tracing information as we -produce for Census. The following tracing information during an RPC lifecycle -should be captured: +## Tracing Information +RPCs on the client side may undergo retry attempts, whereas on the server side, +they do not. gRPC records both per-call tracing details (at the parent span) +and per-attempt tracing details (on the attempt span) on the client side. +On the server side, there is only per-call traces. With the new OpenTelemetry +plugin we will produce the following tracing information during an RPC lifecycle: At the client, on parent span: -* When the call is started, annotate name resolution completed if the RPC had - name resolution delay. +* If the RPC experienced name resolution delay, add an Event at the start of the + call with the name "Delayed name resolution complete" upon completion of the + name resolution process. * When the call is closed, set RPC status and end the parent span. On attempt span: -* When span is created, add attribute `previous-rpc-attempts` that captures the - number of preceding attempts for the RPC, and attribute `transparent-retry` that - shows whether stream is a transparent retry. -* When the stream is created on transport, annotate delayed load balancer pick - complete, if any. 
-* When an outbound message has been sent, add message events to capture seq no., - type(SENT), uncompressed message size, and compressed message size if any - compression. The seq no. is a sequence of integer numbers starting from 0 - to identify sent messages within the stream. The size is the total attempt message - bytes without encryption, not including grpc or transport framing bytes. -* When an inbound message has been received from the transport, add message - events to capture seq no., type(Received), wire message size, and uncompressed - message size if any decompression. The seq no. is a sequence of integer numbers - starting from 0 to identify received messages within the stream. +* When span is created, set the attribute with key `previous-rpc-attempts` and an + integer value representing the count of previous attempts made for the RPC. +* When span is created, set the attribute with key `transparent-retry` and a + boolean value indicating whether the stream is undergoing a transparent retry. +* If the RPC experienced load balancer pick delay, add an Event with the name + "Delayed LB pick complete" upon creation of the stream on the transport. +* When the application sends an outbound message to the transport, add an Event + with the following attributes: + * key `message.event.type` with String value "SENT" + * key `message.message.id` with integer value seq no. The seq no. is a sequence + of integer numbers starting from 0 to identify sent messages within the stream + * key `message.event.size.uncompressed` with integer value of uncompressed + message size. The size is the total attempt message bytes without encryption, + not including grpc or transport framing bytes. + * If any compression, key `message.event.size.compressed` with integer value + of compressed message size. 
+* When an inbound message has been received from the transport, add an Event + with the following attributes: + * key `message.event.type` with String value "RECEIVED" + * key `message.message.id` with integer value seq no. + * key `message.event.size.uncompressed` with integer value of wire message. + * If any compression, key `message.event.size.compressed` with integer value + of compressed message size. * When the stream is closed, set RPC status and end the attempt span. At the server: -* When an outbound message has been sent, add message events to capture seq no., - type(SENT) and uncompressed message size, and compressed message size if any compression. -* When an inbound message has been read from the transport, add message events - to capture seq no., type(Received), wire message size, and uncompressed message size, - if any decompression. +* When the application sends an outbound message to the transport, add an Event + with the following attributes: + * key `message.event.type` with String value "SENT" + * key `message.message.id` with integer value seq no. The seq no. is a sequence + of integer numbers starting from 0 to identify sent messages within the stream + * key `message.event.size.uncompressed` with integer value of uncompressed + message size. The size is the total attempt message bytes without encryption, + not including grpc or transport framing bytes. + * If any compression, key `message.event.size.compressed` with integer value + of compressed message size. +* When an inbound message has been read from the transport, add an Event + with the following attributes: + * key `message.event.type` with String value "RECEIVED" + * key `message.message.id` with integer value seq no. + * key `message.event.size.compressed` with integer value of wire message. + * If any decompression, key `message.event.size.uncompressed` with integer value + of compressed message size. * When the stream is closed, set the RPC status and end the span. 
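The message-event rules listed above can be sketched in Go as follows. This is a minimal, dependency-free sketch under stated assumptions: `messageEvent` and `streamEventRecorder` are hypothetical stand-ins (a real plugin would call `AddEvent` on an OpenTelemetry span), the event names are borrowed from the OpenCensus-to-OpenTelemetry mapping section of this proposal, and a negative compressed size is this sketch's own convention for "no compression". It illustrates that sent and received messages keep independent sequence numbers that both start at 0 within one stream.

```go
package main

import "fmt"

// messageEvent is a hypothetical stand-in for an OpenTelemetry span event:
// a name plus attributes. A real plugin would call span.AddEvent with
// attribute options instead of building this struct.
type messageEvent struct {
	Name       string
	Attributes map[string]any
}

// streamEventRecorder keeps independent sequence numbers for sent and
// received messages; both start at 0 within a single stream (attempt).
type streamEventRecorder struct {
	sentSeq int64
	recvSeq int64
	Events  []messageEvent
}

// messageAttrs builds the attribute set for one message event. A negative
// compressedSize means "no compression" in this sketch, in which case the
// compressed-size key is omitted.
func messageAttrs(eventType string, seq, uncompressedSize, compressedSize int64) map[string]any {
	attrs := map[string]any{
		"message.event.type":              eventType,
		"message.message.id":              seq,
		"message.event.size.uncompressed": uncompressedSize,
	}
	if compressedSize >= 0 {
		attrs["message.event.size.compressed"] = compressedSize
	}
	return attrs
}

// OnSent records an outbound message and advances the sent sequence number.
func (r *streamEventRecorder) OnSent(uncompressedSize, compressedSize int64) {
	r.Events = append(r.Events, messageEvent{
		Name:       "Outbound message sent",
		Attributes: messageAttrs("SENT", r.sentSeq, uncompressedSize, compressedSize),
	})
	r.sentSeq++
}

// OnReceived records an inbound message and advances the received sequence number.
func (r *streamEventRecorder) OnReceived(uncompressedSize, compressedSize int64) {
	r.Events = append(r.Events, messageEvent{
		Name:       "Inbound message read",
		Attributes: messageAttrs("RECEIVED", r.recvSeq, uncompressedSize, compressedSize),
	})
	r.recvSeq++
}

func main() {
	var r streamEventRecorder
	r.OnSent(100, -1)    // first outbound message, not compressed
	r.OnSent(2048, 512)  // second outbound message, compressed on the wire
	r.OnReceived(64, -1) // first inbound message, not compressed
	for _, e := range r.Events {
		fmt.Println(e.Name, e.Attributes)
	}
}
```

Keeping two counters rather than one shared counter is what lets the first received message carry `message.message.id` 0 even after several messages have been sent.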
-Note that C++ is missing the seq no. information. And Java has issue of reporting -decompressed message size upon receiving messages, +### Limitations +Note that C++ is missing the seq no. information due to lack of transport support. +While it's not critical, we can include these information if users request it in the future. + +Java has an issue of reporting decompressed message size upon receiving messages, as a workaround, on the client parent span and server span: -* When the uncompressed size of some outbound data is revealed, annotate seq no., - type(Received) and uncompressed message size. +* When the uncompressed size of some outbound data is revealed, add an Event + with the following attributes: + * key `message.event.type` with String value "RECEIVED" + * key `message.message.id` with integer value seq no. + * key `message.event.size.uncompressed` with integer value + of uncompressed message size -### Propagator Wire Format -While gRPC OpenCensus directly interacts with the metadata API, gRPC OpenTelemetry -will use the standardized propagators API for context propagation, for the -following benefits: -1. Full integration with OpenTelemetry APIs that is easier for users to reason about. -2. Make it possible to plugin other propagators that the community supports. -3. Flexible API that allows clean and simple migration paths to a different propagator. +## Migrate from OpenCensus to OpenTelemetry -This will allow gRPC to keep using `grpc-trace-bin` header for context -propagation and also support other propagators. gRPC will expose a custom -`GrpcTraceBinPropagator` that implements `TextMapPropagator`. However, there is a caveat: +### gRPC OpenCensus API +The existing gRPC OpenCensus tracing APIs in grpc-census plugin are different between +languages, e.g. in grpc-java it is zero-configuration: as long as the grpc-census +dependency exists in the classpath, the traces are automatically generated. 
+In C++, there is an API for users to call to enable tracing. In Go, it is exposed +via stream tracers. The OpenTelemetry API will coexist with the OpenCensus API. +We keep the grpc-census plugin to allow users who already depend on grpc-census to +continue using it for newer grpc versions. -Currently, OpenTelemetry propagator API only supports `TextMapPropagator`, -that is to send string key/value pairs between the client and server, which is -different from the binary header that gRPC currently uses. The future roadmap -to support binary propagators at OpenTelemetry is unclear. So, gRPC will use -propagator API in TextMap format with an optimization path (Go and Java) to work +gRPC users depending on the grpc-census plugin have non-trivial migration paths +to OpenTelemetry. Consider the following use cases: +1. Compatibility between a gRPC client and server as two distributed components, +where during migration one will use OpenCensus and the other will use OpenTelemetry. +2. Migrate an application binary where both OpenCensus and OpenTelemetry may exist + in the dependency tree. This can be the application’s own tracing code, or gRPC + OpenCensus, or other dependencies that involve OpenCensus and/or OpenTelemetry. + +Here are the suggested solutions for both use cases. + +### Migrate to OpenTelemetry: Cross-process Networking Concerns +When users first introduce gRPC OpenTelemetry, there is a time window when the +gRPC client and server have mixed plugins of OpenTelemetry and OpenCensus. +To tackle this, gRPC will expose a custom `GrpcTraceBinPropagator` that implements +`TextMapPropagator`. This will allow gRPC to keep using `grpc-trace-bin` header +for context propagation and also support other propagators. +When using `grpc-trace-bin` the OpenCensus spanContext and OpenTelemetry spanContext +are identical, therefore a gRPC OpenCensus client can speak with a gRPC OpenTelemetry +server and vice versa.
It is encouraged to use `GrpcTraceBinPropagator` for the migration. +Using the same header greatly simplifies rollout. However, there is a caveat about +`GrpcTraceBinPropagator`: + +Currently, OpenTelemetry propagator API only supports `TextMapPropagator`, +that is to send string key/value pairs between the client and server, which is +different from the binary header that gRPC currently uses. The future roadmap +to support binary propagators at OpenTelemetry is unclear. So, gRPC will use +propagator API in TextMap format with an optimization path (Go and Java) to work around the lack of binary propagator API to support `grpc-trace-bin`. In fact, -TextMap propagator does not show visible performance impact for C++, which is -the most sensitive language to performance, based on internal micro benchmarking. +TextMap propagator does not show visible performance impact for C++, which is +the most sensitive language to performance, based on internal micro benchmarking. Therefore, gRPC will only support propagating `grpc-trace-bin` in TextMap propagator. -A `grpc-trace-bin` formatter implementation for OpenTelemetry is -needed in each language, which can be similar to the OpenCensus implementation. +Only one `grpc-trace-bin` header will be sent for a single RPC as long as only one of +OpenTelemetry or OpenCensus is enabled for the channel. +A `grpc-trace-bin` formatter implementation for OpenTelemetry is +needed in each language, which can be similar to the OpenCensus implementation. Go already has community support for that. -Users can provide a single composite propagator that combines one or multiple `TextMapPropagator` -for their client and server separately. This way, users can define their own +Users can provide a single composite propagator that combines one or multiple `TextMapPropagator` +for their client and server separately. 
This way, users can define their own migration path for context propagators in distributed components, see detailed -discussion in the later session. Configuring gRPC OpenTelemetry with this +discussion later. Configuring gRPC OpenTelemetry with this propagator when dealing with cross-process concerns during migration is straightforward and recommended. In the long term, community -standardized propagators, e.g. W3C is more encouraged than `GrpcTraceBinPropagator`. +standardized propagators, e.g. W3C is more encouraged than `GrpcTraceBinPropagator`. -#### Propagator API in Java/Go -The pseudocode below demonstrates `GrpcTraceBinPropagator` and the corresponding +#### GrpcTraceBinPropagator and TextMapGetter/Setter in Java/Go +The pseudocode below demonstrates `GrpcTraceBinPropagator` and the corresponding gRPC Getter/Setter with an optimization path. ```Java @@ -267,7 +350,7 @@ public class GrpcTraceBinPropagator implements TextMapPropagator { ``` -The `GrpcTraceBinPropagator` should be compatible with any Getter/Setter, but +The `GrpcTraceBinPropagator` should be compatible with any Getter/Setter, but internally in gRPC, in Java and Go we implement a special gRPC Getter/Setter that uses an optimization path to work around the lack of binary propagator API and thus avoid base64 (de)encoding when passing data between API interfaces. @@ -334,22 +417,22 @@ private interface GrpcBinarySetter { ``` -The `GrpcCommonSetter` adds an overloaded `set()` method to directly take `byte[]` -(Java and Go) to avoid extra base64 encoding. For the normal `set()` method it -should handle both binary (`-bin`) header and ASCII header from any TextMap +The `GrpcCommonSetter` adds an overloaded `set()` method to directly take `byte[]` +(Java and Go) to avoid extra base64 encoding. For the normal `set()` method it +should handle both binary (`-bin`) header and ASCII header from any TextMap propagators that users may configure. 
-The `GrpcCommonGetter` in Java and Go adds a new method `getBinary()` for the +The `GrpcCommonGetter` in Java and Go adds a new method `getBinary()` for the optimized path for the same reason. Similarly, the normal `get()` method handles both binary headers and TextMap propagators. -#### Context Propagation APIs in C++ +#### GrpcTraceBinPropagator and TextMapCarrier in C++ C++ will also support propagator APIs to provides API uniformity among -languages. Since gRPC C++ avoids Run-time type information (RTTI), it can not -use the same optimization path as Java/Go. This will result in an extra base64 +languages. Since gRPC C++ avoids Run-time type information (RTTI), it can not +use the same optimization path as Java/Go. This will result in an extra base64 encoding/decoding step to satisfy `TextMapPropagator` requirement that the key/value pair be a valid HTTP field. There are possible optimizations C++ might -pursue in the future, for example, providing an explicit knob on -`GrpcTraceBinTextMapPropagator` that assumes that this propagator is being used +pursue in the future, for example, providing an explicit knob on +`GrpcTraceBinTextMapPropagator` that assumes that this propagator is being used with gRPC and can hence skirt `TextMapPropagator` compatibility requirements. ```C++ @@ -462,37 +545,6 @@ MakeGrpcTraceBinTextMapPropagator() { ``` - -## Migrate from OpenCensus to OpenTelemetry - -### gRPC Census API -The existing gRPC OpenCensus tracing APIs in grpc-census plugin are different between -languages, e.g. in grpc-java it is zero-configuration: as long as the grpc-census -dependency exists in the classpath, the traces are automatically generated. -In C++, there is an API for users to call to enable tracing. In Go, it is exposed via stream tracers. -We keep the grpc-census plugin to allow users who already depend on grpc-census to -continue using it for newer grpc versions. 
- -gRPC users depending on the grpc-census plugin have non-trivial migration paths -to OpenTelemetry. Consider the following use cases: -1. Compatibility between a gRPC client and server as two distributed components, -where during migration one will use OpenCensus and the other will use OpenTelemetry. -2. Migrate an application binary where both OpenCensus and OpenTelemetry maybe exist - in the dependency tree. This can be the application’s own tracing code, or gRPC - OpenCensus, or other dependencies that involve OpenCensus and/or OpenTelemetry. - -Here are the suggested solutions for both use cases. - -### Migrate to OpenTelemetry: Cross-process Networking Concerns -When users first introduce gRPC OpenTelemetry, for the time window when the -gRPC client and server have mixed plugins of OpenTelemetry and OpenCensus, -it is encouraged to use `GrpcTraceBinPropagator`, as described in the previous section, -that propagates `grpc-trace-bin` header for the migration. -When using `grpc-trace-bin` the OpenCensus spanContext and -OpenTelemetry spanContext are identical, therefore a gRPC OpenCensus client can -speak with a gRPC OpenTelemetry server and vice versa. Using the same header -greatly simplifies rollout. - After migration period, users have the flexibility to switch to other propagators. OpenTelemetry and its extension packages support multiple text map propagators. The gRPC OpenTelemetry API allows specifying @@ -542,7 +594,7 @@ Finally, they switch to grpc-open-telemetry and finish the migration. |-- gRPC -> Using Otel to generate Trace A ----------------------------- | |-- Application -> Using Otel to generate a sub Trace B------------------ | ``` -### Tracing Information +### OpenCensus vs OpenTelemetry Tracing Information Mapping gRPC is generating similar tracing information for OpenTelemetry compared with OpenCensus, but due to API differences between those two libraries, the trace information is represented slightly differently. 
@@ -565,7 +617,6 @@ annotation attributes keys are mapped to event attributes keys: | `type` | `message.event.type` | | `id` | `message.message.id` | - ## Rationale C++ will not have the optimization path in its `GrpcTraceBinPropagator` API. We considered to have an API that enables adding `grpc-trace-bin` to the metadata From 2d89763a222cc66c86f19d582f385e1810f13016 Mon Sep 17 00:00:00 2001 From: yifeizhuang Date: Thu, 21 Mar 2024 11:22:59 -0700 Subject: [PATCH 18/30] improve languages, fix tracing info uncompressed message --- A72-open-telemetry-tracing.md | 109 +++++++++++++++++----------------- 1 file changed, 54 insertions(+), 55 deletions(-) diff --git a/A72-open-telemetry-tracing.md b/A72-open-telemetry-tracing.md index e66551011..76bbed398 100644 --- a/A72-open-telemetry-tracing.md +++ b/A72-open-telemetry-tracing.md @@ -58,12 +58,12 @@ by encoding them in metadata, for the following benefits: In order for the propagator to perform injecting and extracting spanContext value from the carrier, which is the Metadata in gRPC, languages will -implement Getter and Setter corresponding to the propagator. +implement Getter and Setter corresponding to the propagator type. Currently, OpenTelemetry propagator API only supports `TextMapPropagator`, that is to send string key/value pairs between the client and server. -Therefore, to implement Getter and Setter is to implement the TextMap carrier interface: -`TextMapCarrier` (For C++/Go), or `TextMapGetter`/`TextMapSetter` (For Java), see -pseudocode in later sections. +Therefore, adding Getter and Setter is to implement the TextMap carrier interface: +`TextMapCarrier` (For C++/Go), or `TextMapGetter`/`TextMapSetter` (For Java). (see +pseudocode in section [Migrate to OpenTelemetry](#migrate-to-opentelemetry--cross-process-networking-concerns)). The APIs to enable and configure OpenTelemetry tracing are different among languages due to different underlying infrastructures. 
@@ -194,7 +194,9 @@ At the client, on parent span: * If the RPC experienced name resolution delay, add an Event at the start of the call with the name "Delayed name resolution complete" upon completion of the name resolution process. -* When the call is closed, set RPC status and end the parent span. +* When the call is closed, set RPC status and end the parent span. gRPC status "OK" + is recorded with status "OK", while other gRPC statuses are marked as "ERROR". + Non-"OK" statuses include their code as a description, the same below. On attempt span: * When span is created, set the attribute with key `previous-rpc-attempts` and an @@ -204,42 +206,40 @@ On attempt span: * If the RPC experienced load balancer pick delay, add an Event with the name "Delayed LB pick complete" upon creation of the stream on the transport. * When the application sends an outbound message to the transport, add an Event - with the following attributes: - * key `message.event.type` with String value "SENT" - * key `message.message.id` with integer value seq no. The seq no. is a sequence - of integer numbers starting from 0 to identify sent messages within the stream + with name "Outbound message sent" and the following attributes: + * key `message.event.type` with string value "SENT". + * key `message.message.id` with integer value of the seq no. The seq no. is a sequence + of integer numbers starting from 0 to identify sent messages within the stream, the same below. * key `message.event.size.uncompressed` with integer value of uncompressed message size. The size is the total attempt message bytes without encryption, - not including grpc or transport framing bytes. + not including grpc or transport framing bytes, the same below. * If any compression, key `message.event.size.compressed` with integer value of compressed message size. 
* When an inbound message has been received from the transport, add an Event - with the following attributes: - * key `message.event.type` with String value "RECEIVED" - * key `message.message.id` with integer value seq no. - * key `message.event.size.uncompressed` with integer value of wire message. - * If any compression, key `message.event.size.compressed` with integer value - of compressed message size. + with name "Inbound message read" and the following attributes: + * key `message.event.type` with String value "RECEIVED". + * key `message.message.id` with integer value of the seq no. + * key `message.event.size.compressed` with integer value of wire message size. + * If any compression, key `message.event.size.uncompressed` with integer value + of uncompressed message size. * When the stream is closed, set RPC status and end the attempt span. At the server: * When the application sends an outbound message to the transport, add an Event - with the following attributes: - * key `message.event.type` with String value "SENT" - * key `message.message.id` with integer value seq no. The seq no. is a sequence - of integer numbers starting from 0 to identify sent messages within the stream + with name "Outbound message sent" and the following attributes: + * key `message.event.type` with string value "SENT". + * key `message.message.id` with integer value of the seq no. * key `message.event.size.uncompressed` with integer value of uncompressed - message size. The size is the total attempt message bytes without encryption, - not including grpc or transport framing bytes. + message size. * If any compression, key `message.event.size.compressed` with integer value of compressed message size. * When an inbound message has been read from the transport, add an Event - with the following attributes: - * key `message.event.type` with String value "RECEIVED" - * key `message.message.id` with integer value seq no. 
- * key `message.event.size.compressed` with integer value of wire message. - * If any decompression, key `message.event.size.uncompressed` with integer value - of compressed message size. + with name "Inbound message read" and the following attributes: + * key `message.event.type` with string value "RECEIVED". + * key `message.message.id` with integer value of the seq no. + * key `message.event.size.compressed` with integer value of wire message size. + * If any compression, key `message.event.size.uncompressed` with integer value + of uncompressed message size. * When the stream is closed, set the RPC status and end the span. ### Limitations @@ -248,12 +248,12 @@ While it's not critical, we can include these information if users request it in Java has an issue of reporting decompressed message size upon receiving messages, as a workaround, on the client parent span and server span: -* When the uncompressed size of some outbound data is revealed, add an Event - with the following attributes: - * key `message.event.type` with String value "RECEIVED" - * key `message.message.id` with integer value seq no. +* When the uncompressed size of some inbound data is revealed, add an Event + with name "Inbound message read" and the following attributes: + * key `message.event.type` with string value "RECEIVED". + * key `message.message.id` with integer value of the seq no. * key `message.event.size.uncompressed` with integer value - of uncompressed message size + of uncompressed message size. ## Migrate from OpenCensus to OpenTelemetry @@ -278,10 +278,11 @@ Here are the suggested solutions for both use cases. ### Migrate to OpenTelemetry: Cross-process Networking Concerns When users first introduce gRPC OpenTelemetry, for the time window when the -gRPC client and server have mixed plugins of OpenTelemetry and OpenCensus. -To tackle this, gRPC will expose a custom `GrpcTraceBinPropagator` that implements -`TextMapPropagator`. 
This will allow gRPC to keep using `grpc-trace-bin` header -for context propagation and also support other propagators. +gRPC client and server have mixed plugins of OpenTelemetry and OpenCensus, +spanContext can not directly propagate due to different header name and wire format. +To tackle this, gRPC will expose a custom `GrpcTraceBinPropagator` +that implements `TextMapPropagator`. This will allow gRPC to keep using `grpc-trace-bin` +header for context propagation and also support other propagators. When using `grpc-trace-bin` the OpenCensus spanContext and OpenTelemetry spanContext are identical, therefore a gRPC OpenCensus client can speak with a gRPC OpenTelemetry server and vice versa. It is encouraged to use `GrpcTraceBinPropagator` for the migration. @@ -303,13 +304,7 @@ A `grpc-trace-bin` formatter implementation for OpenTelemetry is needed in each language, which can be similar to the OpenCensus implementation. Go already has community support for that. -Users can provide a single composite propagator that combines one or multiple `TextMapPropagator` -for their client and server separately. This way, users can define their own -migration path for context propagators in distributed components, see detailed -discussion later. Configuring gRPC OpenTelemetry with this -propagator when dealing with cross-process concerns during migration is -straightforward and recommended. In the long term, community -standardized propagators, e.g. W3C is more encouraged than `GrpcTraceBinPropagator`. + #### GrpcTraceBinPropagator and TextMapGetter/Setter in Java/Go The pseudocode below demonstrates `GrpcTraceBinPropagator` and the corresponding @@ -545,13 +540,16 @@ MakeGrpcTraceBinTextMapPropagator() { ``` -After migration period, users have the flexibility to switch to other propagators. -OpenTelemetry and its extension packages support multiple text map propagators. 
-The gRPC OpenTelemetry API allows specifying -multiple propagators: either public standard ones or custom propagators that -implement the OpenTelemetry propagator API interface. The API composites the -propagators and gRPC puts all the propagator data into the wire through metadata. -This allows users to easily migrate a group of applications with an old propagator to +With gRPC OpenTelemetry API, users can provide a single composite propagator that +combines one or multiple `TextMapPropagator` for their client and server separately. +OpenTelemetry and its extension packages support multiple text map propagators. +gRPC puts all the propagator data into the wire through metadata, and receives all the +data specified from the propagator configuration. +Users can define their own migration path for context propagators in distributed components. +Configuring gRPC OpenTelemetry with this propagator when dealing with +cross-process concerns during migration is straightforward and recommended. +In the long term, community standardized propagators, e.g. W3C is more encouraged than `GrpcTraceBinPropagator`. +This also allows users to easily migrate a group of applications with an old propagator to a new propagator. An example migration path can be: 1. Configure server to accept both old and new propagators. 2. Configure the client with the desired new propagators and to drop the old propagator. @@ -594,6 +592,7 @@ Finally, they switch to grpc-open-telemetry and finish the migration. 
|-- gRPC -> Using Otel to generate Trace A ----------------------------- | |-- Application -> Using Otel to generate a sub Trace B------------------ | ``` + ### OpenCensus vs OpenTelemetry Tracing Information Mapping gRPC is generating similar tracing information for OpenTelemetry compared with OpenCensus, but due to API differences between those two libraries, the @@ -604,10 +603,10 @@ mapped from OpenCensus `MessageEvent` fields: | OpenCensus Trace Message Event Fields | OpenTelemetry Trace Event Attribute Key | |---------------------------------------|-----------------------------------------| -| Type | `message.event.type` | -| Message Id | `message.message.id` | -| Uncompressed message size | `message.event.size.uncompressed` | -| Compressed message size | `message.event.size.compressed` | +| `Type` | `message.event.type` | +| `Message Id` | `message.message.id` | +| `Uncompressed message size` | `message.event.size.uncompressed` | +| `Compressed message size` | `message.event.size.compressed` | OpenCensus span annotation description maps to OpenTelemetry event name, and annotation attributes keys are mapped to event attributes keys: From 032beee9d64272910f023b445ae221f6d5b23156 Mon Sep 17 00:00:00 2001 From: yifeizhuang Date: Thu, 21 Mar 2024 18:08:54 -0700 Subject: [PATCH 19/30] fix compression and message sizes, minor language fixes --- A72-open-telemetry-tracing.md | 131 +++++++++++++++++++--------------- 1 file changed, 72 insertions(+), 59 deletions(-) diff --git a/A72-open-telemetry-tracing.md b/A72-open-telemetry-tracing.md index 76bbed398..36795dafb 100644 --- a/A72-open-telemetry-tracing.md +++ b/A72-open-telemetry-tracing.md @@ -48,25 +48,8 @@ tracing instrumentation. We will add tracing functions in grpc-open-telemetry plugin, along with OpenTelemetry metrics [gRFC A66][A66]. Internally, the tracing functionality will be implemented using existing gRPC infrastructure such as interceptors and stream tracers. 
- -#### Propagator Wire Format -gRPC OpenTelemetry will use the existing OpenTelemetry propagators API for context propagation -by encoding them in metadata, for the following benefits: -1. Full integration with OpenTelemetry APIs that is easier for users to reason about. -2. Make it possible to plugin other propagators that the community supports. -3. Flexible API that allows clean and simple migration paths to a different propagator. - -In order for the propagator to perform injecting and extracting spanContext value -from the carrier, which is the Metadata in gRPC, languages will -implement Getter and Setter corresponding to the propagator type. -Currently, OpenTelemetry propagator API only supports `TextMapPropagator`, -that is to send string key/value pairs between the client and server. -Therefore, adding Getter and Setter is to implement the TextMap carrier interface: -`TextMapCarrier` (For C++/Go), or `TextMapGetter`/`TextMapSetter` (For Java). (see -pseudocode in section [Migrate to OpenTelemetry](#migrate-to-opentelemetry--cross-process-networking-concerns)). - - The APIs to enable and configure OpenTelemetry tracing are different among - languages due to different underlying infrastructures. +The APIs to enable and configure OpenTelemetry tracing are different among +languages due to different underlying infrastructures. ### Java In Java, it will be part of global interceptors, so that the interceptors are @@ -152,7 +135,7 @@ type TraceOptions struct { TraceProvider trace.TraceProvider } -// DialOption returns a dial option which enables OpenCensus instrumentation +// DialOption returns a dial option which enables OpenTelemetry instrumentation // code for a grpc.ClientConn. // // Client applications interested in instrumenting their grpc.ClientConn should @@ -164,8 +147,7 @@ type TraceOptions struct { // TraceOption. 
Client side has retries, so a Unary and Streaming Interceptor are // registered to handle per RPC traces, and a Stats Handler is registered to handle // per RPC attempt trace. These three components registered work together in -// conjunction, and do not work standalone. It is not supported to use this -// alongside another stats handler dial option. +// conjunction, and do not work standalone. func DialOption(to TraceOptions) grpc.DialOption {} // ServerOption returns a server option which enables OpenTelemetry @@ -178,14 +160,13 @@ func DialOption(to TraceOptions) grpc.DialOption {} // Using this option will always lead to instrumentation, however in order to // use the data a SpanExporter must be registered with the TraceProvider option. // Server side does not have retries, so a registered Stats Handler is the only -// option that is returned. It is not supported to use this alongside another -// stats handler server option. +// option that is returned. func ServerOption(to TraceOptions) grpc.ServerOption {} ``` ## Tracing Information RPCs on the client side may undergo retry attempts, whereas on the server side, -they do not. gRPC records both per-call tracing details (at the parent span) +they do not. gRPC records both per-call tracing details (on the parent span) and per-attempt tracing details (on the attempt span) on the client side. On the server side, there is only per-call traces. With the new OpenTelemetry plugin we will produce the following tracing information during an RPC lifecycle: @@ -208,54 +189,87 @@ On attempt span: * When the application sends an outbound message to the transport, add an Event with name "Outbound message sent" and the following attributes: * key `message.event.type` with string value "SENT". - * key `message.message.id` with integer value of the seq no. The seq no. is a sequence - of integer numbers starting from 0 to identify sent messages within the stream, the same below. 
+ * key `message.message.id` with integer value of the seq no. The seq no. indicates + the order of the sent messages on the attempt (i.e., it starts at 0 and is + incremented by 1 for each message sent), the same below. * key `message.event.size.uncompressed` with integer value of uncompressed message size. The size is the total attempt message bytes without encryption, not including grpc or transport framing bytes, the same below. - * If any compression, key `message.event.size.compressed` with integer value - of compressed message size. +* If the message needs compression, add an Event with name "Outbound message compressed" + when the compression is done with the following attributes: + * key `message.event.type` with string value "SENT". + * key `message.message.id` with integer value of the seq no. + * key `message.event.size.compressed` with integer value of compressed message size. * When an inbound message has been received from the transport, add an Event with name "Inbound message read" and the following attributes: - * key `message.event.type` with String value "RECEIVED". + * key `message.event.type` with string value "RECEIVED". + * key `message.message.id` with integer value of the seq no. The seq no. indicates + the order of the received messages on the attempt (i.e., it starts at 0 and is + incremented by 1 for each message received), the same below. + * key `message.event.size.compressed` with integer value of wire message size. +* If the inbound message was compressed, add an Event with name "Inbound message uncompressed" + when the decompression is done with the following attributes: + * key `message.event.type` with string value "RECEIVED". * key `message.message.id` with integer value of the seq no. - * key `message.event.size.compressed` with integer value of wire message size. - * If any compression, key `message.event.size.uncompressed` with integer value - of uncompressed message size. 
+ * key `message.event.size.uncompressed` with integer value of uncompressed message size. * When the stream is closed, set RPC status and end the attempt span. At the server: * When the application sends an outbound message to the transport, add an Event with name "Outbound message sent" and the following attributes: * key `message.event.type` with string value "SENT". - * key `message.message.id` with integer value of the seq no. + * key `message.message.id` with integer value of the seq no. The seq no. indicates + the order of the sent messages on the attempt (i.e., it starts at 0 and is + incremented by 1 for each message sent), the same below. * key `message.event.size.uncompressed` with integer value of uncompressed - message size. - * If any compression, key `message.event.size.compressed` with integer value - of compressed message size. -* When an inbound message has been read from the transport, add an Event + message size. The size is the total attempt message bytes without encryption, + not including grpc or transport framing bytes, the same below. +* If the message needs compression, add an Event with name "Outbound message compressed" + when the compression is done with the following attributes: + * key `message.event.type` with string value "SENT". + * key `message.message.id` with integer value of the seq no. + * key `message.event.size.compressed` with integer value of compressed message size. +* When an inbound message has been received from the transport, add an Event with name "Inbound message read" and the following attributes: * key `message.event.type` with string value "RECEIVED". - * key `message.message.id` with integer value of the seq no. + * key `message.message.id` with integer value of the seq no. The seq no. indicates + the order of the received messages on the attempt (i.e., it starts at 0 and is + incremented by 1 for each message received), the same below. 
* key `message.event.size.compressed` with integer value of wire message size. - * If any compression, key `message.event.size.uncompressed` with integer value - of uncompressed message size. +* If the inbound message was compressed, add an Event with name "Inbound message uncompressed" + when the decompression is done with the following attributes: + * key `message.event.type` with string value "RECEIVED". + * key `message.message.id` with integer value of the seq no. + * key `message.event.size.uncompressed` with integer value of uncompressed message size. * When the stream is closed, set the RPC status and end the span. ### Limitations Note that C++ is missing the seq no. information due to lack of transport support. While it's not critical, we can include these information if users request it in the future. -Java has an issue of reporting decompressed message size upon receiving messages, -as a workaround, on the client parent span and server span: -* When the uncompressed size of some inbound data is revealed, add an Event - with name "Inbound message read" and the following attributes: - * key `message.event.type` with string value "RECEIVED". - * key `message.message.id` with integer value of the seq no. - * key `message.event.size.uncompressed` with integer value - of uncompressed message size. +Implementations that cannot report separate events for the inbound uncompressed +message size on each attempt will report once at the parent span. +Implementations that can not report uncompressed outbound message size before compression +will report a single event that has compressed and uncompressed size attributes. -## Migrate from OpenCensus to OpenTelemetry +## Propagator Wire Format +gRPC OpenTelemetry will use the existing OpenTelemetry propagators API for context propagation +by encoding them in metadata, for the following benefits: +1. Full integration with OpenTelemetry APIs that is easier for users to reason about. +2. 
Make it possible to plugin other propagators that the community supports. +3. Flexible API that allows clean and simple migration paths to a different propagator. + +We will have OpenTelemetry propagator APIs for context propagation. +In order for the propagator to perform injecting and extracting spanContext value +from the carrier, which is the Metadata in gRPC, languages will +implement Getter and Setter corresponding to the propagator type. +Currently, OpenTelemetry propagator API only supports `TextMapPropagator`, +that is to send string key/value pairs between the client and server. +Therefore, adding Getter and Setter is to implement the TextMap carrier interface: +`TextMapCarrier` (For C++/Go), or `TextMapGetter`/`TextMapSetter` (For Java). (see +pseudocode in section [Migration to OpenTelemetry: Cross-process Networking Concerns](#migration-to-opentelemetry--cross-process-networking-concerns)). + +## Migration from OpenCensus to OpenTelemetry ### gRPC OpenCensus API The existing gRPC OpenCensus tracing APIs in grpc-census plugin are different between @@ -276,7 +290,7 @@ where during migration one will use OpenCensus and the other will use OpenTeleme Here are the suggested solutions for both use cases. -### Migrate to OpenTelemetry: Cross-process Networking Concerns +### Migration to OpenTelemetry: Cross-process Networking Concerns When users first introduce gRPC OpenTelemetry, for the time window when the gRPC client and server have mixed plugins of OpenTelemetry and OpenCensus, spanContext can not directly propagate due to different header name and wire format. @@ -295,17 +309,16 @@ different from the binary header that gRPC currently uses. The future roadmap to support binary propagators at OpenTelemetry is unclear. So, gRPC will use propagator API in TextMap format with an optimization path (Go and Java) to work around the lack of binary propagator API to support `grpc-trace-bin`. 
In fact, -TextMap propagator does not show visible performance impact for C++, which is -the most sensitive language to performance, based on internal micro benchmarking. -Therefore, gRPC will only support propagating `grpc-trace-bin` in TextMap propagator. -Only one `grpc-trace-bin` header will be sent for a single RPC as long as only one of -OpenTelemetry or OpenCensus is enabled for the channel. +TextMap propagator is a viable alternative to the existing binary format in gRPC +in terms of performance, based on internal C++ micro benchmarking on W3C TextMap +propagator. If this posed a performance problem for users, we can consider +implementing an alternative API in C++, see [Rationale](#rationale). +Only one `grpc-trace-bin` header will be sent for a single RPC as long as only +one of OpenTelemetry or OpenCensus is enabled for the channel. A `grpc-trace-bin` formatter implementation for OpenTelemetry is needed in each language, which can be similar to the OpenCensus implementation. Go already has community support for that. - - #### GrpcTraceBinPropagator and TextMapGetter/Setter in Java/Go The pseudocode below demonstrates `GrpcTraceBinPropagator` and the corresponding gRPC Getter/Setter with an optimization path. @@ -555,7 +568,7 @@ a new propagator. An example migration path can be: 2. Configure the client with the desired new propagators and to drop the old propagator. 3. Make the server only accept the new propagators and complete the migration. 
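The three-step migration above can be sketched with a toy composite propagator in Go. The `propagator` interface, `headerPropagator`, and the header names `old-trace`/`new-trace` are hypothetical simplifications of OpenTelemetry's `TextMapPropagator`: they carry only a trace ID string, not a full context. Roughly like OpenTelemetry's composite propagator, `composite` injects with every member and runs every extractor in order, so a later successful extraction overrides an earlier one.

```go
package main

import "fmt"

// carrier stands in for gRPC metadata in this sketch.
type carrier map[string]string

// propagator is a hypothetical, simplified TextMapPropagator that carries
// only a trace ID string.
type propagator interface {
	Inject(traceID string, c carrier)
	Extract(c carrier) (traceID string, ok bool)
}

// headerPropagator writes and reads the trace ID under a single header key.
type headerPropagator struct{ key string }

func (p headerPropagator) Inject(traceID string, c carrier) { c[p.key] = traceID }

func (p headerPropagator) Extract(c carrier) (string, bool) {
	v, ok := c[p.key]
	return v, ok
}

// composite injects with every member and runs every extractor in order;
// a later successful extraction overrides an earlier one.
type composite []propagator

func (cp composite) Inject(traceID string, c carrier) {
	for _, p := range cp {
		p.Inject(traceID, c)
	}
}

func (cp composite) Extract(c carrier) (string, bool) {
	id, found := "", false
	for _, p := range cp {
		if v, ok := p.Extract(c); ok {
			id, found = v, true
		}
	}
	return id, found
}

func main() {
	oldP := headerPropagator{key: "old-trace"}
	newP := headerPropagator{key: "new-trace"}

	// Step 1: the server accepts both propagators.
	server := composite{oldP, newP}

	// A client that has not migrated yet still sends only the old header.
	legacy := carrier{}
	oldP.Inject("trace-a", legacy)

	// Step 2: migrated clients send only the new header.
	migrated := carrier{}
	composite{newP}.Inject("trace-b", migrated)

	fmt.Println(server.Extract(legacy))   // trace-a true
	fmt.Println(server.Extract(migrated)) // trace-b true

	// Step 3: once every client is migrated, the server drops the old propagator.
	server = composite{newP}
	fmt.Println(server.Extract(legacy)) // empty ID and false: old header no longer honored
}
```

The ordering of the steps matters. The server must accept the new propagator before any client stops sending the old one, otherwise traces break at the client/server boundary.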
-### Migrate to OpenTelemetry: In Binary +### Migration to OpenTelemetry: In Binary The OpenCensus [shim](https://github.com/open-telemetry/opentelemetry-java/tree/main/opencensus-shim) (currently available in Java, Go, Python) allows binaries that have a mix of OpenTelemetry and OpenCensus dependencies to export trace spans from both From 027582030162e72072af52fc1f47c52f52eee7f3 Mon Sep 17 00:00:00 2001 From: yifeizhuang Date: Wed, 27 Mar 2024 10:06:32 -0700 Subject: [PATCH 20/30] decision on reporting compressed/uncompressed message size, and seq id --- A72-open-telemetry-tracing.md | 65 ++++++++++++++--------------------- 1 file changed, 25 insertions(+), 40 deletions(-) diff --git a/A72-open-telemetry-tracing.md b/A72-open-telemetry-tracing.md index 36795dafb..a7f2f1ad8 100644 --- a/A72-open-telemetry-tracing.md +++ b/A72-open-telemetry-tracing.md @@ -186,7 +186,9 @@ On attempt span: boolean value indicating whether the stream is undergoing a transparent retry. * If the RPC experienced load balancer pick delay, add an Event with the name "Delayed LB pick complete" upon creation of the stream on the transport. -* When the application sends an outbound message to the transport, add an Event +* When the application sends an outbound message to the transport, add Event(s) + (it depends on implementation whether there is a single event or two separate + events for compressed/uncompressed message sizes, the same below) with name "Outbound message sent" and the following attributes: * key `message.event.type` with string value "SENT". * key `message.message.id` with integer value of the seq no. The seq no. indicates @@ -195,62 +197,45 @@ On attempt span: * key `message.event.size.uncompressed` with integer value of uncompressed message size. The size is the total attempt message bytes without encryption, not including grpc or transport framing bytes, the same below. 
-* If the message needs compression, add an Event with name "Outbound message compressed" - when the compression is done with the following attributes: - * key `message.event.type` with string value "SENT". - * key `message.message.id` with integer value of the seq no. - * key `message.event.size.compressed` with integer value of compressed message size. -* When an inbound message has been received from the transport, add an Event - with name "Inbound message read" and the following attributes: + * If compression needed, add key `message.event.size.compressed` with integer + value of compressed message size. +* When an inbound message has been received from wire, add Event(s) with name + "Inbound message read" and the following attributes: * key `message.event.type` with string value "RECEIVED". * key `message.message.id` with integer value of the seq no. The seq no. indicates the order of the received messages on the attempt (i.e., it starts at 0 and is incremented by 1 for each message received), the same below. * key `message.event.size.compressed` with integer value of wire message size. -* If the inbound message was compressed, add an Event with name "Inbound message uncompressed" - when the decompression is done with the following attributes: - * key `message.event.type` with string value "RECEIVED". - * key `message.message.id` with integer value of the seq no. - * key `message.event.size.uncompressed` with integer value of uncompressed message size. + * If the message needs decompression, add key `message.event.size.uncompressed` + with integer value of uncompressed message size. * When the stream is closed, set RPC status and end the attempt span. At the server: -* When the application sends an outbound message to the transport, add an Event +* When the application sends an outbound message to the transport, add Event(s) with name "Outbound message sent" and the following attributes: * key `message.event.type` with string value "SENT". 
- * key `message.message.id` with integer value of the seq no. The seq no. indicates - the order of the sent messages on the attempt (i.e., it starts at 0 and is - incremented by 1 for each message sent), the same below. - * key `message.event.size.uncompressed` with integer value of uncompressed - message size. The size is the total attempt message bytes without encryption, - not including grpc or transport framing bytes, the same below. -* If the message needs compression, add an Event with name "Outbound message compressed" - when the compression is done with the following attributes: - * key `message.event.type` with string value "SENT". * key `message.message.id` with integer value of the seq no. - * key `message.event.size.compressed` with integer value of compressed message size. -* When an inbound message has been received from the transport, add an Event - with name "Inbound message read" and the following attributes: - * key `message.event.type` with string value "RECEIVED". - * key `message.message.id` with integer value of the seq no. The seq no. indicates - the order of the received messages on the attempt (i.e., it starts at 0 and is - incremented by 1 for each message received), the same below. - * key `message.event.size.compressed` with integer value of wire message size. -* If the inbound message was compressed, add an Event with name "Inbound message uncompressed" - when the decompression is done with the following attributes: + * key `message.event.size.uncompressed` with integer value of uncompressed + message size. + * If compression needed, add key `message.event.size.compressed` with integer + value of compressed message size. +* When an inbound message has been received from wire, add Event(s) with name + "Inbound message read" and the following attributes: * key `message.event.type` with string value "RECEIVED". * key `message.message.id` with integer value of the seq no. 
- * key `message.event.size.uncompressed` with integer value of uncompressed message size. + * key `message.event.size.compressed` with integer value of wire message size. + * If the message needs decompression, add key `message.event.size.uncompressed` + with integer value of uncompressed message size. * When the stream is closed, set the RPC status and end the span. ### Limitations -Note that C++ is missing the seq no. information due to lack of transport support. -While it's not critical, we can include these information if users request it in the future. +The timestamp information on the Events that report compressed/uncompressed message +sizes are not accurate or useful. It only gives you a relative order with other Events. +We can tighten the timing in the future if users find this information critical. -Implementations that cannot report separate events for the inbound uncompressed -message size on each attempt will report once at the parent span. -Implementations that can not report uncompressed outbound message size before compression -will report a single event that has compressed and uncompressed size attributes. +Java has an open issue of reporting uncompressed message size upon receiving message. +It does that at a later time when deserializing. Therefore, at the client Java only +reports the uncompressed message size for incoming messages on parent span, not attempt span. 
## Propagator Wire Format gRPC OpenTelemetry will use the existing OpenTelemetry propagators API for context propagation From 3e263739e7afb5fba2bb26c3bd89e6dccc122453 Mon Sep 17 00:00:00 2001 From: yifeizhuang Date: Fri, 5 Apr 2024 15:24:13 -0700 Subject: [PATCH 21/30] clarify the second message for compressed/uncompressed message name, remove to transport and from wire --- A72-open-telemetry-tracing.md | 43 +++++++++++++++++++++++------------ 1 file changed, 28 insertions(+), 15 deletions(-) diff --git a/A72-open-telemetry-tracing.md b/A72-open-telemetry-tracing.md index a7f2f1ad8..ca44625a6 100644 --- a/A72-open-telemetry-tracing.md +++ b/A72-open-telemetry-tracing.md @@ -186,46 +186,59 @@ On attempt span: boolean value indicating whether the stream is undergoing a transparent retry. * If the RPC experienced load balancer pick delay, add an Event with the name "Delayed LB pick complete" upon creation of the stream on the transport. -* When the application sends an outbound message to the transport, add Event(s) - (it depends on implementation whether there is a single event or two separate - events for compressed/uncompressed message sizes, the same below) +* When the application sends an outbound message, add Event(s) + (it depends on implementation whether there is a single event or an additional + separate event with name "Outbound message compressed" for compressed message size) with name "Outbound message sent" and the following attributes: * key `message.event.type` with string value "SENT". * key `message.message.id` with integer value of the seq no. The seq no. indicates the order of the sent messages on the attempt (i.e., it starts at 0 and is incremented by 1 for each message sent), the same below. - * key `message.event.size.uncompressed` with integer value of uncompressed + * key `message.event.size` with integer value of uncompressed message size.
The size is the total attempt message bytes without encryption, not including grpc or transport framing bytes, the same below. * If compression needed, add key `message.event.size.compressed` with integer - value of compressed message size. -* When an inbound message has been received from wire, add Event(s) with name + value of compressed message size. If this is reported as a separate event in + an implementation, the event name is "Outbound message compressed" and the + order of the event must be after the first event that reports the message size. +* When an inbound message has been received, add Event(s) (it depends on + implementation whether there is a single event or an additional separate event + with name "Inbound message uncompressed" for uncompressed message size) with name "Inbound message read" and the following attributes: * key `message.event.type` with string value "RECEIVED". * key `message.message.id` with integer value of the seq no. The seq no. indicates the order of the received messages on the attempt (i.e., it starts at 0 and is incremented by 1 for each message received), the same below. - * key `message.event.size.compressed` with integer value of wire message size. + * key `message.event.size` with integer value of wire message size. * If the message needs decompression, add key `message.event.size.uncompressed` - with integer value of uncompressed message size. + with integer value of uncompressed message size. If this is reported as a + separate event in an implementation, the event name is "Inbound message uncompressed" + and the order of the event must be after the first event that reports the wire message size. * When the stream is closed, set RPC status and end the attempt span. 
At the server: -* When the application sends an outbound message to the transport, add Event(s) +* When the application sends an outbound message, add Event(s) + (it depends on implementation whether there is a single event or an additional + separate event with name "Outbound message compressed" for compressed message size) with name "Outbound message sent" and the following attributes: * key `message.event.type` with string value "SENT". * key `message.message.id` with integer value of the seq no. - * key `message.event.size.uncompressed` with integer value of uncompressed - message size. + * key `message.event.size` with integer value of uncompressed message size. * If compression needed, add key `message.event.size.compressed` with integer - value of compressed message size. -* When an inbound message has been received from wire, add Event(s) with name + value of compressed message size. If this is reported as a separate event in + an implementation, the event name is "Outbound message compressed" and the + order of the event must be after the first event that reports the message size. +* When an inbound message has been received, add Event(s) (it depends on + implementation whether there is a single event or an additional separate event + with name "Inbound message uncompressed" for uncompressed message size) with name "Inbound message read" and the following attributes: * key `message.event.type` with string value "RECEIVED". * key `message.message.id` with integer value of the seq no. - * key `message.event.size.compressed` with integer value of wire message size. + * key `message.event.size` with integer value of wire message size. * If the message needs decompression, add key `message.event.size.uncompressed` - with integer value of uncompressed message size. + with integer value of uncompressed message size. 
If this is reported as a + separate event in an implementation, the event name is "Inbound message uncompressed" + and the order of the event must be after the first event that reports the wire message size. * When the stream is closed, set the RPC status and end the span. ### Limitations From 3885cd844c6e2e139c7bbc019909f6a380804a4f Mon Sep 17 00:00:00 2001 From: yifeizhuang Date: Tue, 9 Apr 2024 17:47:54 -0700 Subject: [PATCH 22/30] scheme for the name of uncompressed message size outbound, and compressed message size inbound --- A72-open-telemetry-tracing.md | 13 +++++++++---- 1 file changed, 9 insertions(+), 4 deletions(-) diff --git a/A72-open-telemetry-tracing.md b/A72-open-telemetry-tracing.md index ca44625a6..3a667e347 100644 --- a/A72-open-telemetry-tracing.md +++ b/A72-open-telemetry-tracing.md @@ -194,7 +194,8 @@ On attempt span: * key `message.message.id` with integer value of the seq no. The seq no. indicates the order of the sent messages on the attempt (i.e., it starts at 0 and is incremented by 1 for each message sent), the same below. - * key `message.event.size` with integer value of uncompressed + * key named `message.event.size.uncompressed` if the message needs compression, + otherwise named `message.event.size`, with integer value of uncompressed message size. The size is the total attempt message bytes without encryption, not including grpc or transport framing bytes, the same below. * If compression needed, add key `message.event.size.compressed` with integer @@ -209,7 +210,8 @@ On attempt span: * key `message.message.id` with integer value of the seq no. The seq no. indicates the order of the received messages on the attempt (i.e., it starts at 0 and is incremented by 1 for each message received), the same below. - * key `message.event.size` with integer value of wire message size. 
+ * key named `message.event.size.compressed` if the message needs decompression, + otherwise named `message.event.size`, with integer value of wire message size. * If the message needs decompression, add key `message.event.size.uncompressed` with integer value of uncompressed message size. If this is reported as a separate event in an implementation, the event name is "Inbound message uncompressed" @@ -223,7 +225,9 @@ At the server: with name "Outbound message sent" and the following attributes: * key `message.event.type` with string value "SENT". * key `message.message.id` with integer value of the seq no. - * key `message.event.size` with integer value of uncompressed message size. + * key named `message.event.size.uncompressed` if the message needs compression, + otherwise named `message.event.size`, with integer value of uncompressed + message size. * If compression needed, add key `message.event.size.compressed` with integer value of compressed message size. If this is reported as a separate event in an implementation, the event name is "Outbound message compressed" and the @@ -234,7 +238,8 @@ At the server: "Inbound message read" and the following attributes: * key `message.event.type` with string value "RECEIVED". * key `message.message.id` with integer value of the seq no. - * key `message.event.size` with integer value of wire message size. + * key named `message.event.size.compressed` if the message needs decompression, + otherwise named `message.event.size`, with integer value of wire message size. * If the message needs decompression, add key `message.event.size.uncompressed` with integer value of uncompressed message size. 
If this is reported as a separate event in an implementation, the event name is "Inbound message uncompressed" From c866de589fa65a2ebccc87dabd42dd195258795f Mon Sep 17 00:00:00 2001 From: Xuan Wang Date: Wed, 10 Apr 2024 21:58:37 +0000 Subject: [PATCH 23/30] Add Python draft API --- A72-open-telemetry-tracing.md | 27 +++++++++++++++++++++++++-- 1 file changed, 25 insertions(+), 2 deletions(-) diff --git a/A72-open-telemetry-tracing.md b/A72-open-telemetry-tracing.md index 3a667e347..5f6600164 100644 --- a/A72-open-telemetry-tracing.md +++ b/A72-open-telemetry-tracing.md @@ -1,6 +1,6 @@ A72: OpenTelemetry Tracing ---- -* Author(s): [Yifei Zhuang](https://github.com/YifeiZhuang), [Yash Tibrewal](https://github.com/yashykt) +* Author(s): [Yifei Zhuang](https://github.com/YifeiZhuang), [Yash Tibrewal](https://github.com/yashykt), [Xuan Wang](https://github.com/XuanWang-Amos) * Approver: [Eric Anderson](https://github.com/ejona86) * Reviewers: [Mark Roth](https://github.com/markdroth), [Doug Fawley](https://github.com/dfawley), [Feng Li](https://github.com/fengli79) @@ -123,6 +123,28 @@ class OpenTelemetryPluginBuilder { }; ``` +``` + +### Python +The following new fields will be added in `OpenTelemetryPlugin`. + +```Python +from opentelemetry.sdk.trace import TracerProvider +from opentelemetry.trace.propagation.tracecontext import TraceContextTextMapPropagator + +class OpenTelemetryPlugin: + """ + If `tracer_provider` is None, no traces are collected. + """ + + def __init__( + self, + *, + tracer_provider: Optional[TracerProvider] = None, + text_map_propagator: [TraceContextTextMapPropagator] = None, + ): +``` + ### Go The following methods will be added to the opentelemetry package. ```go @@ -269,7 +291,8 @@ implement Getter and Setter corresponding to the propagator type. Currently, OpenTelemetry propagator API only supports `TextMapPropagator`, that is to send string key/value pairs between the client and server. 
Therefore, adding Getter and Setter is to implement the TextMap carrier interface: -`TextMapCarrier` (For C++/Go), or `TextMapGetter`/`TextMapSetter` (For Java). (see +`TextMapCarrier` (For C++/Go), `opentelemetry.propagators.textmap.Getter/Setter` (For Python) +or `TextMapGetter`/`TextMapSetter` (For Java). (see pseudocode in section [Migration to OpenTelemetry: Cross-process Networking Concerns](#migration-to-opentelemetry--cross-process-networking-concerns)). ## Migration from OpenCensus to OpenTelemetry From a346c2406ad7fe935dafc4c493dcdabe53b03b75 Mon Sep 17 00:00:00 2001 From: Xuan Wang Date: Wed, 10 Apr 2024 21:59:23 +0000 Subject: [PATCH 24/30] Fix format issue --- A72-open-telemetry-tracing.md | 2 -- 1 file changed, 2 deletions(-) diff --git a/A72-open-telemetry-tracing.md b/A72-open-telemetry-tracing.md index 5f6600164..4622e6879 100644 --- a/A72-open-telemetry-tracing.md +++ b/A72-open-telemetry-tracing.md @@ -123,8 +123,6 @@ class OpenTelemetryPluginBuilder { }; ``` -``` - ### Python The following new fields will be added in `OpenTelemetryPlugin`. From 66b40d4f860e59060ef9186fe1d6422b8493407f Mon Sep 17 00:00:00 2001 From: yifeizhuang Date: Tue, 4 Jun 2024 09:07:04 -0700 Subject: [PATCH 25/30] change go/java api, remove message type, rename event attributes --- A72-open-telemetry-tracing.md | 140 +++++++++++------------------------------------ 1 file changed, 45 insertions(+), 95 deletions(-) diff --git a/A72-open-telemetry-tracing.md b/A72-open-telemetry-tracing.md index 3a667e347..b1b32cde5 100644 --- a/A72-open-telemetry-tracing.md +++ b/A72-open-telemetry-tracing.md @@ -53,57 +53,32 @@ languages due to different underlying infrastructures. ### Java In Java, it will be part of global interceptors, so that the interceptors are -managed in a more sustainable way and user-friendly. As a prerequisite, the stream -tracer factory API will be stabilized. OpenTelemetryModule will be created with -an OpenTelemetry API instance passing in for necessary configurations.
+managed in a more sustainable way and user-friendly. Currently `GrpcOpenTelemetry` is constructed +with OpenTelemetry API instance passing in for necessary configurations. Users can also rely on SDK autoconfig extension that configures the sdk object -through environmental variables or Java system properties, then pass the +through environment variables or Java system properties, then pass the obtained sdk object to gRPC. -The following methods will be added to OpenTelemetryModule. +There are no changes to the Java API. Users would configure `TraceProvider` to the +OpenTelemetry API instance for constructing `GrpcOpenTelemetry` like below. ```Java -// A module that hosts OpenTelemetry tracing/metrics infrastructures. Module -// implementation may change as OpenTelemetry evolves. -public class OpenTelemetryModule { - /** - * OpenTelemetry instance is used to configure metrics and traces settings. - * Build a module by passing in an OpenTelemetry instance. Then get interceptor - * and ServerStreamTracerFactory from the module and install to the GlobalInterceptors. - * - * Sample: - * Construct a TraceProvider that will be used to provide traces during instrumentation. - * - * SdkTracerProvider sdkTracerProvider = SdkTracerProvider.builder() - * .addSpanProcessor( - * BatchSpanProcessor.builder(exporter).build()) - * .build(); - * - * Construct OpenTelemetry to be passed to gRPC OpenTelemetry module for - * traces and metrics configurations. - * - * OpenTelemetry openTelemetry = OpenTelemetrySdk.builder() - * .setTracerProvider(sdkTracerProvider) - * .setMeterProvider(...) - * .setPropagators(ContextPropagators.create(W3CTraceContextPropagator.getInstance())) - * .build(); - * OpenTelemetryModule otModule = OpenTelemetryModule.newBuilder().sdk(openTelemetry).build(); - * - * - * Add interceptors and StreamTracerFactory obtained from the module to GlobalInterceptors. 
- * - * GlobalInterceptors.setInterceptors( - * Arrays.asList(otModule.getClientTracingInterceptor()), - * Arrays.asList(otModule.getServerTracerFactory())); - */ - - // Return the client interceptor to be installed by users to generate trace information. - public ClientInterceptor getClientTracingInterceptor(); - - // Return the ServerStreamTracerFactory to be installed by users to generate trace information. - public ServerStreamTracerFactory getServerTracerFactory(); -} - +// Construct a TraceProvider that will be used to provide traces during instrumentation. +SdkTracerProvider sdkTracerProvider = SdkTracerProvider.builder() + .addSpanProcessor( + BatchSpanProcessor.builder(exporter).build()) + .build(); +// Construct OpenTelemetry to be passed to gRPC OpenTelemetry module for +// traces and metrics configurations. +OpenTelemetry openTelemetry = OpenTelemetrySdk.builder() + .setTracerProvider(sdkTracerProvider) + .setMeterProvider(...) + .setPropagators(ContextPropagators.create(W3CTraceContextPropagator.getInstance())) + .build(); +GrpcOpenTelemetry otModule = GrpcOpenTelemetry.newBuilder().sdk(openTelemetry).build(); + +// Add interceptors and StreamTracerFactory obtained from the module globally. +otModule.registerGlobal(); ``` ### C++ @@ -124,44 +99,23 @@ class OpenTelemetryPluginBuilder { ``` ### Go -The following methods will be added to the opentelemetry package. +The following `TraceOptions` field will be added to the `opentelemetry` Options struct. ```go import ( "go.opentelemetry.io/otel/trace" ) +type Options struct { + // Existing field in opentelemetry package + MetricsOptions MetricsOptions + TraceOptions TraceOptions +} + // TraceOptions are the trace options for OpenTelemetry instrumentation. type TraceOptions struct { TraceProvider trace.TraceProvider } -// DialOption returns a dial option which enables OpenTelemetry instrumentation -// code for a grpc.ClientConn. 
-// -// Client applications interested in instrumenting their grpc.ClientConn should -// pass the dial option returned from this function as a dial option to -// grpc.Dial(). -// -// Using this option will always lead to instrumentation, however in order to -// use the data a SpanExporter must be registered with the TraceProvider in the -// TraceOption. Client side has retries, so a Unary and Streaming Interceptor are -// registered to handle per RPC traces, and a Stats Handler is registered to handle -// per RPC attempt trace. These three components registered work together in -// conjunction, and do not work standalone. -func DialOption(to TraceOptions) grpc.DialOption {} - -// ServerOption returns a server option which enables OpenTelemetry -// instrumentation code for a grpc.Server. -// -// Server applications interested in instrumenting their grpc.Server should pass -// the server option returned from this function as an argument to -// grpc.NewServer(). -// -// Using this option will always lead to instrumentation, however in order to -// use the data a SpanExporter must be registered with the TraceProvider option. -// Server side does not have retries, so a registered Stats Handler is the only -// option that is returned. -func ServerOption(to TraceOptions) grpc.ServerOption {} ``` ## Tracing Information @@ -190,15 +144,14 @@ On attempt span: (it depends on implementation whether there is a single event or an additional separate event with name "Outbound message compressed" for compressed message size) with name "Outbound message sent" and the following attributes: - * key `message.event.type` with string value "SENT". - * key `message.message.id` with integer value of the seq no. The seq no. indicates + * key `sequence-number` with integer value of the seq no. The seq no. indicates the order of the sent messages on the attempt (i.e., it starts at 0 and is incremented by 1 for each message sent), the same below. 
- * key named `message.event.size.uncompressed` if the message needs compression, - otherwise named `message.event.size`, with integer value of uncompressed + * key named `message-size-uncompressed` if the message needs compression, + otherwise named `message-size`, with integer value of uncompressed message size. The size is the total attempt message bytes without encryption, not including grpc or transport framing bytes, the same below. - * If compression needed, add key `message.event.size.compressed` with integer + * If compression needed, add key `message-size-compressed` with integer value of compressed message size. If this is reported as a separate event in an implementation, the event name is "Outbound message compressed" and the order of the event must be after the first event that reports the message size. @@ -206,13 +159,12 @@ On attempt span: implementation whether there is a single event or an additional separate event with name "Inbound message uncompressed" for uncompressed message size) with name "Inbound message read" and the following attributes: - * key `message.event.type` with string value "RECEIVED". - * key `message.message.id` with integer value of the seq no. The seq no. indicates + * key `sequence-number` with integer value of the seq no. The seq no. indicates the order of the received messages on the attempt (i.e., it starts at 0 and is incremented by 1 for each message received), the same below. - * key named `message.event.size.compressed` if the message needs decompression, - otherwise named `message.event.size`, with integer value of wire message size. - * If the message needs decompression, add key `message.event.size.uncompressed` + * key named `message-size-compressed` if the message needs decompression, + otherwise named `message-size`, with integer value of wire message size. + * If the message needs decompression, add key `message-size-uncompressed` with integer value of uncompressed message size. 
If this is reported as a separate event in an implementation, the event name is "Inbound message uncompressed" and the order of the event must be after the first event that reports the wire message size. @@ -223,12 +175,11 @@ At the server: (it depends on implementation whether there is a single event or an additional separate event with name "Outbound message compressed" for compressed message size) with name "Outbound message sent" and the following attributes: - * key `message.event.type` with string value "SENT". - * key `message.message.id` with integer value of the seq no. - * key named `message.event.size.uncompressed` if the message needs compression, - otherwise named `message.event.size`, with integer value of uncompressed + * key `sequence-number` with integer value of the seq no. + * key named `message-size-uncompressed` if the message needs compression, + otherwise named `message-size`, with integer value of uncompressed message size. - * If compression needed, add key `message.event.size.compressed` with integer + * If compression needed, add key `message-size-compressed` with integer value of compressed message size. If this is reported as a separate event in an implementation, the event name is "Outbound message compressed" and the order of the event must be after the first event that reports the message size. @@ -236,11 +187,10 @@ At the server: implementation whether there is a single event or an additional separate event with name "Inbound message uncompressed" for uncompressed message size) with name "Inbound message read" and the following attributes: - * key `message.event.type` with string value "RECEIVED". - * key `message.message.id` with integer value of the seq no. - * key named `message.event.size.compressed` if the message needs decompression, - otherwise named `message.event.size`, with integer value of wire message size. 
- * If the message needs decompression, add key `message.event.size.uncompressed` + * If the message needs decompression, add key `message-size-uncompressed` with integer value of uncompressed message size. If this is reported as a separate event in an implementation, the event name is "Inbound message uncompressed" and the order of the event must be after the first event that reports the wire message size. @@ -656,4 +606,4 @@ Will be implemented in Java, C++, Go and Python. [L29]: L29-cpp-opencensus-filter.md [A45]: A45-retry-stats.md -[A66]: A66-otel-stats.md \ No newline at end of file +[A66]: A66-otel-stats.md From a19961850afcb0eb80c2a4feb30a42ec273dc86f Mon Sep 17 00:00:00 2001 From: yifeizhuang Date: Wed, 5 Jun 2024 12:11:32 -0700 Subject: [PATCH 26/30] implementation may not know it needs compression when reporting the first message size even when it does need compression --- A72-open-telemetry-tracing.md | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/A72-open-telemetry-tracing.md b/A72-open-telemetry-tracing.md index d77d97d8e..2b5bc4f3e 100644 --- a/A72-open-telemetry-tracing.md +++ b/A72-open-telemetry-tracing.md @@ -167,7 +167,7 @@ On attempt span: * key `sequence-number` with integer value of the seq no. The seq no. indicates the order of the sent messages on the attempt (i.e., it starts at 0 and is incremented by 1 for each message sent), the same below. - * key named `message-size-uncompressed` if the message needs compression, + * key named `message-size-uncompressed` if the implementation knows the message needs compression, otherwise named `message-size`, with integer value of uncompressed message size.
The size is the total attempt message bytes without encryption, not including grpc or transport framing bytes, the same below. @@ -182,7 +182,7 @@ On attempt span: * key `sequence-number` with integer value of the seq no. The seq no. indicates the order of the received messages on the attempt (i.e., it starts at 0 and is incremented by 1 for each message received), the same below. - * key named `message-size-compressed` if the message needs decompression, + * key named `message-size-compressed` if the implementation knows the message needs decompression, otherwise named `message-size`, with integer value of wire message size. * If the message needs decompression, add key `message-size-uncompressed` with integer value of uncompressed message size. If this is reported as a From 8afa3173b673ea3b930336f2ef40a4d650b36f53 Mon Sep 17 00:00:00 2001 From: yifeizhuang Date: Mon, 17 Jun 2024 15:44:39 -0700 Subject: [PATCH 27/30] improve compression message name and add example --- A72-open-telemetry-tracing.md | 89 ++++++++++++++++++++++++++--------- 1 file changed, 67 insertions(+), 22 deletions(-) diff --git a/A72-open-telemetry-tracing.md b/A72-open-telemetry-tracing.md index 2b5bc4f3e..92461872a 100644 --- a/A72-open-telemetry-tracing.md +++ b/A72-open-telemetry-tracing.md @@ -152,6 +152,7 @@ At the client, on parent span: * When the call is closed, set RPC status and end the parent span. gRPC status "OK" is recorded with status "OK", while other gRPC statuses are marked as "ERROR". Non-"OK" statuses include their code as a description, the same below. + For example, a span status description might be "UNAVAILABLE, unable to resolve host", On attempt span: * When span is created, set the attribute with key `previous-rpc-attempts` and an @@ -167,9 +168,9 @@ On attempt span: * key `sequence-number` with integer value of the seq no. The seq no. 
indicates the order of the sent messages on the attempt (i.e., it starts at 0 and is incremented by 1 for each message sent), the same below. - * key named `message-size-uncompressed` if the implementation knows the message needs compression, - otherwise named `message-size`, with integer value of uncompressed - message size. The size is the total attempt message bytes without encryption, + * key named `message-size`, with integer value of message size, + or uncompressed message size if message needs a compression. + The size is the total attempt message bytes without encryption, not including grpc or transport framing bytes, the same below. * If compression needed, add key `message-size-compressed` with integer value of compressed message size. If this is reported as a separate event in @@ -177,17 +178,17 @@ On attempt span: order of the event must be after the first event that reports the message size. * When an inbound message has been received, add Event(s) (it depends on implementation whether there is a single event or an additional separate event - with name "Inbound message uncompressed" for uncompressed message size) with name - "Inbound message read" and the following attributes: + with name "Inbound compressed message" for compressed message size) with name + "Inbound message received" and the following attributes: * key `sequence-number` with integer value of the seq no. The seq no. indicates the order of the received messages on the attempt (i.e., it starts at 0 and is incremented by 1 for each message received), the same below. - * key named `message-size-compressed` if the implementation knows the message needs decompression, - otherwise named `message-size`, with integer value of wire message size. - * If the message needs decompression, add key `message-size-uncompressed` - with integer value of uncompressed message size. 
If this is reported as a - separate event in an implementation, the event name is "Inbound message uncompressed" - and the order of the event must be after the first event that reports the wire message size. + * key named `message-size` with integer value of wire message size, or decompressed + message size of the message needs decompression. + * If the message needs decompression, add key `message-size-compressed` + with integer value of compressed message size. If this is reported as a + separate event in an implementation, the event name is "Inbound compressed message" + and the order of the event must be before the first event that reports the wire message size. * When the stream is closed, set RPC status and end the attempt span. At the server: @@ -196,26 +197,70 @@ At the server: separate event with name "Outbound message compressed" for compressed message size) with name "Outbound message sent" and the following attributes: * key `sequence-number` with integer value of the seq no. - * key named `message-size-uncompressed` if the message needs compression, - otherwise named `message-size`, with integer value of uncompressed - message size. + * key named `message-size`, with integer value of message size, + or uncompressed message size if message needs a compression. * If compression needed, add key `message-size-compressed` with integer value of compressed message size. If this is reported as a separate event in an implementation, the event name is "Outbound message compressed" and the order of the event must be after the first event that reports the message size. 
* When an inbound message has been received, add Event(s) (it depends on implementation whether there is a single event or an additional separate event - with name "Inbound message uncompressed" for uncompressed message size) with name - "Inbound message read" and the following attributes: + with name "Inbound compressed message" for compressed message size) with name + "Inbound message received" and the following attributes: * key `sequence-number` with integer value of the seq no. - * key named `message-size-compressed` if the message needs decompression, - otherwise named `message-size`, with integer value of wire message size. - * If the message needs decompression, add key `message-size-uncompressed` - with integer value of uncompressed message size. If this is reported as a - separate event in an implementation, the event name is "Inbound message uncompressed" - and the order of the event must be after the first event that reports the wire message size. + * key named `message-size` with integer value of wire message size, or decompressed + message size of the message needs decompression. + * If the message needs decompression, add key `message-size-compressed` + with integer value of compressed message size. If this is reported as a + separate event in an implementation, the event name is "Inbound compressed message" + and the order of the event must be before the first event that reports the wire message size. * When the stream is closed, set the RPC status and end the span. 
+A few examples of what message events (w/ and w/o message compression) look like in different implementations: +```text +An example trace with message compression (Java): + +Sending: +|-- Event 'Outbound message sent', attributes('sequence-number' = 0, 'message-size' = 7854, 'message-size-compressed' = 5493) ----| + +Receiving: +|-- Event 'Inbound compressed message', attributes('sequence-number' = 0, 'message-size-compressed' = 5493) ----| +|-- Event 'Inbound message received', attributes('message-size' = 7854) ----| +``` + +```text +An example trace with message compression (Go): + +Sending: +|-- Event 'Outbound message sent', attributes('sequence-number' = 0, 'message-size' = 7854, 'message-size-compressed' = 5493) ----| + +Receiving: +|-- Event 'Inbound message received', attributes('sequence-number' = 0, 'message-size' = 7854, 'message-size-compressed' = 5493) ----| +``` + +```text +An example trace with message compression (C++): + +Sending: +|-- Event 'Outbound message sent', attributes('sequence-number' = 0, 'message-size' = 7854) ----| +|-- Event 'Outbound message compressed', attributes('message-size-compressed' = 5493) ----| + +Receiving: +|-- Event 'Inbound compressed message', attributes('message-size-compressed' = 5493 ) ----| +|-- Event 'Inbound message received', attributes('message-size' = 7854) ----| + +``` + +```text +An example trace with no message compression (Java/Go/C++): + +Sending: +|-- Event 'Outbound message sent', attributes('sequence-number' = 0, 'message-size' = 7854) ----| + +Receiving: +|-- Event 'Inbound message received', attributes('sequence-number' = 0, 'message-size' = 7854) ----| +``` + ### Limitations The timestamp information on the Events that report compressed/uncompressed message sizes are not accurate or useful. It only gives you a relative order with other Events.
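The message-event rules in this revision (one combined event versus a separate compressed-size event, with "Outbound message compressed" emitted after "Outbound message sent" and "Inbound compressed message" emitted before "Inbound message received") can be sketched as follows. This is a minimal, dependency-free Go sketch under stated assumptions, not gRPC's implementation: the `Event` type, the `outboundEvents`/`inboundEvents` helpers, the `split` flag, and the convention that a negative compressed size means "not compressed" are all hypothetical. A real implementation would record these through the OpenTelemetry span API instead. Putting `sequence-number` on the first event of a split pair follows the example traces in this revision.

```go
package main

import "fmt"

// Event is a hypothetical, dependency-free stand-in for an OpenTelemetry span event.
type Event struct {
	Name  string
	Attrs map[string]int64
}

// outboundEvents returns the event(s) for one sent message.
// size is the uncompressed message size; a negative compressedSize
// (hypothetical convention) means the message was not compressed.
// split selects the two-event form some implementations use.
func outboundEvents(seq, size, compressedSize int64, split bool) []Event {
	sent := Event{Name: "Outbound message sent", Attrs: map[string]int64{
		"sequence-number": seq,
		"message-size":    size,
	}}
	if compressedSize < 0 {
		return []Event{sent}
	}
	if !split {
		sent.Attrs["message-size-compressed"] = compressedSize
		return []Event{sent}
	}
	// The separate compressed-size event comes after "Outbound message sent".
	return []Event{sent, {Name: "Outbound message compressed", Attrs: map[string]int64{
		"message-size-compressed": compressedSize,
	}}}
}

// inboundEvents returns the event(s) for one received message.
// size is the decompressed message size; a negative compressedSize
// means the message did not need decompression.
func inboundEvents(seq, size, compressedSize int64, split bool) []Event {
	if compressedSize < 0 {
		return []Event{{Name: "Inbound message received", Attrs: map[string]int64{
			"sequence-number": seq, "message-size": size,
		}}}
	}
	if !split {
		return []Event{{Name: "Inbound message received", Attrs: map[string]int64{
			"sequence-number": seq, "message-size": size, "message-size-compressed": compressedSize,
		}}}
	}
	// The separate compressed-size event comes before "Inbound message received".
	return []Event{
		{Name: "Inbound compressed message", Attrs: map[string]int64{
			"sequence-number": seq, "message-size-compressed": compressedSize,
		}},
		{Name: "Inbound message received", Attrs: map[string]int64{"message-size": size}},
	}
}

func main() {
	for _, e := range outboundEvents(0, 7854, 5493, true) {
		fmt.Println(e.Name, e.Attrs)
	}
	for _, e := range inboundEvents(0, 7854, 5493, true) {
		fmt.Println(e.Name, e.Attrs)
	}
}
```

Passing `split = false` yields the single combined event (as in the Go example trace); `split = true` yields the two-event form (as in the C++ example trace).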
From 78f3d71539c6582e6b34a2570d0a54f799625f29 Mon Sep 17 00:00:00 2001 From: yifeizhuang Date: Mon, 17 Jun 2024 15:55:00 -0700 Subject: [PATCH 28/30] fix typo --- A72-open-telemetry-tracing.md | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/A72-open-telemetry-tracing.md b/A72-open-telemetry-tracing.md index 92461872a..5b41b0207 100644 --- a/A72-open-telemetry-tracing.md +++ b/A72-open-telemetry-tracing.md @@ -152,7 +152,7 @@ At the client, on parent span: * When the call is closed, set RPC status and end the parent span. gRPC status "OK" is recorded with status "OK", while other gRPC statuses are marked as "ERROR". Non-"OK" statuses include their code as a description, the same below. - For example, a span status description might be "UNAVAILABLE, unable to resolve host", + For example, a span status description might be "UNAVAILABLE, unable to resolve host". On attempt span: * When span is created, set the attribute with key `previous-rpc-attempts` and an @@ -169,7 +169,7 @@ On attempt span: the order of the sent messages on the attempt (i.e., it starts at 0 and is incremented by 1 for each message sent), the same below. * key named `message-size`, with integer value of message size, - or uncompressed message size if message needs a compression. + or uncompressed message size if message needs compression. The size is the total attempt message bytes without encryption, not including grpc or transport framing bytes, the same below. * If compression needed, add key `message-size-compressed` with integer @@ -198,7 +198,7 @@ At the server: with name "Outbound message sent" and the following attributes: * key `sequence-number` with integer value of the seq no. * key named `message-size`, with integer value of message size, - or uncompressed message size if message needs a compression. + or uncompressed message size if message needs compression. 
* If compression needed, add key `message-size-compressed` with integer value of compressed message size. If this is reported as a separate event in an implementation, the event name is "Outbound message compressed" and the From 9a219d3afb67ec28659c43d158999dc1f47378b5 Mon Sep 17 00:00:00 2001 From: yifeizhuang Date: Tue, 18 Jun 2024 15:22:32 -0700 Subject: [PATCH 29/30] refer to event name, update oc/ot field name mapping, and fix typo --- A72-open-telemetry-tracing.md | 28 ++++++++++++++-------------- 1 file changed, 14 insertions(+), 14 deletions(-) diff --git a/A72-open-telemetry-tracing.md b/A72-open-telemetry-tracing.md index 5b41b0207..95cc02732 100644 --- a/A72-open-telemetry-tracing.md +++ b/A72-open-telemetry-tracing.md @@ -175,7 +175,7 @@ On attempt span: * If compression needed, add key `message-size-compressed` with integer value of compressed message size. If this is reported as a separate event in an implementation, the event name is "Outbound message compressed" and the - order of the event must be after the first event that reports the message size. + order of the event must be after the "Outbound message sent" event. * When an inbound message has been received, add Event(s) (it depends on implementation whether there is a single event or an additional separate event with name "Inbound compressed message" for compressed message size) with name @@ -184,11 +184,11 @@ On attempt span: the order of the received messages on the attempt (i.e., it starts at 0 and is incremented by 1 for each message received), the same below. * key named `message-size` with integer value of wire message size, or decompressed - message size of the message needs decompression. + message size if the message needs decompression. * If the message needs decompression, add key `message-size-compressed` with integer value of compressed message size. 
If this is reported as a separate event in an implementation, the event name is "Inbound compressed message" - and the order of the event must be before the first event that reports the wire message size. + and the order of the event must be before the "Inbound message received" event. * When the stream is closed, set RPC status and end the attempt span. At the server: @@ -202,18 +202,18 @@ At the server: * If compression needed, add key `message-size-compressed` with integer value of compressed message size. If this is reported as a separate event in an implementation, the event name is "Outbound message compressed" and the - order of the event must be after the first event that reports the message size. + order of the event must be after the "Outbound message sent" event. * When an inbound message has been received, add Event(s) (it depends on implementation whether there is a single event or an additional separate event with name "Inbound compressed message" for compressed message size) with name "Inbound message received" and the following attributes: * key `sequence-number` with integer value of the seq no. * key named `message-size` with integer value of wire message size, or decompressed - message size of the message needs decompression. + message size if the message needs decompression. * If the message needs decompression, add key `message-size-compressed` with integer value of compressed message size. If this is reported as a separate event in an implementation, the event name is "Inbound compressed message" - and the order of the event must be before the first event that reports the wire message size. + and the order of the event must be before the "Inbound message received" event. * When the stream is closed, set the RPC status and end the span. 
A few examples of what message events (w/ and w/o message compression) look like in different implementations: @@ -246,7 +246,7 @@ Sending: |-- Event 'Outbound message compressed', attributes('message-size-compressed' = 5493) ----| Receiving: -|-- Event 'Inbound compressed message', attributes('message-size-compressed' = 5493 ) ----| +|-- Event 'Inbound compressed message', attributes('sequence-number' = 0, 'message-size-compressed' = 5493) ----| |-- Event 'Inbound message received', attributes('message-size' = 7854) ----| ``` @@ -630,23 +630,23 @@ gRPC is generating similar tracing information for OpenTelemetry compared with O but due to API differences between those two libraries, the trace information is represented slightly differently. In the new OpenTelemetry plugin, the client will add `Event`s (name: -`Outbound message sent` and `Inbound message read`) with corresponding attributes, +`Outbound message sent` and `Inbound message received`) with corresponding attributes, mapped from OpenCensus `MessageEvent` fields: | OpenCensus Trace Message Event Fields | OpenTelemetry Trace Event Attribute Key | |---------------------------------------|-----------------------------------------| -| `Type` | `message.event.type` | -| `Message Id` | `message.message.id` | -| `Uncompressed message size` | `message.event.size.uncompressed` | -| `Compressed message size` | `message.event.size.compressed` | +| `Type` | NA | +| `Message Id` | `sequence-number` | +| `Uncompressed message size` | `message-size` | +| `Compressed message size` | `message-size-compressed` | OpenCensus span annotation description maps to OpenTelemetry event name, and annotation attributes keys are mapped to event attributes keys: | OpenCensus Trace Annotation Attribute Key | OpenTelemetry Trace Event Attribute Key | |-------------------------------------------|-----------------------------------------| -| `type` | `message.event.type` | -| `id` | `message.message.id` | +| `type` | NA | +| `id` |
`sequence-number` | ## Rationale C++ will not have the optimization path in its `GrpcTraceBinPropagator` API. We From 06ec130598d39a953602a3c9ce6df81d5bd6d5a3 Mon Sep 17 00:00:00 2001 From: yifeizhuang Date: Thu, 20 Jun 2024 13:35:45 -0700 Subject: [PATCH 30/30] move use of propagator before c++ example section --- A72-open-telemetry-tracing.md | 30 ++++++++++++++++-------------- 1 file changed, 16 insertions(+), 14 deletions(-) diff --git a/A72-open-telemetry-tracing.md b/A72-open-telemetry-tracing.md index 95cc02732..e43f5d1af 100644 --- a/A72-open-telemetry-tracing.md +++ b/A72-open-telemetry-tracing.md @@ -338,6 +338,22 @@ A `grpc-trace-bin` formatter implementation for OpenTelemetry is needed in each language, which can be similar to the OpenCensus implementation. Go already has community support for that. +With the gRPC OpenTelemetry API, users can provide a single composite propagator that +combines one or more `TextMapPropagator`s, separately for their client and server. +OpenTelemetry and its extension packages support multiple text map propagators. +gRPC writes all the propagator data to the wire as metadata, and reads all the +data specified by the propagator configuration. +Users can define their own migration path for context propagators in distributed components. +Configuring gRPC OpenTelemetry with this propagator when dealing with +cross-process concerns during migration is straightforward and recommended. +In the long term, community-standardized propagators such as W3C Trace Context are preferred over `GrpcTraceBinPropagator`. +A composite propagator also allows users to easily migrate a group of applications from an old propagator to +a new propagator. An example migration path can be: +1. Configure the server to accept both the old and new propagators. +2. Configure the client with the desired new propagators and drop the old propagator. +3. Make the server accept only the new propagators, completing the migration.
+ + #### GrpcTraceBinPropagator and TextMapGetter/Setter in Java/Go The pseudocode below demonstrates `GrpcTraceBinPropagator` and the corresponding gRPC Getter/Setter with an optimization path. @@ -572,20 +588,6 @@ MakeGrpcTraceBinTextMapPropagator() { ``` -With gRPC OpenTelemetry API, users can provide a single composite propagator that -combines one or multiple `TextMapPropagator` for their client and server separately. -OpenTelemetry and its extension packages support multiple text map propagators. -gRPC puts all the propagator data into the wire through metadata, and receives all the -data specified from the propagator configuration. -Users can define their own migration path for context propagators in distributed components. -Configuring gRPC OpenTelemetry with this propagator when dealing with -cross-process concerns during migration is straightforward and recommended. -In the long term, community standardized propagators, e.g. W3C is more encouraged than `GrpcTraceBinPropagator`. -This also allows users to easily migrate a group of applications with an old propagator to -a new propagator. An example migration path can be: -1. Configure server to accept both old and new propagators. -2. Configure the client with the desired new propagators and to drop the old propagator. -3. Make the server only accept the new propagators and complete the migration. ### Migration to OpenTelemetry: In Binary The OpenCensus [shim](https://github.com/open-telemetry/opentelemetry-java/tree/main/opencensus-shim)