From 564195301013ce04e9ca69699683a994ad166d96 Mon Sep 17 00:00:00 2001 From: Liudmila Molkova Date: Thu, 26 Aug 2021 11:04:10 -0700 Subject: [PATCH 01/11] Instrumentation layers and suppressing duplicates --- ...0189-instrumentation-layers-suppression.md | 127 ++++++++++++++++++ 1 file changed, 127 insertions(+) create mode 100644 text/trace/0189-instrumentation-layers-suppression.md diff --git a/text/trace/0189-instrumentation-layers-suppression.md b/text/trace/0189-instrumentation-layers-suppression.md new file mode 100644 index 000000000..fcf1f7457 --- /dev/null +++ b/text/trace/0189-instrumentation-layers-suppression.md @@ -0,0 +1,127 @@ +# Instrumentation Layers and Suppression + +This document describes approach for instrumentation layers, suppressing duplicate layers and unambiguously enriching spans. + +## Motivation + +- Provide clarity for instrumentation layers: e.g. DB calls on top of REST API +- Give mechanism to suppress instrumentation layers for the same type: e.g. multiple instrumented HTTP clients using each other. +- Give mechanism to enrich specific spans unambiguously: e.g. HTTP server span with routing information + +## Explanation + +### Spec changes proposal + +- Semantic Conventions: Each span MUST follow one convention, specific to the call it describes +- Trace API: Add `SpanKey` API that gets span following specific convention from the context (e.g. `SpanKey.HTTP_CLIENT.fromContextOrNull(context)`). +- Semantic Conventions: instrumentation MUST back off if span of same kind and following same contention is already in the context by using `ContextKey` API. +- Semantic Conventions: Client libraries instrumentation MUST make context current to enable correlation with underlying layers of instrumentation +- OTel SDK SHOULD allow suppression strategy configuration: + - suppress nested by kind (e.g. only one CLIENT allowed) + - suppress nested by kind + type (only one HTTP CLIENT allowed, but outer DB -> nested HTTP is ok) + +## Internal details + +Client libraries frequently use common protocols (HTTP, gRPC, DB drivers) to perform RPC calls, which are usually instrumented by OpenTelemetry. +At the same time, client library is rarely a thin client and may need its own instrumentation to + +- connect traces to application code +- provide extra context: + - duration of composite operations + - overall result of all operation + - any extra library-specific information not available on transport call span + +Both, client library 'logical' and transport 'physical' spans are useful. They also rarely can be combined together because they have 1:many relationship. + +So instrumentations form *layers*, where each layer follows specific convention (or describes certain library). + +*Example*: + +- HTTP SERVER span + - DB CLIENT call - 1 + - HTTP CLIENT call - 1 + - DNS CLIENT + - TLS CLIENT + - HTTP CLIENT call - 2 + +There are two HTTP client spans under DB call, they are children of DB client spans. DB spans follow DB semantics only, HTTP spans similarly only follow HTTP semantics. If there are other layers of instrumentation (TLS) - it happens under HTTP client spans. + +### Duplication problem + +Duplication is a common issue in auto-instrumentation: + +- e.g. HTTP clients frequently are built on top of other HTTP clients, making multiple layers of HTTP spans +- Libraries may decide to add native instrumentation for common protocols like HTTP or gRPC: + - to support legacy correlation protocols + - to make better decisions failures (e.g. 404, 409) + - give better library-specific context + - support users that can't or don't want to use auto-instrumentation + +So what happens in reality without attempts to suppress duplicates: + +- HTTP SERVER span (middleware) + - HTTP SERVER span (servlet) + - Controller INTERNAL span + - HTTP CLIENT call - 1 (Google HTTP client) + - HTTP CLIENT call - 1 (Apache HTTP client) + +#### Proposed solution + +Disallow multiple layers of the same instrumentation, i.e. above picture translates into: + +- HTTP SERVER span (middleware) + - Controller INTERNAL span + - HTTP CLIENT call - 1 (Google HTTP client) + +To do so, instrumentation: + +- checks if span with same kind + type is registered on context already + - yes: backs off, never starting a span + - no: starts a span and registers it on the context + +Registration is done by writing a span on the context under the key. For this to work between different instrumentations (native and auto), the API to access spans must be in Trace API. + +Same mechanism can be used by users/instrumentations to enrich spans, e.g. add route info to HTTP server span (current span is ambiguous) + +### Configuration + +Suppression strategy should be configurable: + +- backends don't always support nested CLIENT spans (extra hints needed for Application Map to show outbound connection) +- users may prefer to reduce verbosity and costs by suppressing spans of same kind + +So two strategies should be supported: + +- suppress all nested of same kind +- suppress all nested of same kind + type (default?) + +### Implementation + +https://github.com/open-telemetry/opentelemetry-java-instrumentation/blob/main/instrumentation-api/src/main/java/io/opentelemetry/instrumentation/api/instrumenter/SpanKey.java + +## Trade-offs and mitigations + +Trace API change is needed to support native library instrumentations - taking dependency on unstable experimental instrumentation API (or common contrib code) is not a good option. Instrumentation API is a good temporary place until we can put it in Trace API, native instrumentation can use reflection to access `SpanKey` in instrumentation API. + +## Prior art and alternatives + +- Terminal context: suppressing anything below +- Exposing spans stack and allowing to walk it accessing span properties +- Suppress all nested spans of same kind +- Make logical calls INTERNAL + +Discussions: + +- https://github.com/open-telemetry/opentelemetry-specification/issues/1767 +- https://github.com/open-telemetry/opentelemetry-java-instrumentation/issues/903 +- https://github.com/open-telemetry/opentelemetry-java-instrumentation/issues/465 +- https://github.com/open-telemetry/opentelemetry-java-instrumentation/issues/1822 +- https://github.com/open-telemetry/opentelemetry-specification/issues/526 +- https://github.com/open-telemetry/opentelemetry-python-contrib/issues/369 +- https://github.com/open-telemetry/opentelemetry-python-contrib/issues/445 +- https://github.com/open-telemetry/opentelemetry-python-contrib/issues/456 + +## Open questions + +- Backends need hint to separate logical CLIENT spans from physical ones +- Good default (suppress by kind or kind + type) From 0f7e2927819c83700a9aab09398a09b5c4218845 Mon Sep 17 00:00:00 2001 From: Liudmila Molkova Date: Thu, 26 Aug 2021 11:47:55 -0700 Subject: [PATCH 02/11] review --- ...0189-instrumentation-layers-suppression.md | 28 ++++++++++++++++++- 1 file changed, 27 insertions(+), 1 deletion(-) diff --git a/text/trace/0189-instrumentation-layers-suppression.md b/text/trace/0189-instrumentation-layers-suppression.md index fcf1f7457..e3800d517 100644 --- a/text/trace/0189-instrumentation-layers-suppression.md +++ b/text/trace/0189-instrumentation-layers-suppression.md @@ -18,7 +18,33 @@ This document describes approach for instrumentation layers, suppressing duplica - Semantic Conventions: Client libraries instrumentation MUST make context current to enable correlation with underlying layers of instrumentation - OTel SDK SHOULD allow suppression strategy configuration: - suppress nested by kind (e.g. only one CLIENT allowed) - - suppress nested by kind + type (only one HTTP CLIENT allowed, but outer DB -> nested HTTP is ok) + - suppress nested by kind + convention it follows (only one HTTP CLIENT allowed, but outer DB -> nested HTTP is ok) + +#### SpanKey API + +SpanKey allows to read/write span to context. +Defines known SpanKey, shared between instrumentations (static singletons): HTTP, RPC, DB, Messaging. + +#### Example + +- HTTP Client 1: + - Gets HTTP CLIENT span from context: `SpanKey.HTTP_CLIENT.fromContextOrNull(ctx)` + - No HTTP client span on the context: + - start span + - store span in context: `SpanKey.HTTP_CLIENT.storeInContext(ctx, span)` + - Make `ctx` current +- Http Client 2: + - Gets HTTP CLIENT span from context: `SpanKey.HTTP_CLIENT.fromContextOrNull(Context.current())` + - HTTP client span is already there: do not instrument + +Suppression logic is configurable and encapsulated in `SpanKey` - instrumentation should not depend on configuration, e.g.: + +- suppressing by kind only - context key does not distinguish conventions within kind + - `SpanKey.HTTP_CLIENT.fromContextOrNull` returns CLIENT span + - `SpanKey.HTTP_CLIENT.storeInContext` stores span in CLIENT span context key +- suppressing by kind + convention - context key is per convention and kind + - `SpanKey.HTTP_CLIENT.fromContextOrNull` returns HTTP CLIENT span + - `SpanKey.HTTP_CLIENT.storeInContext` stores span in CLIENT + convention span context key ## Internal details From 273b955238c6ab4d91cabbb97d2f2f10cdbeb845 Mon Sep 17 00:00:00 2001 From: Liudmila Molkova Date: Fri, 27 Aug 2021 14:41:25 -0700 Subject: [PATCH 03/11] review and lint --- ...0189-instrumentation-layers-suppression.md | 27 ++++++++++--------- 1 file changed, 14 insertions(+), 13 deletions(-) diff --git a/text/trace/0189-instrumentation-layers-suppression.md b/text/trace/0189-instrumentation-layers-suppression.md index e3800d517..1b8076884 100644 --- a/text/trace/0189-instrumentation-layers-suppression.md +++ b/text/trace/0189-instrumentation-layers-suppression.md @@ -12,11 +12,11 @@ This document describes approach for instrumentation layers, suppressing duplica ### Spec changes proposal -- Semantic Conventions: Each span MUST follow one convention, specific to the call it describes +- Semantic Conventions: Each span MUST follow at most one convention, specific to the call it describes. - Trace API: Add `SpanKey` API that gets span following specific convention from the context (e.g. `SpanKey.HTTP_CLIENT.fromContextOrNull(context)`). -- Semantic Conventions: instrumentation MUST back off if span of same kind and following same contention is already in the context by using `ContextKey` API. +- Semantic Conventions: instrumentation MUST back off if span of same kind and following same contention is already in the context by using `SpanKey` API. - Semantic Conventions: Client libraries instrumentation MUST make context current to enable correlation with underlying layers of instrumentation -- OTel SDK SHOULD allow suppression strategy configuration: +- OTel SDK SHOULD allow suppression strategy configuration - suppress nested by kind (e.g. only one CLIENT allowed) - suppress nested by kind + convention it follows (only one HTTP CLIENT allowed, but outer DB -> nested HTTP is ok) @@ -59,7 +59,7 @@ At the same time, client library is rarely a thin client and may need its own in Both, client library 'logical' and transport 'physical' spans are useful. They also rarely can be combined together because they have 1:many relationship. -So instrumentations form *layers*, where each layer follows specific convention (or describes certain library). +So instrumentations form *layers*, where each layer follows specific convention or no convention at all. Spans that are not convention-specific (generic manual spans, INTERNAL spans) are never suppressed. *Example*: @@ -123,7 +123,7 @@ So two strategies should be supported: ### Implementation -https://github.com/open-telemetry/opentelemetry-java-instrumentation/blob/main/instrumentation-api/src/main/java/io/opentelemetry/instrumentation/api/instrumenter/SpanKey.java +Here's [Instrumentation API in Java implementation](https://github.com/open-telemetry/opentelemetry-java-instrumentation/blob/main/instrumentation-api/src/main/java/io/opentelemetry/instrumentation/api/instrumenter/SpanKey.java) with suppression by type ## Trade-offs and mitigations @@ -138,16 +138,17 @@ Trace API change is needed to support native library instrumentations - taking d Discussions: -- https://github.com/open-telemetry/opentelemetry-specification/issues/1767 -- https://github.com/open-telemetry/opentelemetry-java-instrumentation/issues/903 -- https://github.com/open-telemetry/opentelemetry-java-instrumentation/issues/465 -- https://github.com/open-telemetry/opentelemetry-java-instrumentation/issues/1822 -- https://github.com/open-telemetry/opentelemetry-specification/issues/526 -- https://github.com/open-telemetry/opentelemetry-python-contrib/issues/369 -- https://github.com/open-telemetry/opentelemetry-python-contrib/issues/445 -- https://github.com/open-telemetry/opentelemetry-python-contrib/issues/456 +- [Client library + auto instrumentation](https://github.com/open-telemetry/opentelemetry-specification/issues/1767) +- [Prevent duplicate telemetry when using both library and auto instrumentation](https://github.com/open-telemetry/opentelemetry-java-instrumentation/issues/903) +- [Generic mechanism for preventing multiple Server/Client spans](https://github.com/open-telemetry/opentelemetry-java-instrumentation/issues/465) +- [Proposal for modeling nested CLIENT instrumentation](https://github.com/open-telemetry/opentelemetry-java-instrumentation/issues/1822) +- [SpanKind with layers](https://github.com/open-telemetry/opentelemetry-specification/issues/526) +- [Should instrumentations be able to interact with or know about other instrumentations](https://github.com/open-telemetry/opentelemetry-python-contrib/issues/369) +- [Server instrumentations should look for parent spans in current context before extracting context from carriers](https://github.com/open-telemetry/opentelemetry-python-contrib/issues/445) +- [CLIENT spans should update their parent span's kind to INTERNAL](https://github.com/open-telemetry/opentelemetry-python-contrib/issues/456) ## Open questions - Backends need hint to separate logical CLIENT spans from physical ones - Good default (suppress by kind or kind + type) +- Should we have configuration option to never suppress anything \ No newline at end of file From 8b81638f04a72be16d35d2ba17085711bc443291 Mon Sep 17 00:00:00 2001 From: Liudmila Molkova Date: Fri, 27 Aug 2021 14:43:18 -0700 Subject: [PATCH 04/11] trailing spaces --- text/trace/0189-instrumentation-layers-suppression.md | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/text/trace/0189-instrumentation-layers-suppression.md b/text/trace/0189-instrumentation-layers-suppression.md index 1b8076884..0a0a9091e 100644 --- a/text/trace/0189-instrumentation-layers-suppression.md +++ b/text/trace/0189-instrumentation-layers-suppression.md @@ -12,11 +12,11 @@ This document describes approach for instrumentation layers, suppressing duplica ### Spec changes proposal -- Semantic Conventions: Each span MUST follow at most one convention, specific to the call it describes. +- Semantic Conventions: Each span MUST follow at most one convention, specific to the call it describes. - Trace API: Add `SpanKey` API that gets span following specific convention from the context (e.g. `SpanKey.HTTP_CLIENT.fromContextOrNull(context)`). - Semantic Conventions: instrumentation MUST back off if span of same kind and following same contention is already in the context by using `SpanKey` API. - Semantic Conventions: Client libraries instrumentation MUST make context current to enable correlation with underlying layers of instrumentation -- OTel SDK SHOULD allow suppression strategy configuration +- OTel SDK SHOULD allow suppression strategy configuration - suppress nested by kind (e.g. only one CLIENT allowed) - suppress nested by kind + convention it follows (only one HTTP CLIENT allowed, but outer DB -> nested HTTP is ok) @@ -151,4 +151,4 @@ Discussions: - Backends need hint to separate logical CLIENT spans from physical ones - Good default (suppress by kind or kind + type) -- Should we have configuration option to never suppress anything \ No newline at end of file +- Should we have configuration option to never suppress anything From d655c9dfdcb258b1e49e34785162935f4d6b99ed Mon Sep 17 00:00:00 2001 From: Liudmila Molkova Date: Fri, 27 Aug 2021 14:49:00 -0700 Subject: [PATCH 05/11] type -> convention --- text/trace/0189-instrumentation-layers-suppression.md | 10 +++++----- 1 file changed, 5 insertions(+), 5 deletions(-) diff --git a/text/trace/0189-instrumentation-layers-suppression.md b/text/trace/0189-instrumentation-layers-suppression.md index 0a0a9091e..4ffa3b1c2 100644 --- a/text/trace/0189-instrumentation-layers-suppression.md +++ b/text/trace/0189-instrumentation-layers-suppression.md @@ -5,7 +5,7 @@ This document describes approach for instrumentation layers, suppressing duplica ## Motivation - Provide clarity for instrumentation layers: e.g. DB calls on top of REST API -- Give mechanism to suppress instrumentation layers for the same type: e.g. multiple instrumented HTTP clients using each other. +- Give mechanism to suppress instrumentation layers for the same convention: e.g. multiple instrumented HTTP clients using each other. - Give mechanism to enrich specific spans unambiguously: e.g. HTTP server span with routing information ## Explanation @@ -101,7 +101,7 @@ Disallow multiple layers of the same instrumentation, i.e. above picture transla To do so, instrumentation: -- checks if span with same kind + type is registered on context already +- checks if span with same kind + convention is registered on context already - yes: backs off, never starting a span - no: starts a span and registers it on the context @@ -119,11 +119,11 @@ Suppression strategy should be configurable: So two strategies should be supported: - suppress all nested of same kind -- suppress all nested of same kind + type (default?) +- suppress all nested of same kind + convention (default?) ### Implementation -Here's [Instrumentation API in Java implementation](https://github.com/open-telemetry/opentelemetry-java-instrumentation/blob/main/instrumentation-api/src/main/java/io/opentelemetry/instrumentation/api/instrumenter/SpanKey.java) with suppression by type +Here's [Instrumentation API in Java implementation](https://github.com/open-telemetry/opentelemetry-java-instrumentation/blob/main/instrumentation-api/src/main/java/io/opentelemetry/instrumentation/api/instrumenter/SpanKey.java) with suppression by convention. ## Trade-offs and mitigations @@ -150,5 +150,5 @@ Discussions: ## Open questions - Backends need hint to separate logical CLIENT spans from physical ones -- Good default (suppress by kind or kind + type) +- Good default (suppress by kind or kind + convention) - Should we have configuration option to never suppress anything From 9adb92b39f55fa688debdde0913cbff4e48d0548 Mon Sep 17 00:00:00 2001 From: Liudmila Molkova Date: Mon, 30 Aug 2021 12:18:33 -0700 Subject: [PATCH 06/11] Add suppress none and exists API --- ...0189-instrumentation-layers-suppression.md | 41 +++++++++++++++---- 1 file changed, 32 insertions(+), 9 deletions(-) diff --git a/text/trace/0189-instrumentation-layers-suppression.md b/text/trace/0189-instrumentation-layers-suppression.md index 4ffa3b1c2..c0399727c 100644 --- a/text/trace/0189-instrumentation-layers-suppression.md +++ b/text/trace/0189-instrumentation-layers-suppression.md @@ -13,38 +13,60 @@ This document describes approach for instrumentation layers, suppressing duplica ### Spec changes proposal - Semantic Conventions: Each span MUST follow at most one convention, specific to the call it describes. -- Trace API: Add `SpanKey` API that gets span following specific convention from the context (e.g. `SpanKey.HTTP_CLIENT.fromContextOrNull(context)`). -- Semantic Conventions: instrumentation MUST back off if span of same kind and following same contention is already in the context by using `SpanKey` API. +- Trace API: Add `SpanKey` API that + - checks if similar span already exists on the context (e.g. `SpanKey.HTTP_CLIENT.exists(context)`) + - gets span following specific convention from the context (e.g. `SpanKey.HTTP_CLIENT.fromContextOrNull(context)`). +- Semantic Conventions: instrumentation MUST back off if span of same kind and following same contention is already exists on the context by using `SpanKey` API. - Semantic Conventions: Client libraries instrumentation MUST make context current to enable correlation with underlying layers of instrumentation - OTel SDK SHOULD allow suppression strategy configuration - suppress nested by kind (e.g. only one CLIENT allowed) - suppress nested by kind + convention it follows (only one HTTP CLIENT allowed, but outer DB -> nested HTTP is ok) + - suppress none #### SpanKey API -SpanKey allows to read/write span to context. -Defines known SpanKey, shared between instrumentations (static singletons): HTTP, RPC, DB, Messaging. +SpanKey allows to -#### Example +- read/write span to context +- check if specific span is on the context +- encapsulate different suppression strategies +- define known SpanKey, shared between instrumentations (static singletons): HTTP, RPC, DB, Messaging. + +#### Suppression Example - HTTP Client 1: - - Gets HTTP CLIENT span from context: `SpanKey.HTTP_CLIENT.fromContextOrNull(ctx)` + - Check if HTTP CLIENT span exists on the context: `SpanKey.HTTP_CLIENT.exists(ctx)` - No HTTP client span on the context: - start span - store span in context: `SpanKey.HTTP_CLIENT.storeInContext(ctx, span)` - Make `ctx` current - Http Client 2: - - Gets HTTP CLIENT span from context: `SpanKey.HTTP_CLIENT.fromContextOrNull(Context.current())` + - Checks if HTTP CLIENT span is already on the context: `SpanKey.HTTP_CLIENT.exists(Context.current())` - HTTP client span is already there: do not instrument -Suppression logic is configurable and encapsulated in `SpanKey` - instrumentation should not depend on configuration, e.g.: +Suppression logic is configurable and encapsulated in `SpanKey` - specific library instrumentation should not depend on configuration, e.g.: - suppressing by kind only - context key does not distinguish conventions within kind + - `SpanKey.HTTP_CLIENT.exists` returns true if any CLIENT span is on the context - `SpanKey.HTTP_CLIENT.fromContextOrNull` returns CLIENT span - `SpanKey.HTTP_CLIENT.storeInContext` stores span in CLIENT span context key - suppressing by kind + convention - context key is per convention and kind + - `SpanKey.HTTP_CLIENT.exists` returns true if HTTP CLIENT span is on the context - `SpanKey.HTTP_CLIENT.fromContextOrNull` returns HTTP CLIENT span - `SpanKey.HTTP_CLIENT.storeInContext` stores span in CLIENT + convention span context key +- suppressing none + - `SpanKey.HTTP_CLIENT.exists` returns false ignoring context + - `SpanKey.HTTP_CLIENT.fromContextOrNull` returns innermost HTTP CLIENT span on the context + - `SpanKey.HTTP_CLIENT.storeInContext` stores span in CLIENT + convention span context key + +#### Enrichment Example + +- HTTP SERVER 1 - middleware/servlet + - HTTP INTERNAL 1 - controller + - User code that wants to add event/attribute/status to HTTP SERVER span + - some internal instrumentation logic that sets route info status and exception ot anything else available after controller span starts. + +Assuming user code uses current context, it will get controller INTERNAL span. In order to enrich HTTP SERVER span, users may use `SpanKey.HTTP_SERVER.fromContextOrNull`. ## Internal details @@ -116,10 +138,11 @@ Suppression strategy should be configurable: - backends don't always support nested CLIENT spans (extra hints needed for Application Map to show outbound connection) - users may prefer to reduce verbosity and costs by suppressing spans of same kind -So two strategies should be supported: +So following strategies should be supported: - suppress all nested of same kind - suppress all nested of same kind + convention (default?) +- suppress none (mostly for debugging instrumentation code and internal observability) ### Implementation From 27d99cd61404c1b90db8968aaecdc5583b2cb60a Mon Sep 17 00:00:00 2001 From: Liudmila Molkova Date: Tue, 7 Sep 2021 13:40:35 -0700 Subject: [PATCH 07/11] conventions can include other conventions - allow that --- text/trace/0189-instrumentation-layers-suppression.md | 11 +++++++---- 1 file changed, 7 insertions(+), 4 deletions(-) diff --git a/text/trace/0189-instrumentation-layers-suppression.md b/text/trace/0189-instrumentation-layers-suppression.md index c0399727c..84e301970 100644 --- a/text/trace/0189-instrumentation-layers-suppression.md +++ b/text/trace/0189-instrumentation-layers-suppression.md @@ -1,6 +1,6 @@ # Instrumentation Layers and Suppression -This document describes approach for instrumentation layers, suppressing duplicate layers and unambiguously enriching spans. +This document describes approach for tracing instrumentation layers, suppressing duplicate layers and unambiguously enriching spans. ## Motivation @@ -12,17 +12,20 @@ This document describes approach for instrumentation layers, suppressing duplica ### Spec changes proposal -- Semantic Conventions: Each span MUST follow at most one convention, specific to the call it describes. +- Tracing Semantic Conventions: Each span MUST follow a single (besides general) convention, specific to the call it describes. - Trace API: Add `SpanKey` API that - checks if similar span already exists on the context (e.g. `SpanKey.HTTP_CLIENT.exists(context)`) - gets span following specific convention from the context (e.g. `SpanKey.HTTP_CLIENT.fromContextOrNull(context)`). -- Semantic Conventions: instrumentation MUST back off if span of same kind and following same contention is already exists on the context by using `SpanKey` API. -- Semantic Conventions: Client libraries instrumentation MUST make context current to enable correlation with underlying layers of instrumentation +- Tracing Semantic Conventions: instrumentation MUST back off if span of same kind and following same contention is already exists on the context by using `SpanKey` API. +- Tracing Semantic Conventions: Client libraries instrumentation MUST make context current to enable correlation with underlying layers of instrumentation - OTel SDK SHOULD allow suppression strategy configuration - suppress nested by kind (e.g. only one CLIENT allowed) - suppress nested by kind + convention it follows (only one HTTP CLIENT allowed, but outer DB -> nested HTTP is ok) - suppress none +Note: some conventions may explicitly include others (e.g. [FaaS](https://github.com/open-telemetry/opentelemetry-specification/blob/main/specification/trace/semantic_conventions/faas.md) may include [HTTP](https://github.com/open-telemetry/opentelemetry-specification/blob/main/specification/trace/semantic_conventions/http.md)), in this case for the purpose of this document, we assume span follows single convention. Instrumentation should only include explicitly mentioned sub-conventions (except general). +[General](https://github.com/open-telemetry/opentelemetry-specification/blob/main/specification/trace/semantic_conventions/span-general.md) convention attributes are allowed on all spans when applicable. + #### SpanKey API SpanKey allows to From ee85a3c8614e87bd9f2346e24ff6c9353d8bb8aa Mon Sep 17 00:00:00 2001 From: Liudmila Molkova Date: Tue, 7 Sep 2021 15:21:54 -1000 Subject: [PATCH 08/11] Update text/trace/0189-instrumentation-layers-suppression.md Co-authored-by: Johannes Tax --- text/trace/0189-instrumentation-layers-suppression.md | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/text/trace/0189-instrumentation-layers-suppression.md b/text/trace/0189-instrumentation-layers-suppression.md index 84e301970..9349e2b74 100644 --- a/text/trace/0189-instrumentation-layers-suppression.md +++ b/text/trace/0189-instrumentation-layers-suppression.md @@ -16,7 +16,8 @@ This document describes approach for tracing instrumentation layers, suppressing - Trace API: Add `SpanKey` API that - checks if similar span already exists on the context (e.g. `SpanKey.HTTP_CLIENT.exists(context)`) - gets span following specific convention from the context (e.g. `SpanKey.HTTP_CLIENT.fromContextOrNull(context)`). -- Tracing Semantic Conventions: instrumentation MUST back off if span of same kind and following same contention is already exists on the context by using `SpanKey` API. +- Tracing Semantic Conventions: instrumentation MUST back off if span of same kind and following same convention is already exists on the context by using `SpanKey` API. + - Tracing Semantic Conventions: Client libraries instrumentation MUST make context current to enable correlation with underlying layers of instrumentation - OTel SDK SHOULD allow suppression strategy configuration - suppress nested by kind (e.g. only one CLIENT allowed) From f365dbe0c198b7337a7da8b8b4130168b01aa9e2 Mon Sep 17 00:00:00 2001 From: Liudmila Molkova Date: Wed, 22 Sep 2021 13:18:17 -0700 Subject: [PATCH 09/11] Add InstrumenttionType and builder.setType(type) instead of SpanKey --- ...0189-instrumentation-layers-suppression.md | 157 +++++++++++------- 1 file changed, 99 insertions(+), 58 deletions(-) diff --git a/text/trace/0189-instrumentation-layers-suppression.md b/text/trace/0189-instrumentation-layers-suppression.md index 84e301970..032de3c9c 100644 --- a/text/trace/0189-instrumentation-layers-suppression.md +++ b/text/trace/0189-instrumentation-layers-suppression.md @@ -1,75 +1,61 @@ # Instrumentation Layers and Suppression -This document describes approach for tracing instrumentation layers, suppressing duplicate layers and unambiguously enriching spans. +This document describes approach for tracing instrumentation layers and suppressing duplicate layers. ## Motivation - Provide clarity for instrumentation layers: e.g. DB calls on top of REST API - Give mechanism to suppress instrumentation layers for the same convention: e.g. multiple instrumented HTTP clients using each other. -- Give mechanism to enrich specific spans unambiguously: e.g. HTTP server span with routing information ## Explanation ### Spec changes proposal -- Tracing Semantic Conventions: Each span MUST follow a single (besides general) convention, specific to the call it describes. -- Trace API: Add `SpanKey` API that - - checks if similar span already exists on the context (e.g. `SpanKey.HTTP_CLIENT.exists(context)`) - - gets span following specific convention from the context (e.g. `SpanKey.HTTP_CLIENT.fromContextOrNull(context)`). -- Tracing Semantic Conventions: instrumentation MUST back off if span of same kind and following same contention is already exists on the context by using `SpanKey` API. +- Tracing Semantic Conventions: Each span MUST follow a single (besides general and potentially composite), convention, specific to the call it describes. - Tracing Semantic Conventions: Client libraries instrumentation MUST make context current to enable correlation with underlying layers of instrumentation +- Trace API: Add `InstrumentationType` expendable enum with predefined values for known trace conventions (HTTP, RPC< DB, Messaging(see open questions)). *Type* is just a convention name. +- Trace SDK: Add span creation option to set `InstrumentationType` + - During span creation, checks if span should be suppressed (already on the parent `Context`) and returns a *suppressed* span, which is + - non-recording + - propagating (references parent context) + - does not become current (i.e. `makeCurrent` call with it is noop) - OTel SDK SHOULD allow suppression strategy configuration - suppress nested by kind (e.g. only one CLIENT allowed) - suppress nested by kind + convention it follows (only one HTTP CLIENT allowed, but outer DB -> nested HTTP is ok) - suppress none -Note: some conventions may explicitly include others (e.g. [FaaS](https://github.com/open-telemetry/opentelemetry-specification/blob/main/specification/trace/semantic_conventions/faas.md) may include [HTTP](https://github.com/open-telemetry/opentelemetry-specification/blob/main/specification/trace/semantic_conventions/http.md)), in this case for the purpose of this document, we assume span follows single convention. Instrumentation should only include explicitly mentioned sub-conventions (except general). -[General](https://github.com/open-telemetry/opentelemetry-specification/blob/main/specification/trace/semantic_conventions/span-general.md) convention attributes are allowed on all spans when applicable. - -#### SpanKey API - -SpanKey allows to - -- read/write span to context -- check if specific span is on the context -- encapsulate different suppression strategies -- define known SpanKey, shared between instrumentations (static singletons): HTTP, RPC, DB, Messaging. +Note: some conventions may explicitly include others (e.g. [FaaS](https://github.com/open-telemetry/opentelemetry-specification/blob/main/specification/trace/semantic_conventions/faas.md) may include [HTTP](https://github.com/open-telemetry/opentelemetry-specification/blob/main/specification/trace/semantic_conventions/http.md)). For the purpose of this document, we assume span follows single convention and instrumentation MUST include explicitly mentioned sub-conventions (except general) only. +[General](https://github.com/open-telemetry/opentelemetry-specification/blob/main/specification/trace/semantic_conventions/span-general.md) convention attributes are allowed on all spans when applicable, pure generic instrumentation are not suppressed. #### Suppression Example -- HTTP Client 1: - - Check if HTTP CLIENT span exists on the context: `SpanKey.HTTP_CLIENT.exists(ctx)` - - No HTTP client span on the context: - - start span - - store span in context: `SpanKey.HTTP_CLIENT.storeInContext(ctx, span)` - - Make `ctx` current -- Http Client 2: - - Checks if HTTP CLIENT span is already on the context: `SpanKey.HTTP_CLIENT.exists(Context.current())` - - HTTP client span is already there: do not instrument - -Suppression logic is configurable and encapsulated in `SpanKey` - specific library instrumentation should not depend on configuration, e.g.: - -- suppressing by kind only - context key does not distinguish conventions within kind - - `SpanKey.HTTP_CLIENT.exists` returns true if any CLIENT span is on the context - - `SpanKey.HTTP_CLIENT.fromContextOrNull` returns CLIENT span - - `SpanKey.HTTP_CLIENT.storeInContext` stores span in CLIENT span context key -- suppressing by kind + convention - context key is per convention and kind - - `SpanKey.HTTP_CLIENT.exists` returns true if HTTP CLIENT span is on the context - - `SpanKey.HTTP_CLIENT.fromContextOrNull` returns HTTP CLIENT span - - `SpanKey.HTTP_CLIENT.storeInContext` stores span in CLIENT + convention span context key +- Instrumentation - HTTP Client 1: + - starts span + - SDK checks if there is a span of the same kind + type on the parent context + - there is no similar span + - SDK returns new span + - stores it in the context (presumably current) + - SDK stores span in context behind `ContextKey` which is combination of kind and type (depending on configuration) +- Instrumentation - HTTP Client 2 (in scope of span1): + - [Optional suggestion]: add API to check if span should be started (optimization for power-users) + - starts span + - SDK checks if there is a span of the same kind + type on the parent context + - there is one already + - SDK returns suppressed (non-recording, propagating span, it's also never becomes current). + - HTTP client continues instrumentation with suppressed span + - adds attributes: noop + - injects context: duplicate work, since parent context is already injected and suppressed span carries the same context + - makes suppressed span current: current is not modified, scope is noop +- Instrumentation - TLS (imaginary, in scope of span2) + - works as if it's in scope of span1 + - current is span1 + - if explicit parent was used (span2), it carries span1 context + +Suppression logic is configurable and encapsulated in span builder SDK implementation - specific library instrumentation should not depend on configuration, e.g.: + +- suppressing by kind only - context key does not distinguish types within kind +- suppressing by kind + type - context key is per type and kind - suppressing none - - `SpanKey.HTTP_CLIENT.exists` returns false ignoring context - - `SpanKey.HTTP_CLIENT.fromContextOrNull` returns innermost HTTP CLIENT span on the context - - `SpanKey.HTTP_CLIENT.storeInContext` stores span in CLIENT + convention span context key - -#### Enrichment Example - -- HTTP SERVER 1 - middleware/servlet - - HTTP INTERNAL 1 - controller - - User code that wants to add event/attribute/status to HTTP SERVER span - - some internal instrumentation logic that sets route info status and exception ot anything else available after controller span starts. - -Assuming user code uses current context, it will get controller INTERNAL span. In order to enrich HTTP SERVER span, users may use `SpanKey.HTTP_SERVER.fromContextOrNull`. ## Internal details @@ -126,17 +112,17 @@ Disallow multiple layers of the same instrumentation, i.e. above picture transla To do so, instrumentation: -- checks if span with same kind + convention is registered on context already +- checks if span with same kind + type is registered on context already - yes: backs off, never starting a span - no: starts a span and registers it on the context Registration is done by writing a span on the context under the key. For this to work between different instrumentations (native and auto), the API to access spans must be in Trace API. -Same mechanism can be used by users/instrumentations to enrich spans, e.g. add route info to HTTP server span (current span is ambiguous) - ### Configuration -Suppression strategy should be configurable: +Suppression strategy should be configurable on SDK side - instrumentation does not need to know about it. + +Configuration is needed since: - backends don't always support nested CLIENT spans (extra hints needed for Application Map to show outbound connection) - users may prefer to reduce verbosity and costs by suppressing spans of same kind @@ -144,16 +130,61 @@ Suppression strategy should be configurable: So following strategies should be supported: - suppress all nested of same kind -- suppress all nested of same kind + convention (default?) +- suppress all nested of same kind + type (default?) - suppress none (mostly for debugging instrumentation code and internal observability) +#### Suppression examples: kind and type + +- HTTP SERVER + - HTTP CLIENT - ok + +- HTTP SERVER + - HTTP SERVER - suppressed + +- HTTP SERVER + - MESSAGING CONSUMER - ok // open questions around receive/process + +- MESSAGING PRODUCER // open questions around receive/process + - HTTP CLIENT - ok + +- MESSAGING CLIENT + - HTTP CLIENT - ok + +#### Suppression examples: kind + +- HTTP SERVER + - HTTP CLIENT - ok + +- HTTP SERVER + - HTTP SERVER - suppressed + +- HTTP SERVER + - MESSAGING CONSUMER - ok + +- MESSAGING PRODUCER // open questions around client/producer uncertainty + - HTTP CLIENT - ok + +- MESSAGING CLIENT + - HTTP CLIENT - suppressed + ### Implementation -Here's [Instrumentation API in Java implementation](https://github.com/open-telemetry/opentelemetry-java-instrumentation/blob/main/instrumentation-api/src/main/java/io/opentelemetry/instrumentation/api/instrumenter/SpanKey.java) with suppression by convention. +#### Trace API + +Proof of concept in Java: https://github.com/lmolkova/opentelemetry-java/pull/1 + +- introduces a new `SuppressedSpan` implementation. It's almost the same as `PropagatingSpan` (i.e. sampled-out), with the difference that `makeCurrent` is noop + - `PropagatingSpan` can't be reused since sampling out usually assumes to sample out this and all following downstream spans. Sampled out span becomes current and causes side-effects (span.current().setAttribute()) not compatible with suppression. +- adds extendable `InstrumentationType` and `SpanBuilder.setType(InstrumentationType type)` +- adds optional optimization `Tracer.shouldStartSpan(name, kind, type)` + +#### Instrumentation API + +[Instrumentation API in Java implementation](https://github.com/open-telemetry/opentelemetry-java-instrumentation/blob/main/instrumentation-api/src/main/java/io/opentelemetry/instrumentation/api/instrumenter/SpanKey.java) with suppression by type. ## Trade-offs and mitigations -Trace API change is needed to support native library instrumentations - taking dependency on unstable experimental instrumentation API (or common contrib code) is not a good option. Instrumentation API is a good temporary place until we can put it in Trace API, native instrumentation can use reflection to access `SpanKey` in instrumentation API. +Trace API change is needed to support native library instrumentations - taking dependency on unstable experimental instrumentation API (or common contrib code) is not a good option. ## Prior art and alternatives @@ -175,6 +206,16 @@ Discussions: ## Open questions +- Should we suppress by direction (inbound/outbound) instead of kind? Suggestion: let's fix current messaging quirks that cause concern here + - Messaging CONSUMER spans (CONSUMER *receive* is a parent of CONSUMER *process* and seem to violate [kind definition](https://github.com/open-telemetry/opentelemetry-specification/blob/main/specification/trace/api.md#spankind)) + - I'm going to drive changing it in messaging spec (*receive* is normal CLIENT span based on the ) + - Messaging PRODUCER/CLIENT uncertainty: it's changing, see [Creation and publishing](https://github.com/open-telemetry/oteps/pull/173#discussion_r704737276) + - Proposal: They are two different things + - Creation is PRODUCER + - Publish is CLIENT + +- Should we have `Tracer.shouldStart(spanName, kind, type, ?)` or `SpanBuilder.shouldStart()` methods to optimize instrumentation. If it's not called, everything works, just not too efficient + - Backends need hint to separate logical CLIENT spans from physical ones -- Good default (suppress by kind or kind + convention) +- Good default (suppress by kind or kind + type). Up to distro + user. - Should we have configuration option to never suppress anything From 75d51269e60904906a59dd18f231c13cc0d73ab1 Mon Sep 17 00:00:00 2001 From: Liudmila Molkova Date: Wed, 22 Sep 2021 13:43:34 -0700 Subject: [PATCH 10/11] lint --- text/trace/0189-instrumentation-layers-suppression.md | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/text/trace/0189-instrumentation-layers-suppression.md b/text/trace/0189-instrumentation-layers-suppression.md index 9783acdf1..4b117fe33 100644 --- a/text/trace/0189-instrumentation-layers-suppression.md +++ b/text/trace/0189-instrumentation-layers-suppression.md @@ -172,7 +172,7 @@ So following strategies should be supported: #### Trace API -Proof of concept in Java: https://github.com/lmolkova/opentelemetry-java/pull/1 +[Proof of concept in Java](https://github.com/lmolkova/opentelemetry-java/pull/1) - introduces a new `SuppressedSpan` implementation. It's almost the same as `PropagatingSpan` (i.e. sampled-out), with the difference that `makeCurrent` is noop - `PropagatingSpan` can't be reused since sampling out usually assumes to sample out this and all following downstream spans. Sampled out span becomes current and causes side-effects (span.current().setAttribute()) not compatible with suppression. @@ -183,7 +183,7 @@ Proof of concept in Java: https://github.com/lmolkova/opentelemetry-java/pull/1 [Instrumentation API in Java implementation](https://github.com/open-telemetry/opentelemetry-java-instrumentation/blob/main/instrumentation-api/src/main/java/io/opentelemetry/instrumentation/api/instrumenter/SpanKey.java) with suppression by type. -## Trade-offs and mitigations +## Trade-offs and mitigation Trace API change is needed to support native library instrumentations - taking dependency on unstable experimental instrumentation API (or common contrib code) is not a good option. From bd3eab0c819ad8f16ae9ae4c416ff49b150df2c9 Mon Sep 17 00:00:00 2001 From: Liudmila Molkova Date: Wed, 22 Sep 2021 14:01:34 -0700 Subject: [PATCH 11/11] minor --- text/trace/0189-instrumentation-layers-suppression.md | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/text/trace/0189-instrumentation-layers-suppression.md b/text/trace/0189-instrumentation-layers-suppression.md index 4b117fe33..862629184 100644 --- a/text/trace/0189-instrumentation-layers-suppression.md +++ b/text/trace/0189-instrumentation-layers-suppression.md @@ -11,14 +11,14 @@ This document describes approach for tracing instrumentation layers and suppress ### Spec changes proposal -- Tracing Semantic Conventions: Each span MUST follow a single (besides general and potentially composite), convention, specific to the call it describes. +- Tracing Semantic Conventions: Each span MUST follow a single (besides general and potentially composite), convention, specific to the call it describes. It MUST specify it when creating a span. - Tracing Semantic Conventions: Client libraries instrumentation MUST make context current to enable correlation with underlying layers of instrumentation - Trace API: Add `InstrumentationType` expandable enum with predefined values for known trace conventions (HTTP, RPC, DB, Messaging (see open questions)). *Type* is just a convention name. - Trace SDK: Add span creation option to set `InstrumentationType` - - During span creation, checks if span should be suppressed (there is another one for kind + type on the parent `Context`) and returns a *suppressed* span, which is + - During span creation, checks if span should be suppressed (if there is another one for kind + type on the parent `Context`) and if there is, returns a *suppressed* span, which is - non-recording - propagating (carries parent context) - - does not become current (i.e. `makeCurrent` call with it is noop) + - does not become current (i.e. `makeCurrent` call with it is no-op) - OTel SDK SHOULD allow suppression strategy configuration - suppress nested by kind (e.g. only one CLIENT allowed) - suppress nested by kind + convention it follows (only one HTTP CLIENT allowed, but outer DB -> nested HTTP is ok)