OpenTelemetry TraceIdRatioBased sampler requirements following OTEP 235 #4166

Closed
wants to merge 36 commits into from
36 commits

- `0524a3d` OpenTelemetry trace SDK requirements for probability sampling followi… (jmacd, Jul 26, 2024)
- `c5453f8` linebreaks (jmacd, Jul 30, 2024)
- `25a61fd` Merge branch 'main' of github.com:open-telemetry/opentelemetry-specif… (jmacd, Aug 7, 2024)
- `68fa270` Add a migration section (jmacd, Aug 7, 2024)
- `51f9794` Merge branch 'main' of github.com:open-telemetry/opentelemetry-specif… (jmacd, Aug 15, 2024)
- `ba5a47b` lowercase hex (jmacd, Aug 15, 2024)
- `49673b7` spec-compliance-matrix.md (jmacd, Aug 15, 2024)
- `e51bea6` merge w/ removed file (jmacd, Aug 15, 2024)
- `4afe1c7` chlog (jmacd, Aug 15, 2024)
- `2f0dc0b` reverse inequality (jmacd, Aug 29, 2024)
- `f333b71` Apply suggestions from code review (jmacd, Aug 29, 2024)
- `b7376bd` remove sci-note and reverse region (jmacd, Aug 29, 2024)
- `483b3fa` Merge branch 'main' of github.com:open-telemetry/opentelemetry-specif… (jmacd, Aug 29, 2024)
- `c40de50` Merge branch 'main' of github.com:open-telemetry/opentelemetry-specif… (jmacd, Sep 12, 2024)
- `15a9c6f` spec-compliance: AlwaysOn too (jmacd, Sep 12, 2024)
- `672fac2` edits for jpkrohling (jmacd, Sep 25, 2024)
- `3c80d97` Apply suggestions from code review (jmacd, Sep 25, 2024)
- `b2b37f7` Merge branch 'jmacd/otep235' of github.com:jmacd/opentelemetry-specif… (jmacd, Sep 25, 2024)
- `1bb0b31` algorithm (jmacd, Sep 27, 2024)
- `2f0e387` move a sentence; drop a paragraph (jmacd, Sep 27, 2024)
- `6e29b0e` more overview (jmacd, Oct 4, 2024)
- `77b51f8` nuance (jmacd, Oct 4, 2024)
- `a61fbdd` Update specification/trace/tracestate-probability-sampling.md (jmacd, Oct 4, 2024)
- `59c329d` Merge branch 'main' of github.com:open-telemetry/opentelemetry-specif… (jmacd, Oct 10, 2024)
- `d21f341` Merge branch 'jmacd/otep235' of github.com:jmacd/opentelemetry-specif… (jmacd, Oct 10, 2024)
- `4e05267` Apply suggestions from code review (jmacd, Oct 16, 2024)
- `d65ea09` Merge branch 'main' of github.com:open-telemetry/opentelemetry-specif… (jmacd, Oct 21, 2024)
- `92876f9` Use consistent terminology with 4162, e.g., OpenTelemetry TraceState … (jmacd, Oct 21, 2024)
- `1855839` Specify a compatibility warning for transition (jmacd, Oct 21, 2024)
- `44c8190` asymmetrical (jmacd, Oct 21, 2024)
- `66d190f` TOC (jmacd, Oct 21, 2024)
- `0aacc19` Merge branch 'main' of github.com:open-telemetry/opentelemetry-specif… (jmacd, Oct 30, 2024)
- `e6dc409` AlwaysOn should respect sampling threshold (jmacd, Oct 30, 2024)
- `c75a010` Revert "AlwaysOn should respect sampling threshold" (jmacd, Nov 1, 2024)
- `87fb314` Merge branch 'main' of github.com:open-telemetry/opentelemetry-specif… (jmacd, Nov 13, 2024)
- `f3693fc` do not change AlwaysOnSampler spec (jmacd, Nov 13, 2024)
4 changes: 4 additions & 0 deletions CHANGELOG.md
@@ -11,6 +11,10 @@ release.

### Traces

- Define sampling threshold field in OpenTelemetry TraceState; define the behavior
of the TraceIdRatioBased sampler in terms of W3C Trace Context Level 2 randomness.
([#4166](https://github.com/open-telemetry/opentelemetry-specification/pull/4166))

### Metrics

### Logs
1 change: 1 addition & 0 deletions spec-compliance-matrix.md
@@ -87,6 +87,7 @@ formats is required. Implementing more than one format is optional.
| [Built-in `SpanProcessor`s implement `ForceFlush` spec](specification/trace/sdk.md#forceflush-1) | | | + | | + | + | + | + | + | + | + | |
| [Attribute Limits](specification/common/README.md#attribute-limits) | X | | + | | + | + | + | + | | | | |
| Fetch InstrumentationScope from ReadableSpan | | | + | | + | | | + | | | | |
| TraceIdRatioBased sampler implements OpenTelemetry tracestate `th` field | | | | | | | | | | | | |

## Baggage

111 changes: 76 additions & 35 deletions specification/trace/sdk.md
@@ -30,7 +30,10 @@ linkTitle: SDK
+ [AlwaysOn](#alwayson)
+ [AlwaysOff](#alwaysoff)
+ [TraceIdRatioBased](#traceidratiobased)
- [Requirements for `TraceIdRatioBased` sampler algorithm](#requirements-for-traceidratiobased-sampler-algorithm)
- [`TraceIdRatioBased` sampler configuration](#traceidratiobased-sampler-configuration)
- [`TraceIdRatioBased` sampler algorithm](#traceidratiobased-sampler-algorithm)
- [`TraceIdRatioBased` sampler description](#traceidratiobased-sampler-description)
- [`TraceIdRatioBased` sampler compatibility warning](#traceidratiobased-sampler-compatibility-warning)
+ [ParentBased](#parentbased)
+ [JaegerRemoteSampler](#jaegerremotesampler)
- [Span Limits](#span-limits)
@@ -386,40 +389,78 @@ The default sampler is `ParentBased(root=AlwaysOn)`.

#### TraceIdRatioBased

* The `TraceIdRatioBased` MUST ignore the parent `SampledFlag`. To respect the
parent `SampledFlag`, the `TraceIdRatioBased` should be used as a delegate of
the `ParentBased` sampler specified below.
* Description MUST return a string of the form `"TraceIdRatioBased{RATIO}"`
with `RATIO` replaced with the Sampler instance's trace sampling ratio
represented as a decimal number. The precision of the number SHOULD follow
implementation language standards and SHOULD be high enough to identify when
Samplers have different ratios. For example, if a TraceIdRatioBased Sampler
had a sampling ratio of 1 to every 10,000 spans it COULD return
`"TraceIdRatioBased{0.000100}"` as its description.

TODO: Add details about how the `TraceIdRatioBased` is implemented as a function
of the `TraceID`. [#1413](https://github.com/open-telemetry/opentelemetry-specification/issues/1413)

##### Requirements for `TraceIdRatioBased` sampler algorithm

* The sampling algorithm MUST be deterministic. A trace identified by a given
`TraceId` is sampled or not independent of language, time, etc. To achieve this,
implementations MUST use a deterministic hash of the `TraceId` when computing
the sampling decision. By ensuring this, running the sampler on any child `Span`
will produce the same decision.
* A `TraceIdRatioBased` sampler with a given sampling rate MUST also sample all
traces that any `TraceIdRatioBased` sampler with a lower sampling rate would
sample. This is important when a backend system may want to run with a higher
sampling rate than the frontend system, this way all frontend traces will
still be sampled and extra traces will be sampled on the backend only.
* **WARNING:** Since the exact algorithm is not specified yet (see TODO above),
there will probably be changes to it in any language SDK once it is, which
would break code that relies on the algorithm results.
Only the configuration and creation APIs can be considered stable.
It is recommended to use this sampler algorithm only for root spans
(in combination with [`ParentBased`](#parentbased)) because different language
SDKs or even different versions of the same language SDKs may produce inconsistent
results for the same input.
**Status**: [Development](../document-status.md)

The `TraceIdRatioBased` sampler implements simple, ratio-based probability sampling using randomness features specified in the [W3C Trace Context Level 2][W3CCONTEXTMAIN] Candidate Recommendation.
OpenTelemetry follows W3C Trace Context Level 2, which specifies 56 bits of trace randomness,
and [specifies how to make consistent probability sampling decisions using those 56 bits][CONSISTENTSAMPLING].

The `TraceIdRatioBased` sampler MUST ignore the parent `SampledFlag`.
To respect the parent `SampledFlag`, use `TraceIdRatioBased` as a delegate of the `ParentBased` sampler specified below.

Note that the "ratio-based" part of this Sampler's name implies that
it makes a probability decision directly from the TraceID, although
the original specification did not define that decision exactly. In
the present specification, the Sampler decision is more nuanced: only
a portion of the identifier is used, and only after checking whether
the OpenTelemetry TraceState contains an explicit trace randomness value.
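A minimal Go sketch of where the randomness value `R` comes from, under two assumptions not spelled out in this section: an explicit `rv` value, when present, is exactly 14 lowercase hexadecimal digits, and otherwise `R` is the 7 rightmost bytes of the 16-byte TraceID. The function name `randomnessValue` is hypothetical.

```go
package main

import (
	"encoding/binary"
	"fmt"
	"strconv"
)

// randomnessValue returns the 56-bit trace randomness used for sampling.
// Assumption: rv, when non-empty, is the OpenTelemetry TraceState `rv`
// sub-key value written as exactly 14 lowercase hex digits; otherwise the
// 7 rightmost bytes of the trace ID are used.
func randomnessValue(traceID [16]byte, rv string) (uint64, error) {
	if rv != "" {
		if len(rv) != 14 {
			return 0, fmt.Errorf("rv must have 14 hex digits, got %d", len(rv))
		}
		for _, c := range rv {
			if !(c >= '0' && c <= '9' || c >= 'a' && c <= 'f') {
				return 0, fmt.Errorf("rv contains non-lowercase-hex digit %q", c)
			}
		}
		return strconv.ParseUint(rv, 16, 64)
	}
	// Read the last 8 bytes big-endian, then mask off the top byte so that
	// only the 7 rightmost bytes (56 bits) remain.
	return binary.BigEndian.Uint64(traceID[8:]) & (1<<56 - 1), nil
}
```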

[W3CCONTEXTMAIN]: https://www.w3.org/TR/trace-context-2

##### `TraceIdRatioBased` sampler configuration

The `TraceIdRatioBased` sampler is typically configured using a 32-bit or 64-bit floating point number to express the sampling ratio.
The minimum valid sampling ratio is `2^-56`, and the maximum valid sampling ratio is 1.0.
From an input sampling ratio, a rejection threshold value is calculated; see [consistent-probability sampler requirements][CONSISTENTSAMPLING] for details on converting sampling ratios into thresholds with variable precision.
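As a rough illustration of `Threshold = MaxAdjustedCount * (1 - Probability)` applied to a configured ratio, the following Go sketch rejects ratios outside the valid range and rounds to the nearest integer. The name `ratioToThreshold` and the rounding choice are assumptions; the linked requirements govern the exact, variable-precision conversion.

```go
package main

import (
	"fmt"
	"math"
)

// maxAdjustedCount is 2^56, the number of distinct 56-bit randomness values.
const maxAdjustedCount = 1 << 56

// ratioToThreshold converts a sampling ratio into a 56-bit rejection
// threshold. Sketch only: it rounds to the nearest integer and ignores the
// variable-precision encoding described in the linked requirements.
func ratioToThreshold(ratio float64) (uint64, error) {
	if math.IsNaN(ratio) || ratio < 0x1p-56 || ratio > 1 {
		return 0, fmt.Errorf("sampling ratio %v outside [2^-56, 1]", ratio)
	}
	// Multiplying by a power of two is exact in float64, so computing the
	// number of accepted values first avoids the precision lost in (1 - ratio)
	// for very small ratios.
	accepted := uint64(math.Round(ratio * maxAdjustedCount))
	return maxAdjustedCount - accepted, nil
}
```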

[CONSISTENTSAMPLING]: ./tracestate-probability-sampling.md

##### `TraceIdRatioBased` sampler algorithm

Given a Sampler configured with a sampling threshold `T` and a Context with randomness value `R` (typically, the 7 rightmost bytes of the trace ID), when `ShouldSample()` is called, it uses the expression `R >= T` to decide whether to return `RECORD_AND_SAMPLE` or `DROP`.

* If the randomness value (R) is greater than or equal to the rejection threshold (T), that is, when (R >= T), return `RECORD_AND_SAMPLE`; otherwise, return `DROP`.
* When (R >= T), the OpenTelemetry TraceState SHOULD be modified to include the key-value `th:T` for the rejection threshold value (T), as specified for the [OpenTelemetry TraceState `th` sub-key][TRACESTATEHANDLING].

[TRACESTATEHANDLING]: ./tracestate-handling.md#sampling-threshold-value-th
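The decision rule and the `th` update can be sketched in Go as follows. `Decision`, `Drop`, `RecordAndSample`, `shouldSample`, and `formatThreshold` are hypothetical names, not an SDK API. Encoding `T` by printing 14 lowercase hex digits and dropping trailing zeros is an assumption that mirrors the `th` decoding rule (pad with trailing zeros to 14 digits).

```go
package main

import (
	"fmt"
	"strings"
)

// Decision mirrors the two outcomes named in the text; the type is
// illustrative only.
type Decision int

const (
	Drop Decision = iota
	RecordAndSample
)

// shouldSample applies the comparison R >= T.
func shouldSample(randomness, threshold uint64) Decision {
	if randomness >= threshold {
		return RecordAndSample
	}
	return Drop
}

// formatThreshold encodes T as a `th` value: 14 lowercase hex digits with
// trailing zeros removed, keeping "0" for a zero threshold (100% sampling).
func formatThreshold(threshold uint64) string {
	s := strings.TrimRight(fmt.Sprintf("%014x", threshold), "0")
	if s == "" {
		return "0"
	}
	return s
}
```

Because the comparison is `>=`, a threshold of `0` samples every span, which matches the `th:0` example in the TraceState section.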

##### `TraceIdRatioBased` sampler description

The `TraceIdRatioBased` GetDescription MUST return a string of the form `"TraceIdRatioBased{RATIO}"`
with `RATIO` replaced with the Sampler instance's trace sampling ratio
represented as a decimal number. The precision of the number SHOULD follow
implementation language standards and SHOULD be high enough to identify when
Samplers have different ratios. For example, if a TraceIdRatioBased Sampler
had a sampling ratio of 1 in every 10,000 spans, it could return
`"TraceIdRatioBased{0.000100}"` as its description.
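One way to produce the description in Go, assuming shortest round-trip formatting counts as following "implementation language standards": `strconv.FormatFloat` with precision `-1` yields distinct strings for distinct `float64` ratios. The function name `description` is hypothetical; `fmt.Sprintf("%f", ratio)` would reproduce the `0.000100` form above but keeps only 6 decimal places, which cannot distinguish very small ratios.

```go
package main

import "strconv"

// description builds the GetDescription string. FormatFloat with precision
// -1 prints the shortest decimal that parses back to the same float64, so
// samplers with different ratios get different descriptions.
func description(ratio float64) string {
	return "TraceIdRatioBased{" + strconv.FormatFloat(ratio, 'f', -1, 64) + "}"
}
```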

##### `TraceIdRatioBased` sampler compatibility warning

This specification has been revised from the original
`TraceIdRatioBased` Sampler definition. The present definition of
`TraceIdRatioBased` uses a new definition of trace randomness: unless
an explicit trace randomness value is set in the OpenTelemetry
TraceState `rv` sub-key, Samplers presume that TraceIDs contain the
necessary 56 bits of randomness.

When a TraceIdRatioBased Sampler makes a decision for a non-root Span
based on TraceID randomness, there is a possibility that the TraceID
was in fact generated by an older SDK, unaware of this specification.
The Trace random flag disambiguates these two cases: it propagates
information that lets TraceIdRatioBased Samplers confirm that TraceIDs
are random. However, this requires every Trace SDK that has handled
the context to support W3C Trace Context Level 2.

When a TraceIdRatioBased Sampler makes a decision for a non-root Span
using TraceID randomness, but the Trace random flag was not set, the
SDK SHOULD issue a one-time compatibility warning in its log.

> **Contributor** (review comment): nit: one-time warning - You may need to include criteria of what constitutes a "time", i.e. is it once per line of code, once per span name...
>
> I think this is a bit too vague to implement consistently.

As an example of this compatibility warning:

```
WARNING: The TraceIdRatioBased sampler is presuming TraceIDs are random
and expects the Trace random flag to be set in confirmation. Please
upgrade your caller(s) to use W3C Trace Context Level 2.
```
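A sketch of the one-time warning in Go, taking "one-time" to mean once per sampler instance. That is only one possible reading (the review comment above notes the scope is unspecified), so treat it as an assumption. The W3C Trace Context Level 2 random flag is trace-flags bit `0x02`; `compatWarner`, `check`, and the `log` hook are hypothetical names.

```go
package main

import "sync"

// randomFlag is the W3C Trace Context Level 2 "random" trace-flags bit.
const randomFlag byte = 0x02

const compatMessage = "WARNING: The TraceIdRatioBased sampler is presuming TraceIDs are random " +
	"and expects the Trace random flag to be set in confirmation. Please " +
	"upgrade your caller(s) to use W3C Trace Context Level 2."

// compatWarner emits the compatibility warning at most once per instance.
type compatWarner struct {
	once sync.Once
	log  func(msg string) // hypothetical logging hook
}

// check is meant to be called when a decision uses TraceID randomness
// (no explicit `rv` value). Root spans and contexts with the random flag
// set never trigger the warning.
func (w *compatWarner) check(isRoot bool, traceFlags byte) {
	if isRoot || traceFlags&randomFlag != 0 {
		return
	}
	w.once.Do(func() { w.log(compatMessage) })
}
```

`sync.Once` also keeps the warning single when spans are sampled concurrently.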

#### ParentBased

63 changes: 63 additions & 0 deletions specification/trace/tracestate-handling.md
@@ -6,6 +6,21 @@ linkTitle: TraceState

**Status**: [Development](../document-status.md)

<details>
<summary>Table of Contents</summary>

<!-- toc -->

- [Key](#key)
- [Value](#value)
- [Setting values](#setting-values)
- [Pre-defined OpenTelemetry sub-keys](#pre-defined-opentelemetry-sub-keys)
* [Sampling threshold value `th`](#sampling-threshold-value-th)

<!-- tocstop -->

</details>

In alignment to the [TraceContext](https://www.w3.org/TR/trace-context/) specification, this section uses the
Augmented Backus-Naur Form (ABNF) notation of [RFC5234](https://www.w3.org/TR/trace-context/#bib-rfc5234),
including the DIGIT rule in that document.
@@ -83,3 +98,51 @@ if ok {
// traceState was not updated.
}
```

## Pre-defined OpenTelemetry sub-keys

The following values have been defined by OpenTelemetry.

### Sampling threshold value `th`

The OpenTelemetry TraceState `th` sub-key defines a sampling threshold, which conveys effective sampling probability.
Valid values of the `th` sub-key contain between 1 and 14 lowercase hexadecimal digits.

```
hexdigit = DIGIT / %x61-66 ; 0-9 or lowercase a-f
```

To decode the threshold from the OpenTelemetry TraceState `th` value, first extend the value with trailing zeros to make 14 digits.
Then, parse the 14-digit value as a 56-bit unsigned hexadecimal number, yielding a rejection threshold.
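The decoding steps above, sketched in Go. `parseThreshold` is a hypothetical name; the explicit lowercase check is there because `strconv.ParseUint` would otherwise also accept uppercase digits.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parseThreshold decodes an OpenTelemetry TraceState `th` value into a
// 56-bit rejection threshold: validate 1 to 14 lowercase hex digits,
// right-pad with zeros to 14 digits, then parse as unsigned hexadecimal.
func parseThreshold(th string) (uint64, error) {
	if len(th) < 1 || len(th) > 14 {
		return 0, fmt.Errorf("th must have 1 to 14 hex digits, got %d", len(th))
	}
	for _, c := range th {
		if !(c >= '0' && c <= '9' || c >= 'a' && c <= 'f') {
			return 0, fmt.Errorf("th contains non-lowercase-hex digit %q", c)
		}
	}
	padded := th + strings.Repeat("0", 14-len(th))
	return strconv.ParseUint(padded, 16, 64)
}
```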

OpenTelemetry defines consistent sampling in terms of a 56-bit trace randomness value compared with the 56-bit rejection threshold.
When the randomness value is less than the rejection threshold, the span is not sampled.

The threshold value `0` indicates that no spans are being rejected, corresponding with 100% sampling.
For example, the following TraceState value identifies a trace with 100% sampling:

```
tracestate: ot=th:0
```

To calculate sampling probability from the rejection threshold, define a constant `MaxAdjustedCount` equal to 2^56, the number of distinct 56-bit values.
The sampling probability is defined:

```
Probability = (MaxAdjustedCount - Threshold) / MaxAdjustedCount
```

Threshold can be calculated from Probability:

```
Threshold = MaxAdjustedCount * (1 - Probability)
```

In sampling, the term _adjusted count_ refers to the effective number of items represented by a sampled item of telemetry.
The adjusted count of a span is the inverse of its sampling probability and can be derived from the threshold as follows.

```
AdjustedCount = MaxAdjustedCount / (MaxAdjustedCount - Threshold)
```
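The probability and adjusted-count formulas above translate directly into Go. This sketch uses hypothetical function names and assumes the threshold has already been decoded and is below 2^56, so `MaxAdjustedCount - Threshold` is never zero.

```go
package main

// maxAdjustedCount is 2^56, as defined above.
const maxAdjustedCount = 1 << 56

// probability implements
// Probability = (MaxAdjustedCount - Threshold) / MaxAdjustedCount.
func probability(threshold uint64) float64 {
	return float64(maxAdjustedCount-threshold) / maxAdjustedCount
}

// adjustedCount implements
// AdjustedCount = MaxAdjustedCount / (MaxAdjustedCount - Threshold),
// which is the inverse of probability.
func adjustedCount(threshold uint64) float64 {
	return maxAdjustedCount / float64(maxAdjustedCount-threshold)
}
```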

As an example, 25% probability sampling corresponds with adjusted count 4 and threshold `c`.
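The arithmetic behind this example, applying the formulas above with `Probability = 0.25`. Trailing zeros of the 14-digit hexadecimal threshold are dropped to obtain the `th` value, the reverse of the decoding rule:

```latex
\begin{aligned}
\text{Threshold} &= 2^{56}\,(1 - 0.25) = 3 \cdot 2^{54}
  = \texttt{0xc0000000000000} \;\Rightarrow\; \texttt{th:c} \\
\text{AdjustedCount} &= \frac{2^{56}}{2^{56} - 3 \cdot 2^{54}}
  = \frac{2^{56}}{2^{54}} = 4
\end{aligned}
```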