This repository has been archived by the owner on Dec 6, 2024. It is now read-only.

Versioning and Stability for OpenTelemetry Clients #143

Merged
merged 60 commits into from
Dec 16, 2020
Changes from 38 commits
Commits
f427d74
versioning and stability first draft
tedsuo Dec 4, 2020
50db1aa
whitespace
tedsuo Dec 4, 2020
a4d6432
update rfc number to match PR id
tedsuo Dec 4, 2020
a001ee6
Update text/0143-versioning-and-stability.md
tedsuo Dec 4, 2020
40d4dae
Update text/0143-versioning-and-stability.md
tedsuo Dec 4, 2020
a848e3a
Update text/0143-versioning-and-stability.md
tedsuo Dec 4, 2020
680d1ff
Update text/0143-versioning-and-stability.md
tedsuo Dec 5, 2020
f880b88
Update text/0143-versioning-and-stability.md
tedsuo Dec 7, 2020
70786fe
Update text/0143-versioning-and-stability.md
tedsuo Dec 7, 2020
b5b9255
Update text/0143-versioning-and-stability.md
tedsuo Dec 7, 2020
7a62f27
Update text/0143-versioning-and-stability.md
tedsuo Dec 8, 2020
7f24f1d
Update text/0143-versioning-and-stability.md
tedsuo Dec 8, 2020
bdb150a
remove vague design goal
tedsuo Dec 8, 2020
f818d11
Merge branch 'versioning' of github.com:tedsuo/rfcs into versioning
tedsuo Dec 8, 2020
d1624dc
Adding more info about cross-cutting concerns
tedsuo Dec 8, 2020
0ee804e
wtf git
tedsuo Dec 8, 2020
a3384e0
clarify deprecation version bump
tedsuo Dec 9, 2020
9669b23
clarify dependency conflicts
tedsuo Dec 9, 2020
48d7f4f
clarify stability
tedsuo Dec 9, 2020
0d03cf8
clarify contrib
tedsuo Dec 9, 2020
ccb0cc6
spelling
tedsuo Dec 9, 2020
f87ee1f
spelling
tedsuo Dec 9, 2020
2d52f5c
clarify experimental stage
tedsuo Dec 9, 2020
6f7fe0f
Update text/0143-versioning-and-stability.md
tedsuo Dec 9, 2020
55ddbfb
Update text/0143-versioning-and-stability.md
tedsuo Dec 9, 2020
9e0284b
remove double spaces
tedsuo Dec 9, 2020
53df880
clarify levels of stability
tedsuo Dec 9, 2020
a5e9192
seperate
tedsuo Dec 9, 2020
bacf18d
spelling
tedsuo Dec 9, 2020
1fbaf30
emphasize that the SDK should not be refernced
tedsuo Dec 9, 2020
c335a51
remove LTS
tedsuo Dec 9, 2020
02fa9c7
clarify what counts as a bug
tedsuo Dec 9, 2020
e90aec5
support -> retains
tedsuo Dec 9, 2020
d69fc1a
spelling
tedsuo Dec 11, 2020
4a13c0a
clarify that this proposal is about clients
tedsuo Dec 11, 2020
fe35fdb
Each component has it's own version
tedsuo Dec 11, 2020
f7a2c04
Added long term support
tedsuo Dec 11, 2020
4337fd8
Lint
tedsuo Dec 11, 2020
86cbe2b
Update text/0143-versioning-and-stability.md
tedsuo Dec 11, 2020
2f7396b
long term support infographic
tedsuo Dec 11, 2020
8e04de1
define OpenTelemetry GA
tedsuo Dec 11, 2020
6a2a78f
clarify SDK stability
tedsuo Dec 11, 2020
594e380
Clarify long term support for SDK and Contrib
tedsuo Dec 11, 2020
1da65ef
lint
tedsuo Dec 11, 2020
a71d8b5
Clarify that experimental packages should not move
tedsuo Dec 11, 2020
ca647dc
clarify which audience is allowed to interact with the SDK
tedsuo Dec 11, 2020
66e58e4
Make ABI a langauge specific concern
tedsuo Dec 11, 2020
eb39b89
clarify new major versions of existing signals
tedsuo Dec 11, 2020
00b57c7
clarify single API version
tedsuo Dec 11, 2020
a602397
clarify single SDK version number
tedsuo Dec 11, 2020
04a3f15
lint
tedsuo Dec 11, 2020
a4e027a
better pic
tedsuo Dec 12, 2020
8184d15
add ruby and js to 0.X experimental
tedsuo Dec 12, 2020
e560725
whitespace
tedsuo Dec 15, 2020
0919fa7
clarify that component versions do not need to match
tedsuo Dec 15, 2020
5b93dc7
clarify contrib contains multiple versions
tedsuo Dec 15, 2020
92335ad
complete sentence
tedsuo Dec 15, 2020
31a2cfa
clarify stability may be applied to API ecosystem before SDK ecosystem
tedsuo Dec 15, 2020
93622e5
Clarify that contrib does include core plugins
tedsuo Dec 15, 2020
9938dd5
Give examples of contructors and plugins
tedsuo Dec 15, 2020
150 changes: 150 additions & 0 deletions text/0143-versioning-and-stability.md
@@ -0,0 +1,150 @@
# Versioning and stability for OpenTelemetry clients

OpenTelemetry is a large project with strict compatibility requirements. This proposal defines the stability guarantees offered by the OpenTelemetry clients, along with a versioning and lifecycle proposal which defines how we meet those requirements.

Language implementations are expected to follow this proposal exactly, unless a language or package manager convention interferes significantly. Implementations must take this cross-language proposal, and produce a language-specific proposal which details how these requirements will be met.

Note: In this document, the term OpenTelemetry specifically refers to the OpenTelemetry clients. It does not refer to the specification or the Collector.

## Design goals

**Ensure that end users stay up to date with the latest release.**
We want all users to stay up to date with the latest version of OpenTelemetry. We do not want to create hard breaks in support, of any kind, which leave users stranded on older versions. It must always be possible to upgrade to the latest minor version of OpenTelemetry, without creating compilation or runtime errors.

**Never create a dependency conflict between packages which rely on different versions of OpenTelemetry. Avoid breaking all stable public APIs.**
Backwards compatibility is a strict requirement. Instrumentation APIs cannot create a version conflict, ever. Otherwise, OpenTelemetry cannot be embedded in widely shared libraries, such as web frameworks. Code written against older versions of the API must work with all newer versions of the API. Transitive dependencies of the API cannot create a version conflict. The OpenTelemetry API cannot depend on "foo" if there is any chance that any library or application may require a different, incompatible version of "foo." A library using OpenTelemetry should never become incompatible with other libraries due to a version conflict in one of OpenTelemetry's dependencies. Theoretically, APIs can be deprecated and eventually removed, but this is a process measured in years and we have no plans to do so.

**Allow for multiple levels of package stability within the same release.**
Provide maintainers a clear process for developing new, experimental APIs alongside stable APIs. Different packages within the same release may have different levels of stability. This means that an implementation wishing to release stable tracing today must ensure that experimental metrics are factored out in such a way that breaking changes to the metrics API do not destabilize the trace API packages.

## Relevant architecture

![drawing](img/0143_cross_cutting.png)

At the highest architectural level, OpenTelemetry is organized into signals. Each signal provides a specialized form of observability. For example, tracing, metrics, and baggage are three separate signals. Signals share a common subsystem – context propagation – but they function independently from each other.

Each signal provides a mechanism for software to describe itself. A codebase, such as an API handler or a database client, takes a dependency on various signals in order to describe itself. OpenTelemetry instrumentation code is then mixed into the other code within that codebase. This makes OpenTelemetry a **cross-cutting concern** - a piece of software which must be mixed into many other pieces of software in order to provide value. Cross-cutting concerns, by their very nature, violate a core design principle – separation of concerns. As a result, OpenTelemetry requires extra care and attention to avoid creating issues for the codebases which depend upon these cross-cutting APIs.

OpenTelemetry is designed to separate the portion of each signal which must be imported as cross-cutting concerns from the portions of OpenTelemetry which can be managed independently. OpenTelemetry is also designed to be an extensible framework. To accomplish these goals, each signal consists of four types of packages:

**API -** API packages consist of the cross-cutting public interfaces used for instrumentation: any portion of OpenTelemetry which 3rd-party libraries and application code depend upon. To manage different levels of stability, every signal has its own, independent API package. These individual APIs may also be bundled up into a shared global API, for convenience.

**SDK -** The implementation of the API. The SDK is managed by the application owner. Note that the SDKs includes additional public interfaces which are not considered part of the API package, as they are not cross-cutting concerns. These public interfaces include constructors, configuration interfaces, and plugin interfaces. Application owners may interact with the SDK; library developers and instrumentation plugins should never directly reverence SDK packages.

I would consider a "library developer" to include libraries that implement SDK extensions, such as custom exporters. Perhaps this could be simplified to just "instrumentation should never directly reference SDK packages" which I believe is the intent.

Contributor Author

Thanks, I clarified this further.

Suggested change
**SDK -** The implementation of the API. The SDK is managed by the application owner. Note that the SDKs includes additional public interfaces which are not considered part of the API package, as they are not cross-cutting concerns. These public interfaces include constructors, configuration interfaces, and plugin interfaces. Application owners may interact with the SDK; library developers and instrumentation plugins should never directly reverence SDK packages.
**SDK -** The implementation of the API. The SDK is managed by the application owner. Note that the SDKs includes additional public interfaces which are not considered part of the API package, as they are not cross-cutting concerns. These public interfaces include constructors, configuration interfaces, and plugin interfaces. Application owners may interact with the SDK; library developers and instrumentation plugins should never directly reference SDK packages.


**Semantic Conventions -** A schema defining the attributes which describe common concepts and operations which the signal observes. Note that unlike the API or SDK, stable conventions for all signals may be placed in the same package, as they are often useful across different signals.

**Contrib –** plugins and instrumentation that make use of the API or SDK interfaces, but are not part of the core packages necessary for running OTel. The term "contrib" specifically refers to the plugins and instrumentation maintained by the OpenTelemetry organization; it does not refer to third party plugins hosted elsewhere.
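
The split between API and SDK packages can be illustrated with a minimal Python sketch. Every name in it (`Tracer`, `NoOpTracer`, `set_tracer`, `RecordingTracer`, `handle_request`) is hypothetical and is not the actual OpenTelemetry API. The only point is that library instrumentation touches the API surface, while the application owner alone installs an SDK.

```python
from abc import ABC, abstractmethod


# --- "API package" (hypothetical): the only thing libraries import ---
class Tracer(ABC):
    @abstractmethod
    def start_span(self, name):
        """Record the start of an operation and return its name."""


class NoOpTracer(Tracer):
    """Default used when no SDK is installed; records nothing."""

    def start_span(self, name):
        return name


_tracer = NoOpTracer()


def get_tracer():
    return _tracer


def set_tracer(tracer):
    global _tracer
    _tracer = tracer


# --- "SDK package" (hypothetical): installed by the application owner ---
class RecordingTracer(Tracer):
    def __init__(self):
        self.spans = []

    def start_span(self, name):
        self.spans.append(name)
        return name


# --- Library instrumentation: depends on the API only ---
def handle_request(path):
    get_tracer().start_span(f"GET {path}")
    return "ok"
```

In this sketch `handle_request` keeps working whether or not an SDK is installed, which is the property that lets a shared library take a long-term dependency on the API alone.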

## Signal lifecycle

OpenTelemetry is structured around signals. Each signal represents a coherent, stand-alone set of functionality. Each signal follows a lifecycle.

![drawing](img/0143_api_lifecycle.png)

### Lifecycle stages

**Experimental –** Breaking changes and performance issues may occur. Components may not be feature-complete. The experiment may be discarded.

**Stable –** Stability guarantees apply, based on component type (API, SDK, Conventions, and Contrib). Long term dependencies may now be taken against these packages.

**Deprecated –** This signal has been replaced but still retains the same stability guarantees.

**Removed -** a deprecated signal is no longer supported, and is removed.

All signal components may become stable together, or one by one in the following order: API, SDK, Semantic Conventions, Contrib.
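
One way to read the lifecycle is as a small state machine. The sketch below is an illustration, not part of the proposal: the `Stage` enum, the `ALLOWED` table, and both helper functions are hypothetical names, and treating a discarded experiment as a direct move to `REMOVED` is a modelling assumption.

```python
from enum import Enum


class Stage(Enum):
    EXPERIMENTAL = "experimental"
    STABLE = "stable"
    DEPRECATED = "deprecated"
    REMOVED = "removed"


# Allowed forward transitions, as read from the stage descriptions.
# Assumption: a discarded experiment is modelled as EXPERIMENTAL -> REMOVED.
ALLOWED = {
    Stage.EXPERIMENTAL: {Stage.STABLE, Stage.REMOVED},
    Stage.STABLE: {Stage.DEPRECATED},
    Stage.DEPRECATED: {Stage.REMOVED},
    Stage.REMOVED: set(),
}

# Order in which a signal's components may become stable one by one.
STABILIZATION_ORDER = ["API", "SDK", "Semantic Conventions", "Contrib"]


def can_transition(current, target):
    """True if a signal may move from `current` to `target`."""
    return target in ALLOWED[current]


def next_to_stabilize(stable):
    """Return the next component to stabilize, or None if all are stable."""
    for component in STABILIZATION_ORDER:
        if component not in stable:
            return component
    return None
```

For example, `can_transition(Stage.STABLE, Stage.REMOVED)` is false in this model, because a stable signal has to pass through deprecation first.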

### Stability

Once a signal component is marked as stable, the following rules apply until the end of that signal’s existence.

**API Stability -**
No backward-incompatible changes to the API are allowed unless the major version number is incremented. All existing API calls must continue to compile and function against all future minor versions of the same major version. ABI compatibility is offered in languages which require it.

**SDK Stability -**
Public portions of the SDK (constructors, configuration, end-user interfaces) must remain backwards compatible. Internal interfaces are allowed to break; ABI compatibility is not required.
Contributor

If the API can make breaking changes when it increases its major version, presumably the SDK must also have a breaking change, which isn't mentioned here.

I think you could simplify this section by talking about "Code Stability" vs "Semantic Conventions Stability". The same rules for backwards-compatibility should apply to all kinds of code.

Contributor Author

🤷‍♀️ People keep asking what guarantees specifically apply to semantic conventions.


**Semantic Conventions Stability -**
Semantic Conventions may not be removed once they are stable. New conventions may be added to replace usage of older conventions, but the older conventions are never removed; they will only be marked as deprecated in favor of the newer ones.
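
A conventions package could express this rule roughly as follows. This is a sketch under stated assumptions: the attribute names `example.request.type` and `example.request.kind` are invented for illustration and are not real OpenTelemetry conventions, and `check_attribute` is a hypothetical helper.

```python
import warnings

# Hypothetical convention names; neither is a real OpenTelemetry attribute.
REQUEST_KIND = "example.request.kind"  # newer convention
REQUEST_TYPE = "example.request.type"  # older convention, never deleted

# Old name -> replacement. Entries are added here, never removed.
DEPRECATED = {REQUEST_TYPE: REQUEST_KIND}


def check_attribute(name):
    """Return the name unchanged, warning if it is deprecated."""
    replacement = DEPRECATED.get(name)
    if replacement is not None:
        warnings.warn(
            f"{name} is deprecated; prefer {replacement}",
            DeprecationWarning,
            stacklevel=2,
        )
    return name
```

The older constant stays importable and usable, so code written against it keeps working; the warning only points users toward the newer name.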

**Contrib Stability -**
Plugins and instrumentation are kept up to date, and are released simultaneously with (or shortly after) the latest release of the API. The goal is to ensure users can update to the latest version of OpenTelemetry, and not be held back by the plugins that they depend on.

Public portions of contrib packages (constructors, configuration, interfaces) must remain backwards compatible. ABI compatibility is not required.

Why isn't ABI compatibility required? Do we expect some level of dependency hell to reduce the maintenance burden of contrib a little?

Contributor Author

After some discussion, I've changed ABI compatibility into a language-specific concern. It is too difficult to make a broad pronouncement on this nuanced subject.


Telemetry produced by contrib instrumentation must also remain stable and backwards compatible, to avoid breaking alerts and dashboards. This means that existing data may not be mutated or removed without a major version bump. Additional data may be added. This applies to spans, metrics, resources, attributes, events, and any other data types that OpenTelemetry emits.
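
The telemetry rule can be sketched as a check that new output is a superset of old output. This is a simplified illustration that assumes telemetry can be compared as flat attribute dictionaries; the function name and the `example.*` keys are hypothetical.

```python
def is_backward_compatible(old, new):
    """True if every key in `old` is present in `new` with the same value."""
    return all(key in new and new[key] == value for key, value in old.items())


# Attributes emitted by a hypothetical plugin before and after a change.
before = {"example.route": "/users", "example.status": 200}
added = {**before, "example.region": "eu"}     # additional data: allowed
mutated = {**before, "example.status": "200"}  # value changed: breaking
removed = {"example.route": "/users"}          # key dropped: breaking
```

Under this check only `added` passes against `before`; the other two changes would need a major version bump.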

### Deprecation
Contributor

There isn't any time-period described here between deprecation and removal, which makes deprecation somewhat moot. I can simply replace old with new, and cut a new major version without deprecating. If this is binding in some form (e.g. through a minimum "deprecation period"), we should spell that out. If not, make it clearer that this is an optional, be-a-nice-community sort of thing.

Contributor Author

So, I don't want to arbitrarily pick a minimum deprecation period, and no one has pointed to any kind of convention or standard to follow. The current deprecation period should be considered "infinite" for the time being.

Contributor

I'm not quite following. IIUC, an infinite deprecation period would imply we can't remove signals, even with a new major version, which doesn't align with the rest of the proposal.

Here is what I would suggest as a definition for deprecation:

"A deprecated signal is a signal which will be removed in the next major version, but is identical to a stable signal in all other respects. Marking a signal deprecated does not imply a new major version must be cut, but guarantees that it will be removed if/when a new major version is cut. All signals which are removed in a new major version must be marked deprecated in the old major version before releasing the new major version--no cherrypicking deprecations after-the-fact, and no removing signals which are not deprecated. Marking a signal deprecated is intended to smooth the transition to a new major version by encouraging users to move off of the deprecated signal prior to upgrading to the next major version. A user who has moved off of all deprecated signals will not encounter any compilation errors when upgrading to the new major version."


In theory, signals could be replaced. When this happens, they are marked as deprecated.

Code is only marked as deprecated when the replacement becomes stable. Deprecated code still abides by the same support guarantees as stable code. Deprecated APIs remain stable and backwards compatible.
Contributor

This implies there is always a replacement for deprecated functionality. Are we saying there will never be a signal we want to remove without replacement?

Contributor

I think the idea is to actually remove those components way down the road, just not right away (we did this in OpenTracing, removing APIs after deprecating them, and there were a few complaints).

Contributor Author

Yeah, I can't see a reason for us removing stable features just because we decide we don't like them any more. In those cases, we just leave them alone and let them be.

Contributor

Agreed. The only process by which we should remove signals is by introducing a new major version. We should also put a high price on new major versions, which pushes removal of signals way down the road.

I just don't want to imply that major version releases, which can remove signals, will have feature-parity with previous major versions.

Suggested change
Code is only marked as deprecated when the replacement becomes stable. Deprecated code still abides by the same support guarantees as stable code. Deprecated APIs remain stable and backwards compatible.
If the signal is being replaced, code should only be marked as deprecated after the replacement becomes stable. Deprecated code still abides by the same support guarantees as stable code. Deprecated APIs remain stable and backwards compatible.


### Removal

Packages are end-of-life’d by being removed from the release. The release then makes a major version bump.

We currently have no plans for deprecating signals or creating a major version past v1.0.

For clarity, it is still possible to create "v2.0" of existing signals without actually moving to v2.0 and breaking support.

For example, imagine we develop a new, better tracing API - let's call it AwesomeTrace. We will never mutate the current tracing API into AwesomeTrace. Instead, AwesomeTrace would be added as an entirely new signal which coexists and interoperates with the current tracing signal. This would make adding AwesomeTrace a minor version bump, *not* v2.0. v2.0 would mark the end of support for current tracing, not the addition of AwesomeTrace. And we don't want to ever end that support, if we can help it.

This is not actually a theoretical example. OpenTelemetry already supports two tracing APIs: OpenTelemetry and OpenTracing. We invented a new tracing API, but continue to support the old one.

## Version Numbers

OpenTelemetry follows [semver 2.0](https://semver.org/) conventions, with the following distinction.

OpenTelemetry clients have four components: API, SDK, Semantic Conventions, and Contrib.

For the purposes of versioning, all code within a component is treated as if it were part of a single package, and versioned with the same version number, except for contrib.

* All API packages version together, across all signals. Signals do not have separate version numbers.
* All SDK packages version together.
* Semantic Conventions are a single package with a single version number.
* Each contrib package has its own version.

Exception: in some languages, package managers may react poorly to experimental packages having a version higher than 0.X. In these cases, a language-specific workaround is required. So far, Go is the only language which has identified this as an issue.

Note: different language implementations do not need to have matching version numbers, nor do implementations have to match the version of the specification they implement. For example, it is fine to have opentelemetry-python-api at 1.2.8, opentelemetry-java-api at 1.3.2, and the spec at 1.1.1.

**Major version bump**
Major version bumps only occur when there is a breaking change to a stable interface, or the removal of deprecated signals.

OpenTelemetry values long term support. The expectation is that we will version to v1.0 once the first set of packages are declared stable. OpenTelemetry will then remain at v1.0 for years. There are no plans for a v2.0 of OpenTelemetry at this time. Additional stable packages, such as metrics and logs, will be added as minor version bumps.

**Minor version bump**
Most changes to OpenTelemetry result in a minor version bump.

* New backward-compatible functionality added to any component.
* Breaking changes to internal SDK components.
* Breaking changes to experimental signals.
* New experimental packages are added.
* Experimental packages become stable.

**Patch version bump**
Patch versions make no changes which would require recompilation or potentially break application code. The following are examples of patch fixes.

* Bug fixes which don't require a minor version bump per the rules above.
* Security fixes.
* Documentation.
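
The bump rules above can be sketched as a lookup from kinds of change to the bump they require. The change labels, `RULES`, `required_bump`, and `apply_bump` are all hypothetical names chosen for this illustration. Treating a release with no listed changes as a patch is also an assumption.

```python
from enum import IntEnum


class Bump(IntEnum):
    PATCH = 0
    MINOR = 1
    MAJOR = 2


# Hypothetical change labels mapped to the bump the rules above call for.
RULES = {
    "breaking_change_to_stable_interface": Bump.MAJOR,
    "removal_of_deprecated_signal": Bump.MAJOR,
    "new_backward_compatible_functionality": Bump.MINOR,
    "breaking_change_to_internal_sdk": Bump.MINOR,
    "breaking_change_to_experimental_signal": Bump.MINOR,
    "new_experimental_package": Bump.MINOR,
    "experimental_package_becomes_stable": Bump.MINOR,
    "bug_fix": Bump.PATCH,
    "security_fix": Bump.PATCH,
    "documentation": Bump.PATCH,
}


def required_bump(changes):
    """Return the largest bump required by any change in the release."""
    return max((RULES[change] for change in changes), default=Bump.PATCH)


def apply_bump(version, bump):
    """Apply a bump to a 'MAJOR.MINOR.PATCH' version string."""
    major, minor, patch = (int(part) for part in version.split("."))
    if bump is Bump.MAJOR:
        return f"{major + 1}.0.0"
    if bump is Bump.MINOR:
        return f"{major}.{minor + 1}.0"
    return f"{major}.{minor}.{patch + 1}"
```

Under these rules, adding a new signal alongside an existing one is labelled `new_experimental_package` and yields a minor bump, which matches the AwesomeTrace example earlier in the proposal.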

Currently, OpenTelemetry does not have plans to backport bug and security fixes to prior minor versions. Security and bug fixes are only applied to the latest minor version. We are committed to making it feasible for end users to stay up to date with the latest version of OpenTelemetry.
Contributor

In the hypothetical world in which we cut a new major version, and there was a security vulnerability, I think we would want to patch the latest minor version in both major versions. It would be hard to force customers to upgrade to a new major version to get a security fix, as major version upgrades will be disruptive by definition.

Contributor Author

So, I agree that back-porting across major versions seems both reasonable and likely. But how many major versions? I would rather not pick something arbitrary, and leave it open for the future when we come to it. We can always add this clarification later.

Contributor

In my doc I had proposed all major versions that have had a minor release in the past year, but starts to get into the LTS discussion, which may be too much for this proposal. It might be easier to start with something optional, like "it is up to maintainers to decide whether to support additional minor versions, or to support minor versions from previous major versions"


## Long Term Support

Major versions of the API will be supported for three years after the release of the next major version.

A version of the SDK which supports the last major version of the API will continue to be maintained during this period. Bug and security fixes will be backported. Additional feature development is not guaranteed.

Contrib packages available when the API is versioned will continue to be maintained for the duration of this period. Bug and security fixes will be backported. Additional feature development is not guaranteed.
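
The support window can be computed from the release date of the next major version. The sketch below assumes "three years" means the same calendar date three years later; the function name is hypothetical and the dates in the usage note are made up.

```python
from datetime import date


def api_support_ends(next_major_release, years=3):
    """Date on which support for the previous major version ends."""
    target_year = next_major_release.year + years
    try:
        return next_major_release.replace(year=target_year)
    except ValueError:
        # 29 February in a non-leap target year: fall back to 28 February.
        return next_major_release.replace(year=target_year, day=28)
```

For instance, if a hypothetical next major version shipped on 1 June 2025, `api_support_ends(date(2025, 6, 1))` gives 1 June 2028 as the end of support for the previous major version.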

## Open questions

### “OpenTelemetry GA”

In theory, we have assumed that tracing and metrics would be released together as a v1.0, which we refer to as “OpenTelemetry GA.”

However, in practice, it appears that tracing will be ready to GA before metrics. Tracing is ready today in .NET, and metrics are still months away from being finished.

While we can continue to use the term OpenTelemetry GA to mean the release of both tracing and metrics, we should decouple this from our versioning and support terminology. That allows us to announce stable tracing this month.
Binary file added text/img/0143_api_lifecycle.png
Binary file added text/img/0143_cross_cutting.png