From ecc6d6e6c793de8b30836e1ca11e106dbf28c005 Mon Sep 17 00:00:00 2001
From: JBD
Date: Mon, 8 Mar 2021 19:00:09 -0800
Subject: [PATCH 1/5] Initial commit of the Prometheus/OpenTelemetry compatibility spec

This is an early draft that outlines the goals and expectations from
OpenTelemetry in order to provide Prometheus support. The work is in
the early stages, behavioral expectations from the collector and the
libraries might change or be expanded in the future. In the long term,
we will graduate this doc to the OpenTelemetry compatibility specs
under the opentelemetry-specification repo.
---
 specification.md | 89 ++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 89 insertions(+)
 create mode 100644 specification.md

diff --git a/specification.md b/specification.md
new file mode 100644
index 0000000..9a64e71
--- /dev/null
+++ b/specification.md
@@ -0,0 +1,89 @@
+# OpenTelemetry/Prometheus Compatibility Specification
+
+Status: [Experimental](https://github.com/open-telemetry/opentelemetry-specification/blob/main/specification/document-status.md)
+
+## Abstract
+
+OpenTelemetry is aiming to provide compatibility with
+Prometheus and OpenMetrics. This document explains the
+extent of the support. OpenTelemetry collector and libraries
+will align with the compatibility requirements defined in
+this spec.
+
+## Goals
+
+* OpenTelemetry collector can be used as a drop-in replacement
+  for Prometheus server to scrape and export metrics data.
+* OpenTelemetry collector should export OLTP-compatible Prometheus
+  time series to OpenTelemetry metrics exporters.
+* OpenTelemetry libraries should implement exporters to publish
+  Prometheus metrics.
+
+## Differences and Limitations
+
+OpenTelemetry and Prometheus/OpenMetrics have different design
+goals and this reflects in the data model and implementation
+details. This section summarizes a few key differences.
+
+* **Pull vs push**: Prometheus is mainly designed for pull
+  whereas OpenTelemetry primarily is designed for push. This
+  difference causes how the state is maintained throughout the
+  collection including how it’s handled in the OpenTelemetry
+  collector.
+* **Cumulative vs delta**: OpenTelemetry supports delta temporality
+  whereas Prometheus always expects absolute/cumulative values. This
+  breaks some components that produce and communicate deltas where
+  cumulatives cannot be rebuilt before being exported to Prometheus.
+* **Histogram boundaries**: Prometheus histogram boundaries are by
+  lower equal (le) while OpenTelemetry histogram boundaries are
+  greater equal (ge). (This difference will be resolved via #18)
+* **Semantic conventions**: OpenTelemetry predefines semantic
+  conventions to collect additional metadata with telemetry data.
+  Prometheus users don’t follow the same conventions and Prometheus
+  client library provided data may lack the semantic conventions
+  available in OpenTelemetry libraries.
+
+## Compatibility Requirements
+
+Given the number of fundamental design goals between OpenTelemetry
+and Prometheus, our aim is to close the gaps where possible and
+make the right compromises to meet the goals defined in this document.
+OpenTelemery and Prometheus won’t be fully compatible but we will
+enable important use cases to enable OpenTelemetry for Prometheus
+users. The following sections summarizes the expectations from
+the collector and the libraries.
+
+### Collector
+
+* Collector will implement a Prometheus remote write exporter.
+  Publishing a pull-based metrics handler with all collected
+  metrics is not a scalable approach.
+* Collector will support scraping and ingesting cumulative metrics.
+  Prometheus doesn’t support deltas and there are cases where
+  rebuilding the cumulatives from deltas is not possible/easy.
+* Collector will support all discovery and scraping configuration
+  options in the Prometheus server. Collector will ignore the
+  alerting rules.
+* Collector will support exporting cumulative series to
+  OLTP-compatible exporters.
+* Collector will produce an “up” metric as a gauge that is compatible
+  with the Prometheus server “up” metric. If scrape succeeds,
+  it will produce 1 and 0 if it fails.
+* Collector will produce "instance" and "job" labels similar
+  to the Prometheus server.
+* If a target disappears from the scrape, the collector will
+  write explicit staleness markers for the respective timeseries.
+* Collector won’t assume any OpenTelemetry semantic conventions
+  might be in place in the scraped data. Collector may decorate
+  the samples with some semantic convention attributes available
+  in the collector.
+* Collector will support WAL if it has access to persistent storage.
+* Collector will provide remote write fine tuning options similar
+  to Prometheus server.
+
+### Libraries
+
+* Libraries should implement a Prometheus metrics handler that
+  will listen to a user-specified host:port.
+* Libraries should provide Prometheus metrics in text format and
+  may have protobuf support.

From 9018599a52e07cfd4f056f64aadce6d37d177f5c Mon Sep 17 00:00:00 2001
From: JBD
Date: Thu, 11 Mar 2021 14:51:32 -0800
Subject: [PATCH 2/5] Address feedback

---
 specification.md | 13 ++++++++-----
 1 file changed, 8 insertions(+), 5 deletions(-)

diff --git a/specification.md b/specification.md
index 9a64e71..0f26cd3 100644
--- a/specification.md
+++ b/specification.md
@@ -14,10 +14,12 @@ this spec.
 
 * OpenTelemetry collector can be used as a drop-in replacement
   for Prometheus server to scrape and export metrics data.
-* OpenTelemetry collector should export OLTP-compatible Prometheus
+* OpenTelemetry collector should export OTLP-compatible Prometheus
   time series to OpenTelemetry metrics exporters.
 * OpenTelemetry libraries should implement exporters to publish
   Prometheus metrics.
+* To the OpenTelemetry collector, an OpenTelemetry target and
+  a Prometheus instrumented target is indistinguishable.
 
 ## Differences and Limitations
 
@@ -65,9 +67,9 @@ the collector and the libraries.
   options in the Prometheus server. Collector will ignore the
   alerting rules.
 * Collector will support exporting cumulative series to
-  OLTP-compatible exporters.
-* Collector will produce an “up” metric as a gauge that is compatible
-  with the Prometheus server “up” metric. If scrape succeeds,
+  OTLP-compatible exporters.
+* Collector will produce an "up" metric as a gauge that is compatible
+  with the Prometheus server "up" metric. If scrape succeeds,
   it will produce 1 and 0 if it fails.
 * Collector will produce "instance" and "job" labels similar
   to the Prometheus server.
@@ -77,7 +79,8 @@ the collector and the libraries.
   might be in place in the scraped data. Collector may decorate
   the samples with some semantic convention attributes available
   in the collector.
-* Collector will support WAL if it has access to persistent storage.
+* Collector will support Promeheus WAL if it has access
+  to persistent storage.
 * Collector will provide remote write fine tuning options similar
   to Prometheus server.
 

From 89bb879fbc9b03fe5e1f9a083efb765dc2e1d32f Mon Sep 17 00:00:00 2001
From: JBD
Date: Fri, 12 Mar 2021 09:28:10 -0800
Subject: [PATCH 3/5] Clarify that up metric will be translated back to Promeheus up metric

---
 specification.md | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/specification.md b/specification.md
index 0f26cd3..dec3d74 100644
--- a/specification.md
+++ b/specification.md
@@ -68,9 +68,9 @@ the collector and the libraries.
   alerting rules.
 * Collector will support exporting cumulative series to
   OTLP-compatible exporters.
-* Collector will produce an "up" metric as a gauge that is compatible
-  with the Prometheus server "up" metric. If scrape succeeds,
-  it will produce 1 and 0 if it fails.
+* At each scrape, collector will produce an "up" metric as a gauge
+  internally in the reciever and exporter will translate it to a
+  Prometheus up metric in the export time.
 * Collector will produce "instance" and "job" labels similar
   to the Prometheus server.
 * If a target disappears from the scrape, the collector will

From 06e928b73ba9088e3c14f925384a7b39320cb0d4 Mon Sep 17 00:00:00 2001
From: JBD
Date: Tue, 16 Mar 2021 09:21:18 -0700
Subject: [PATCH 4/5] Remove vague language and strong statements

Add more details and future work.
---
 specification.md | 33 ++++++++++++++++-----------------
 1 file changed, 16 insertions(+), 17 deletions(-)

diff --git a/specification.md b/specification.md
index dec3d74..c5685b5 100644
--- a/specification.md
+++ b/specification.md
@@ -34,35 +34,32 @@ details. This section summarizes a few key differences.
   collector.
 * **Cumulative vs delta**: OpenTelemetry supports delta temporality
   whereas Prometheus always expects absolute/cumulative values. This
-  breaks some components that produce and communicate deltas where
-  cumulatives cannot be rebuilt before being exported to Prometheus.
+  may result in deltas (collected by OTel client libraries) not being
+  able to exported to Prometheus, but Prometheus instrumented metrics
+  will be fully supported because they are cumulative.
 * **Histogram boundaries**: Prometheus histogram boundaries are by
   lower equal (le) while OpenTelemetry histogram boundaries are
-  greater equal (ge). (This difference will be resolved via #18)
+  greater equal (ge). This soon will be fixed by
+  [opentelemetry-proto#262](https://github.com/open-telemetry/opentelemetry-proto/pull/262).
 * **Semantic conventions**: OpenTelemetry predefines semantic
   conventions to collect additional metadata with telemetry data.
   Prometheus users don’t follow the same conventions and Prometheus
   client library provided data may lack the semantic conventions
   available in OpenTelemetry libraries.
 
-## Compatibility Requirements
-
-Given the number of fundamental design goals between OpenTelemetry
-and Prometheus, our aim is to close the gaps where possible and
-make the right compromises to meet the goals defined in this document.
-OpenTelemery and Prometheus won’t be fully compatible but we will
-enable important use cases to enable OpenTelemetry for Prometheus
-users. The following sections summarizes the expectations from
-the collector and the libraries.
+## Implementation Requirements
 
 ### Collector
 
 * Collector will implement a Prometheus remote write exporter.
-  Publishing a pull-based metrics handler with all collected
-  metrics is not a scalable approach.
+  Collector is a common metrics sink in collection pipelines where
+  metric data points are recieved and quickly "forwarded" to exporters.
+  Implementing a pull-based metrics handler will require additional
+  design work in this model to be efficient, we may follow-up
+  with improvements to enable a metrics handler.
 * Collector will support scraping and ingesting cumulative metrics.
-  Prometheus doesn’t support deltas and there are cases where
-  rebuilding the cumulatives from deltas is not possible/easy.
+  Collector will not try to rebuild the cumulatives from deltas
+  at this moment but we may improve this case in the future.
 * Collector will support all discovery and scraping configuration
   options in the Prometheus server. Collector will ignore the
   alerting rules.
@@ -70,7 +67,9 @@ the collector and the libraries.
   OTLP-compatible exporters.
 * At each scrape, collector will produce an "up" metric as a gauge
   internally in the reciever and exporter will translate it to a
-  Prometheus up metric in the export time.
+  Prometheus up metric in the remote write exporter.
+  [opentelemetry-specification#1078](https://github.com/open-telemetry/opentelemetry-specification/issues/1078)
+  is going to address this issue at the data model in the future.
 * Collector will produce "instance" and "job" labels similar
   to the Prometheus server.
 * If a target disappears from the scrape, the collector will

From 0a9283dc089b81462c05f2d7879958f0274ee601 Mon Sep 17 00:00:00 2001
From: JBD
Date: Tue, 16 Mar 2021 09:57:04 -0700
Subject: [PATCH 5/5] Fix wording

---
 specification.md | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/specification.md b/specification.md
index c5685b5..dbc64af 100644
--- a/specification.md
+++ b/specification.md
@@ -29,9 +29,9 @@ details. This section summarizes a few key differences.
 
 * **Pull vs push**: Prometheus is mainly designed for pull
   whereas OpenTelemetry primarily is designed for push. This
-  difference causes how the state is maintained throughout the
-  collection including how it’s handled in the OpenTelemetry
-  collector.
+  difference changes how the state is maintained throughout the
+  collection pipeline, including how it’s handled in the
+  OpenTelemetry collector.
 * **Cumulative vs delta**: OpenTelemetry supports delta temporality
   whereas Prometheus always expects absolute/cumulative values. This
   may result in deltas (collected by OTel client libraries) not being