Add metrics view API #89

# Metrics View API

This OTEP addresses the [Future Work: Configurable Aggregations / View API](https://github.com/open-telemetry/opentelemetry-specification/blob/master/specification/api-metrics.md#future-work-configurable-aggregations--view-api) section of [api-metrics.md](https://github.com/open-telemetry/opentelemetry-specification/blob/master/specification/api-metrics.md).
It proposes a specification for a _view API_ that allows application owners and administrators to configure aggregations for individual metric instruments.

## TL;DR

The potentially contentious changes this OTEP proposes include:

- Require views to be registered: don't record measurements from the API for metric instruments that don't appear in any view.
- Drop support for LastValue aggregations; don't allow multiple Observer calls per export interval.
- Don't allow views to specify how "dropped" keys are aggregated, since we can safely ignore aggregation order for measurement instruments other than observers.
## Motivation

As of [opentelemetry-specification#430](https://github.com/open-telemetry/opentelemetry-specification/pull/430), the spec describes three different metric instrument types, and a default aggregation for each.
These default aggregations are consistent with the semantics of each metric instrument, but not with the various exposition formats required by APM backends.

For example, consider the [Prometheus histogram exposition format](https://prometheus.io/docs/concepts/metric_types/#histogram).
The default aggregation for the Measure metric instrument is _MinMaxSumCount_.
This aggregation computes summary statistics about (i.e. the _min_, _max_, _sum_, and _count_ of) the set of recorded values, but doesn't capture enough information to reconstruct histogram bucket counts.

If instead the Measure instrument used an [_Exact_ aggregation](https://github.com/open-telemetry/opentelemetry-specification/pull/347/files#diff-5b01bbf3430dde7fc5789b5919d03001R254-R259) -- which stores the sequence of measurements without aggregating them -- the Prometheus exporter could compute the histogram for each recording interval at export time, but at the cost of significantly increased memory usage.

A better solution would be to use a custom _Histogram_ aggregation, which could track bucket counts and compute summary statistics without preserving the original measurements.

As the spec is written now, all metrics are aggregated following the default aggregation for the metric instrument by which they were recorded.
By design, there is no option to change these defaults:

> [Users do not have a facility in the API to select the aggregation they want for particular instruments.](https://github.com/open-telemetry/opentelemetry-specification/blob/master/specification/api-metrics.md#aggregations)

This limitation prevents users from:

- Converting aggregated data to vendor-specific exposition formats
- Configuring aggregation options, e.g. histogram bucket boundaries
- Specifying multiple aggregations for a single metric

In addition, it's often desirable to aggregate metrics with respect to a specific set of label keys to reduce memory use and export data size.
> **Reviewer comment:** Would it be correct to say that views are the only way to pick/drop dimensions? Or are views just one way, and can the batcher still drop dimensions further?
A _view API_ would allow users to use custom aggregation types, associate individual metrics with one or more aggregations, configure aggregation options, and specify the set of label keys to use to aggregate each metric.

> **Reviewer comment:** Can you clarify whether the following scenario is achievable with views? A counter is tracked with two dimensions, url and statuscode. Say there are 10 possible values for url and 5 possible values for statuscode; this would potentially result in 10 * 5 = 50 time series stored by default. The user doesn't want to pay for that much: they care about url and statuscode, but not their combination. So the user defines two views, view1(url) and view2(statuscode), which results in only 10 + 5 = 15 time series. Can we say that views can be used to achieve time series reduction in this way?
>
> **Reply:** I'd also expect what you stated to be achievable, programmatically.
>
> **Reply:** This can even be achieved with only one view.
>
> **Reply:** With one view, the total time series could be 10 * 5; with two views, it's 10 + 5. That makes a lot of difference in terms of cost (memory cost, and the financial cost of exporting more time series to APM backends). The goal is to reduce the number of time series, hence splitting into multiple views.
>
> **Reply:** Ah, I misread your question. Yes, you are correct.
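To make the scenario from the thread above concrete, here is a minimal, self-contained sketch of registering two views that each preserve a single label key. All class and function names (`View`, `SumAggregation`, `ViewRegistry`, `register_view`) are hypothetical illustrations of the proposal, not an existing OpenTelemetry API.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class SumAggregation:
    """Adds up all values recorded in a collection interval."""


@dataclass
class View:
    instrument_name: str
    aggregation: object
    label_keys: Optional[List[str]] = None   # None means "preserve all keys"
    name: Optional[str] = None
    description: Optional[str] = None


@dataclass
class ViewRegistry:
    views: List[View] = field(default_factory=list)

    def register_view(self, view: View) -> None:
        # Implementations should refuse to register two views with the same name.
        if any(v.name == view.name for v in self.views):
            raise ValueError(f"duplicate view name: {view.name}")
        self.views.append(view)


registry = ViewRegistry()

# Two views over the same counter, each preserving a single label key.
# With 10 url values and 5 statuscode values these views track
# 10 + 5 = 15 time series instead of the 10 * 5 = 50 of the full cross product.
registry.register_view(
    View("http.requests", SumAggregation(), ["url"], name="http.requests.by_url")
)
registry.register_view(
    View("http.requests", SumAggregation(), ["statuscode"], name="http.requests.by_statuscode")
)
```

The fields of the hypothetical `View` class mirror the "A view is defined by" list in the Internal details section below.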
## Explanation

Metric instruments are API objects that record measurements from application code.
These measurements are aggregated by _Aggregators_, which are SDK objects.
Aggregated measurement data are passed to _Exporters_, which convert these data into APM-specific exposition formats.

Each view describes a relationship between a metric instrument and an aggregation, and specifies the set of label keys to preserve in the aggregated data.

Views are tracked in a registry, which exporters may use to get the current aggregate values for all registered aggregations at export time.
> **Reviewer comment:** Are views a global thing, and hence must be registered with the MeterProvider, so that all meters created by the MeterProvider get access to the views?
>
> **Reply:** An additional thought: would it be valuable to tie specific aggregations to specific exporters? I feel like this is another reason why aggregations should be tied to a MeterProvider. Then you can choose different MeterProvider sets for different metrics, as necessary.

Each registered view represents a metric that is meant to be exported.
As a corollary, the SDK can choose to drop measurements from any metric instrument that does not appear in a registered view.

> **Reviewer comment:** Should the behavior for the opentelemetry-sdk be stated here? It seems like we should make a choice one way or the other. I would opt to drop measurements, as it could be significant memory overhead to keep measurements that will never be exported.
>
> **Reply:** +1 to not having the SDK be able to "choose", but having that be the required way it works. This does not break with the desire to not require users to define views to get metrics if we go with my suggestion of default view generation for each instrument -- since there would be a view by default, measurements would not be dropped unless the end user/operator explicitly disables default views or sets their own.
>
> **Reply:** I'd like more clarity on whether exporters should even be aware of the ViewRegistry, or whether only the Meter implementation needs to be. Exporters would continue to simply have a method like Export(collection), and the Meter implementation would take care of dropping/retaining dimensions, choosing the right aggregation, etc. from the ViewRegistry.
>
> **Reply:** I believe at last week's metrics SIG it was decided that any measurement that was not registered with a view would be aggregated with a default aggregation.
>
> **Reply:** Curious how that works in practice. It would require something to iterate through all metrics during every export and see if they have a view already registered.
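A sketch of how an SDK might consult the view registry on each recording, reusing the hypothetical `ViewRegistry`/`View` shapes from the earlier sketch. Whether unmatched measurements are dropped or fall back to a default aggregation is exactly the open question in the thread above; the code only marks the decision point.

```python
from collections import defaultdict

# Hypothetical per-view aggregation state: view name -> {projected labels: value}.
view_state = defaultdict(lambda: defaultdict(int))


def record(registry, instrument_name, value, labels):
    """Route one measurement to every registered view for this instrument."""
    matching = [v for v in registry.views if v.instrument_name == instrument_name]
    if not matching:
        # Open question from the discussion above: silently drop the
        # measurement, or aggregate it with a per-instrument default view?
        return
    for view in matching:
        keys = view.label_keys if view.label_keys is not None else sorted(labels)
        projected = tuple((k, labels.get(k)) for k in keys)
        # Sum shown for brevity; a real SDK would dispatch on view.aggregation.
        view_state[view.name][projected] += value
```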
## From measurement to APM backend

The goal of separating metric instrument types from aggregations is that it should be possible to switch to a different APM backend without changing application code.

The default SDK should include conventional aggregators, and exporters should convert aggregated data into the appropriate exposition format based on the aggregator type.
Some exporters will require custom aggregators, either because the built-in aggregators don't capture enough information to convert aggregated data into a particular exposition format, or to improve on the efficiency of the built-in aggregators.
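A sketch of what "convert aggregated data into the appropriate exposition format based on the aggregator type" could look like inside an exporter. The aggregator classes and the Prometheus-like text output here are illustrative assumptions, not the actual OpenTelemetry SDK or Prometheus exporter.

```python
from dataclasses import dataclass
from typing import List


# Hypothetical aggregator result types; a real SDK defines its own.
@dataclass
class SumAggregator:
    sum: float


@dataclass
class MinMaxSumCountAggregator:
    min: float
    max: float
    sum: float
    count: int


@dataclass
class HistogramAggregator:
    boundaries: List[float]   # upper bound of each bucket
    counts: List[int]         # per-bucket counts, same length as boundaries


def to_prometheus_lines(metric_name, labels, aggregator):
    """Render one aggregated value in a Prometheus-like text format (illustrative only)."""
    label_str = ",".join(f'{k}="{v}"' for k, v in sorted(labels.items()))
    if isinstance(aggregator, SumAggregator):
        return [f"{metric_name}_total{{{label_str}}} {aggregator.sum}"]
    if isinstance(aggregator, MinMaxSumCountAggregator):
        # MinMaxSumCount is enough for a summary-like exposition, but not for
        # histogram buckets -- the point made in the Motivation section.
        return [
            f"{metric_name}_count{{{label_str}}} {aggregator.count}",
            f"{metric_name}_sum{{{label_str}}} {aggregator.sum}",
        ]
    if isinstance(aggregator, HistogramAggregator):
        lines, cumulative = [], 0
        for bound, bucket_count in zip(aggregator.boundaries, aggregator.counts):
            cumulative += bucket_count
            lines.append(f'{metric_name}_bucket{{{label_str},le="{bound}"}} {cumulative}')
        return lines
    # Unknown aggregator type: this exporter would need a matching custom aggregator.
    raise TypeError(f"no exposition format for {type(aggregator).__name__}")
```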
APM vendors should typically include custom exporters and any custom aggregators they rely on in the same distribution.

One of the design goals of OpenTelemetry metrics is that it should work seamlessly with existing metrics systems like [StatsD](https://github.com/statsd/statsd/wiki) and [Prometheus/OpenMetrics](https://github.com/prometheus/prometheus), and provide API instruments that are familiar to these systems' users.

To check that the view API proposed here satisfies this goal, it may be helpful to compare OpenTelemetry views -- which are comprised of a metric instrument and an aggregation -- to other systems' metric types:

| System     | Metric Type                                                                                          | OT Metric Instrument | OT Aggregation    |
| ---------- | ---------------------------------------------------------------------------------------------------- | -------------------- | ----------------- |
| OpenCensus | Count                                                                                                | Counter              | Sum               |
| OpenCensus | Mean                                                                                                 | Counter              | Mean\*            |
| OpenCensus | Sum                                                                                                  | Counter              | Sum               |
| OpenCensus | Distribution                                                                                         | Measure              | Histogram         |
| OpenCensus | LastValue                                                                                            | Observer             | LastValue         |
| Datadog    | [COUNT](https://docs.datadoghq.com/developers/metrics/types/?tab=count#metric-types)                 | Counter              | Sum               |
| Datadog    | [RATE](https://docs.datadoghq.com/developers/metrics/types/?tab=rate#metric-types)                   | Counter              | Rate\*            |
| Datadog    | [GAUGE](https://docs.datadoghq.com/developers/metrics/types/?tab=gauge#metric-types)                 | Observer             | LastValue         |
| Datadog    | [HISTOGRAM](https://docs.datadoghq.com/developers/metrics/types/?tab=histogram#metric-types)         | Measure              | Histogram         |
| Datadog    | [DISTRIBUTION](https://docs.datadoghq.com/developers/metrics/types/?tab=distribution#metric-types)   | Measure              | Sketch\* or Exact |
| Prometheus | [Counter](https://prometheus.io/docs/concepts/metric_types/#counter)                                 | Counter              | Sum               |
| Prometheus | [Gauge](https://prometheus.io/docs/concepts/metric_types/#gauge)                                     | Observer             | LastValue         |
| Prometheus | [Histogram](https://prometheus.io/docs/concepts/metric_types/#histogram)                             | Measure              | Histogram         |
| Prometheus | [Summary](https://prometheus.io/docs/concepts/metric_types/#summary)                                 | Measure              | Sketch\* or Exact |

A well-designed view API should not require extensive configuration for typical use cases.
It should also be expressive enough to support custom aggregations and exposition formats without requiring APM vendors to write custom SDKs.
In short, it should make easy things easy and hard things possible.
## Internal details

A view is defined by:

- A metric instrument
- An aggregation
- An optional set of label keys to preserve
- An optional unique name
- An optional description

> **Reviewer comment:** Do we allow users to create/modify views in the middle of the program, or must views be created at initialization time?
>
> **Reply:** I do not believe we should allow users to do that. There are some issues with defining views after a collection pass has been done; it gets too complicated to deal with those, so we should enforce that views are created and immutable before recording.
>
> **Reply:** I need to understand the issues in a bit more detail, but there's tremendous value in being able to modify views at runtime. In a conversation with @c24t a while back, he mentioned that there's a way to configure views across an application dynamically for debugging purposes. For example, imagine that one of the keys you store is the service name sending you the request. When your application starts seeing a significant increase in requests, you can turn on a view that splits the metrics by service name and see which service is causing issues. Once resolved, you can turn it off, ensuring that you don't have high-cardinality data points that consume a significant metric count when there is no outage.
The aggregation may be configured with options specific to its type.
For example, a _Histogram_ aggregation may be configured with bucket boundaries, e.g. `{0, 1, 10, 100, 200, 1000, inf}`.
A _Sketch_ aggregation that estimates order statistics (i.e. quantiles) may be configured with a set of predetermined quantiles, e.g. `{.5, .95, .99, 1.00}`.

> **Reviewer comment:** Could this be called a "quantile" aggregation? A sketch doesn't immediately make me think of quantiles.
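A sketch of how the view fields listed above and these per-aggregation options might be expressed at view creation time. The constructor names, option fields, and instrument names are hypothetical and only illustrate the proposal.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class HistogramAggregation:
    # Upper bucket boundaries; the final +inf bucket is implicit.
    boundaries: List[float] = field(default_factory=lambda: [0, 1, 10, 100, 200, 1000])


@dataclass
class SketchAggregation:
    # Quantiles to report from the sketch at export time.
    quantiles: List[float] = field(default_factory=lambda: [0.5, 0.95, 0.99, 1.00])


@dataclass
class View:
    instrument_name: str                      # the metric instrument
    aggregation: object                       # an aggregation, possibly configured
    label_keys: Optional[List[str]] = None    # optional set of label keys to preserve
    name: Optional[str] = None                # optional unique name
    description: Optional[str] = None         # optional description


latency_histogram_view = View(
    instrument_name="http.server.duration",
    aggregation=HistogramAggregation(boundaries=[0, 1, 10, 100, 200, 1000]),
    label_keys=["http.route"],
    name="http.server.duration.histogram",
    description="Request latency bucketed by route",
)

latency_quantile_view = View(
    instrument_name="http.server.duration",
    aggregation=SketchAggregation(quantiles=[0.5, 0.95, 0.99, 1.00]),
    name="http.server.duration.quantiles",
)
```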
Each view has a unique name, which may be automatically determined from the metric instrument and aggregation.
Implementations should refuse to register two views with the same name.

Note that a view does not describe the type (e.g. `int`, `float`) or unit of measurement (e.g. "bytes", "milliseconds") of the metric to be exported.
The unit is determined by the metric instrument, and the aggregation and exporter may preserve or change the unit.

> **Reviewer comment:** It seems like there are only two operations appropriate for a unit modification on a metric instrument. I feel like it may be valuable to call out that there won't be a modification such as "seconds" to "milliseconds". But maybe that's just obvious.

For example, a hypothetical MinMaxSumMeanCount aggregation of an int-valued millisecond latency measure may be exported as five separate metrics to the APM backend:

- An int-valued _min_ metric with "ms" units
- An int-valued _max_ metric with "ms" units
- An int-valued _sum_ metric with "ms" units
- A **float**-valued _mean_ metric with "ms" units
- An **int**-valued, unitless (i.e. unit "1") _count_ metric

This OTEP doesn't propose a particular API for Aggregators, just that the API is sufficient for exporters to get all of this information, including:

- That the _min_, _max_, and _sum_ metrics preserve the type and unit of the underlying Measure
- That the _mean_ metric is float-valued, but preserves the underlying measure's unit
- That the _count_ metric is int-valued with unit "1", regardless of the underlying measure's unit

> **Reviewer comment:** Should the OTEP recommend an API? Reading the examples, it seems like something that may work would be (name, value, unit). It seems to satisfy the examples above.
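One possible shape, along the lines of the (name, value, unit) triple suggested in the comment above, that would carry enough information for exporters. Everything here -- the `ExportRecord` class, the metric names, and the expansion function -- is an illustrative assumption, not a proposed API.

```python
from dataclasses import dataclass
from typing import List, Union


@dataclass
class ExportRecord:
    name: str
    value: Union[int, float]
    unit: str


def min_max_sum_mean_count(measurements: List[int], unit: str = "ms") -> List[ExportRecord]:
    """Expand one hypothetical MinMaxSumMeanCount aggregation into five export records."""
    count = len(measurements)
    total = sum(measurements)
    return [
        ExportRecord("latency.min", min(measurements), unit),   # int, preserves unit
        ExportRecord("latency.max", max(measurements), unit),   # int, preserves unit
        ExportRecord("latency.sum", total, unit),               # int, preserves unit
        ExportRecord("latency.mean", total / count, unit),      # float, preserves unit
        ExportRecord("latency.count", count, "1"),              # int, unitless
    ]


print(min_max_sum_mean_count([3, 7, 12]))
# min=3 ms, max=12 ms, sum=22 ms, mean=7.33... ms, count=3 with unit "1"
```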
### Aggregating over time

An aggregation describes how multiple measurements captured via the same metric instrument in a single collection interval are combined.
_Aggregators_ are SDK objects, and the default SDK includes an Aggregator interface (see [opentelemetry-specification#347](https://github.com/open-telemetry/opentelemetry-specification/pull/347) for details).

Aggregations are assumed to be [mergeable](https://www.cs.utah.edu/~jeffp/papers/mergeSumm-MASSIVE11.pdf): aggregating a sequence of measurements should produce the same result as partitioning the sequence into subsequences, aggregating each subsequence, and combining the results.

Said differently: given an aggregation function `agg` and sequence of measurements `S`, there should exist a function `merge` such that:

<!-- <img src="https://render.githubusercontent.com/render/math?math=\mathrm{agg}(S) = \mathrm{merge}(\mathrm{agg}(S_1), \mathrm{agg}(S_2), \ldots, \mathrm{agg}(S_N))"> -->

```
agg(S) = merge(agg(S_1), agg(S_2), ..., agg(S_N))
```

<!-- <img src="https://render.githubusercontent.com/render/math?math=\{S_1, \ldots, S_N\}"> -->
<!-- <img src="https://render.githubusercontent.com/render/math?math=S"> -->

For every partition `{S_1, ..., S_N}` of `S`.
For example, the _min_ aggregation is trivially mergeable:

<!-- <img src="https://render.githubusercontent.com/render/math?math=\mathrm{min}([1, 2, 3, 4, 5, 6]) = \mathrm{min}([\mathrm{min}([1]), \mathrm{min}([2, 3]), \mathrm{min}([4, 5, 6])])"> -->

```
min([1, 2, 3, 4, 5, 6]) = min([min([1]), min([2, 3]), min([4, 5, 6])])
```

but quantile aggregations, e.g. _p95_ and _p99_, are not.

> **Reviewer comment:** Does this mean that quantiles are not supported as an aggregation? There's a strong need for quantiles, so I assume there's some way to ensure that they are measured and exported, but it's not clear how.

Applications that export quantile metrics should use a mergeable aggregation such as [DDSketch](https://arxiv.org/abs/1908.10693), which estimates quantile values with bounded errors, or export raw measurements without aggregation and compute exact quantiles on the backend.

We require aggregations to be mergeable so that they produce the same results regardless of the collection interval, or the number of collection events per export interval.

> **Reviewer comment:** Looking again at quantiles as the edge case: what's the need for ensuring that the produced results are the same regardless of collection interval? Is there some caveat where the collection interval cannot be guaranteed? I guess in the current API design that's on the pushController side of things.
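A small numeric illustration of the mergeability property, assuming nothing beyond the definitions above: min and sum merge cleanly across a partition, while a naive nearest-rank p95 does not.

```python
def p95(values):
    """Naive 95th-percentile (nearest-rank) estimate; for illustration only."""
    ordered = sorted(values)
    rank = max(1, round(0.95 * len(ordered)))
    return ordered[rank - 1]


S = list(range(1, 101))          # measurements 1, 2, ..., 100
S1, S2 = S[:50], S[50:]          # one possible partition of S

# min and sum are mergeable: aggregating the parts and then merging gives the
# same result as aggregating the whole sequence.
assert min(S) == min(min(S1), min(S2))
assert sum(S) == sum([sum(S1), sum(S2)])

# A quantile such as p95 is not: the obvious candidate merge (take p95 of the
# partial p95 values) gives the wrong answer, and no merge over the partial
# p95 values alone can recover the true result in general.
print(p95(S))                    # 95
print(p95([p95(S1), p95(S2)]))   # 98, not 95
```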
### Aggregating across label keys

Every measurement is associated with a _LabelSet_, a set of key-value pairs that describes the environment in which the measurement was captured.

Label keys and values may be extracted from the [correlation context](https://github.com/open-telemetry/opentelemetry-specification/blob/master/specification/api-correlationcontext.md) at the time of measurement.
They may also be set by the user [at the time of measurement](https://github.com/open-telemetry/opentelemetry-specification/blob/f4aeb318a5b77c9c39132a8cbc5d995e222d124f/specification/api-metrics-user.md#direct-instrument-calling-convention) or at the time at which [the metric instrument is bound](https://github.com/open-telemetry/opentelemetry-specification/blob/f4aeb318a5b77c9c39132a8cbc5d995e222d124f/specification/api-metrics-user.md#bound-instrument-calling-convention).

> **Reviewer comment:** How specifically are values from the correlation context piped into the labelset? I figured that the user would still be responsible for that, but since this calls out label extraction as a separate line, it sounds like that's not the case.

For each registered view, in each collection interval, measurements with the same labelset are aggregated together.

The view API should allow the user to specify the set of label keys to track for a given view.
Other label keys should be dropped before aggregation.
By default, if the user doesn't specify a set of label keys to track, the aggregator should record all label keys.
Users have three options to configure a view's label keys:

1. Record all label keys.
   This is the default option, and the behavior of the "defaultkeys" integrator in opentelemetry-specification#347.
2. Specify a set of label keys to track at view creation time, and drop other keys from the labelsets of recorded measurements before aggregating.
   This is the behavior of the "ungrouped" integrator in opentelemetry-specification#347.
3. Drop all label keys, and aggregate all measurements from the metric instrument together regardless of their labelsets.
   This is equivalent to using an empty list with option 2.

> **Reviewer comment** (on option 2): Might want to remove the "labelset" term in this doc.
>
> **Reviewer comment** (on option 3): To clarify, this means that the exported metrics will have no labels associated with them, correct?
Because we don't require the user to specify the set of label keys up front, and because we don't prevent users from recording measurements with missing labels in the API, some label values may be undefined.
Aggregators should preserve undefined label values, and exporters may convert them as required by the backend.

> **Reviewer comment:** Can you also add a note about bound instruments and views? Bound instruments are supposed to be the fastest way to record a value, as they avoid a lookup for the time series. With views, bound instruments should still be the fastest, and should not be doing lookups. This may be an implementation detail, but it's worth calling out in the spec as well.

For example, consider a Sum-aggregated Counter instrument that captures four consecutive measurements:

1. `{labelset: {'k1': 'v11'}, value: 1}`
2. `{labelset: {'k1': 'v12'}, value: 10}`
3. `{labelset: {'k1': 'v11', 'k2': 'v21'}, value: 100}`
4. `{labelset: {'k1': 'v12', 'k2': 'v22'}, value: 1000}`

And consider three different views, one that tracks label key _k1_, one that tracks label key _k2_, and one that tracks both.

> **Reviewer comment:** From this example, can I assume that the labelset reduction/drop should occur at the Aggregator itself? Or does the aggregator still keep all dimensions (storing k1_uniquecount * k2_uniquecount time series), and the exporter/batcher drops the dimensions?

After each measurement, the aggregator associated with each view has a different set of aggregated values:

| time | k1  | k2  | value | agg([k1])                                   | agg([k2])                                                                | agg([k1, k2]) (default)                                                                                                                  |
| ---- | --- | --- | ----- | ------------------------------------------- | ------------------------------------------------------------------------ | ----------------------------------------------------------------------------------------------------------------------------------------- |
| 1    | v11 | -   | 1     | `({k1: v11}, 1)`                            | `({k2: undefined}, 1)`                                                   | `({k1: v11, k2: undefined}, 1)`                                                                                                          |
| 2    | v12 | -   | 10    | `({k1: v11}, 1)` <br> `({k1: v12}, 10)`     | `({k2: undefined}, 11)`                                                  | `({k1: v11, k2: undefined}, 1)` <br> `({k1: v12, k2: undefined}, 10)`                                                                    |
| 3    | v11 | v21 | 100   | `({k1: v11}, 101)` <br> `({k1: v12}, 10)`   | `({k2: undefined}, 11)` <br> `({k2: v21}, 100)`                          | `({k1: v11, k2: undefined}, 1)` <br> `({k1: v12, k2: undefined}, 10)` <br> `({k1: v11, k2: v21}, 100)`                                   |
| 4    | v12 | v22 | 1000  | `({k1: v11}, 101)` <br> `({k1: v12}, 1010)` | `({k2: undefined}, 11)` <br> `({k2: v21}, 100)` <br> `({k2: v22}, 1000)` | `({k1: v11, k2: undefined}, 1)` <br> `({k1: v12, k2: undefined}, 10)` <br> `({k1: v11, k2: v21}, 100)` <br> `({k1: v12, k2: v22}, 1000)` |

> **Reviewer comment:** This example is really helpful!
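The following short script, illustrative only and not a proposed API, reproduces the final row of the table above by projecting each labelset onto the keys tracked by each view before summing.

```python
from collections import defaultdict

# Keys tracked by each of the three views.
tracked_keys = {
    "agg([k1])": ["k1"],
    "agg([k2])": ["k2"],
    "agg([k1, k2])": ["k1", "k2"],
}
state = {view: defaultdict(int) for view in tracked_keys}

measurements = [
    ({"k1": "v11"}, 1),
    ({"k1": "v12"}, 10),
    ({"k1": "v11", "k2": "v21"}, 100),
    ({"k1": "v12", "k2": "v22"}, 1000),
]

for labels, value in measurements:
    for view, keys in tracked_keys.items():
        # Project the labelset onto the view's tracked keys; missing keys are
        # kept as an explicit "undefined" value rather than dropped.
        projected = tuple((k, labels.get(k, "undefined")) for k in keys)
        state[view][projected] += value

# After all four measurements this matches the last row of the table:
#   agg([k1])     -> {('k1','v11'): 101, ('k1','v12'): 1010}
#   agg([k2])     -> {('k2','undefined'): 11, ('k2','v21'): 100, ('k2','v22'): 1000}
#   agg([k1, k2]) -> one series per distinct (k1, k2) combination
for view, values in state.items():
    print(view, dict(values))
```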
Aggregated values are mergeable with respect to labelsets as well as time, as long as the intermediate aggregations preserve the label keys to be included in the final result.
Note that it's possible to reconstruct the aggregated values at each step for _agg([k1])_ and _agg([k2])_ from _agg([k1, k2])_, but not vice versa.

## Prior art and alternatives

One alternative is not to include a view API, and require users to configure metric instruments and aggregations individually.
This approach is partially described in the current spec.
See the motivation section for an argument against this approach.

This API borrows heavily from [OpenCensus views](https://opencensus.io/stats/view/).

See also [DataDog metric types](https://docs.datadoghq.com/developers/metrics/types/?tab=distribution#metric-types) and [Prometheus metric types](https://prometheus.io/docs/concepts/metric_types).
> **Reviewer comment:** If views are registered by the operator/application developer, does this mean that that person will get no metric data if they don't add additional configuration? How can they know a priori which metrics they need to register views for?
>
> **Reply:** The way I've done it is that each library exports a function with predefined views. In the simplest case the application developer either enables them or not, while still having the ability to define their own instead, or take a subset of the default views a library exports. So the operator/application developer has to allow metrics from each library, instead of them being automatic and needing a way to disable what you don't want.
>
> **Reply:** I'm now thinking of the auto-instrumentation projects, where there may be dozens of libraries being instrumented that the operator may or may not be familiar with. This is going to be a very surprising thing for "normal" auto-instrumentation agent users to have to do, as they are used to getting things OOTB without extra configuration. Would the auto-instrumentation agent then be expected to just register all the defaults provided with the instrumented libraries?
>
> **Reply:** It does! That's what I would expect, and IIRC it's what we did in OC. For example, the predefined default views for gRPC are here: https://github.com/census-instrumentation/opencensus-java/blob/8b1fd5bbf98b27d0ad27394891e0c64c1171cb2b/contrib/grpc_metrics/src/main/java/io/opencensus/contrib/grpc/metrics/RpcViewConstants.java. Auto-instrumentation could enable all these views when it loads the gRPC integration.
>
> **Reply:** @trask @prydin @tylerbenson thoughts on this? It definitely seems on the surface like it would make the auto-instrumentation job significantly more difficult, especially since auto-instrumentation still has to support multiple backends with potentially different aggregations/metric types.
>
> **Reply:** @c24t This only seems viable to me if there's a way for an exporter author to say (for example) "for everything that's a Measure, create a view with a MinMaxSumCount aggregation." Otherwise, I feel like this is a recipe for users thinking that nothing is working, causing a support nightmare.
>
> **Reply:** That sounds like a good solution to me, and since measures already have unique names I don't see any reason in principle that we couldn't do this. @bogdandrutu might be able to shed some light on the design of OC views.
>
> **Reply:** I added a note about this in 46948ac.