Metric SDK specification OUTLINE #347
Merged: bogdandrutu merged 36 commits into open-telemetry:master from jmacd:jmacd/draft_metric_sdk_spec on Aug 20, 2020
c10ba0e
WIP: Metric SDK specification
bfda24a
Updates following Tigran's feedback
b04e927
More rewording
401292b
Strengthen requirements for aggregators
6990089
Fix typos noted by MrAlias
077141b
Address some of freeformzSFDC's feedback
1542a43
Upstream
3cb05a4
Capitalization
7e785de
Respond to feedback
35dff70
Respond to feedback
9d40730
Handle->Bound instrument
3ec4e00
New img
6f3fbdb
New img ref
8692b24
Update image
2b75442
Rename to Differentiator/Integrator
7be3138
Remove reference to defaultkeys batcher
f9866aa
Batcher->Integrator
931bd8c
Differentiator->Accumulator
535fc0e
Upstrema
4405845
Upstream
f358adf
Move the image
1fc3ec1
Update image
64bbb0c
Simplify the diagram
7063963
Upstream
8929123
Remove much dead code
b36917a
Remove more dead code
f698d3c
Lint
16a317b
Upstream
53d4418
Ignore
4857098
Rename metrics SDK
023838d
Bold
80d0b9a
CheckpointSet -> ExportRecordSet
80c16b2
Update diagram
d5a7162
Update diagram (png)
5c1700a
Editing
6aa3683
Undo
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,195 @@ | ||
# Metric SDK | ||
|
||
_This document is derived from the Golang Metrics SDK prototype. See | ||
the currently open PRs:_ | ||
1. [Pipeline and stdout exporter](https://github.com/open-telemetry/opentelemetry-go/pull/265) | ||
1. [Dogstatsd exporter](https://github.com/jmacd/opentelemetry-go/pull/7) | ||
1. [Prometheus exporter](https://github.com/open-telemetry/opentelemetry-go/pull/296) | ||
|
||
## Meter implementation | ||
|
||
The Meter API provides methods to create metric instruments, metric | ||
instrument handles, and label sets. This document describes the | ||
standard Meter implementation and supporting packages used to build | ||
a complete metric export pipeline. | ||
|
||
The Meter implementation lies at the bottom of the export pipeline, | |
where its primary job is to maintain active state about pending | |
metric updates. The most important requirement placed on the Meter | |
implementation is that it be able to "forget" state about metric updates | |
after they are collected. | |
|
||
The Meter implementation supports all three metric [calling | ||
conventions](api-metrics-user.md): handle-oriented calls, direct | ||
calls, and RecordBatch calls. Of these three calling conventions, | ||
direct calls and RecordBatch calls can be easily converted into | ||
handle-oriented calls using short-lived handles. For example, a | ||
direct call is implemented by acquiring a handle, operating on the | ||
handle, and immediately releasing the handle. | ||
|
||
```golang | ||
// RecordOne converts a direct call into a handle-oriented call by allocating | ||
// a short-lived handle. | ||
func (inst *instrument) RecordOne(ctx context.Context, number core.Number, labelSet api.LabelSet) { | ||
h := inst.AcquireHandle(labelSet) | ||
defer h.Release() | ||
h.RecordOne(ctx, number) | ||
} | ||
``` | ||
|
||
The Meter implementation tracks an internal set of records, where | ||
every record either: (1) has a current, un-released handle pinning it | |
in memory, or (2) has pending updates that have not been collected. The | |
Meter maintains a mapping from the pair (Instrument, LabelSet) to an | ||
active record. Each active record contains an Aggregator | ||
implementation, which is responsible for incorporating a series of | ||
updates into the current state. | ||
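
As a rough illustration of the mapping described above, the following sketch keys active records by an (Instrument, LabelSet) pair. The names `recordKey`, `record`, `meter`, and `update` are hypothetical and not taken from the prototype. Labels are assumed to be pre-encoded as a string, a plain sum stands in for the Aggregator, and a single mutex is used only for brevity.

```go
package main

import (
	"fmt"
	"sync"
)

// recordKey identifies an active record. Both fields are hypothetical
// stand-ins: a real SDK would use its own instrument descriptor and a
// canonical encoding of the LabelSet.
type recordKey struct {
	instrument string
	labels     string
}

// record holds the aggregation state for one (Instrument, LabelSet) pair.
type record struct {
	sum int64 // stand-in for an Aggregator
}

// meter maps (Instrument, LabelSet) pairs to active records.
type meter struct {
	mu      sync.Mutex
	records map[recordKey]*record
}

func newMeter() *meter {
	return &meter{records: make(map[recordKey]*record)}
}

// update finds or creates the record for the pair and applies one update.
func (m *meter) update(instrument, labels string, value int64) {
	m.mu.Lock()
	defer m.mu.Unlock()
	key := recordKey{instrument, labels}
	rec, ok := m.records[key]
	if !ok {
		rec = &record{}
		m.records[key] = rec
	}
	rec.sum += value
}

func main() {
	m := newMeter()
	m.update("requests", "route=/a", 1)
	m.update("requests", "route=/a", 2)
	m.update("requests", "route=/b", 5)
	// Two distinct label sets give two records; the first has sum 3.
	fmt.Println(len(m.records), m.records[recordKey{"requests", "route=/a"}].sum)
}
```

Because the key includes the complete label set, two updates that differ in any label land in different records.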
|
||
Note that storing a map of (Instrument, LabelSet) implies that the | ||
Meter implementation is not directly involved in dimensionality | ||
reduction. Active records are maintained according to the complete | ||
set of labels in the LabelSet. If the export pipeline will reduce | ||
dimensionality of the data, that will occur in the collection pass. | ||
The Meter implementation does not reduce dimensionality "up front". | ||
|
||
The Meter implementation SHOULD ensure that operations on instrument | ||
handles be fast, bypassing the map lookup described above. Metric | ||
updates made via an instrument handle, where the aggregator is defined | ||
by simple atomic operations, should follow a very short code path. | ||
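
One way to picture the short code path is a handle that holds a direct pointer to its record, so that an update is a single atomic operation. This is a minimal sketch under that assumption; `counterRecord` and `handle` are hypothetical names, and the one-time map lookup performed when the handle is acquired is omitted.

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// counterRecord is a hypothetical active record whose aggregator state
// is a single atomically updated word.
type counterRecord struct {
	sum int64
}

// handle is bound to one record when it is acquired, so later updates
// never touch the (Instrument, LabelSet) map.
type handle struct {
	rec *counterRecord
}

// Add is the fast path: one atomic add and no map lookup.
func (h handle) Add(delta int64) {
	atomic.AddInt64(&h.rec.sum, delta)
}

func main() {
	rec := &counterRecord{}
	h := handle{rec: rec} // acquiring the handle is where the map lookup would happen
	for i := 0; i < 1000; i++ {
		h.Add(1)
	}
	fmt.Println(atomic.LoadInt64(&rec.sum)) // prints 1000
}
```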
|
||
Because of short-lived handles, the SDK may accumulate records that | ||
are not associated with a user-held handle. After these records are | ||
collected they may be removed from the (Instrument, LabelSet) map of | ||
active records. Meter implementations MUST ensure that there are no | ||
lost updates as a result of clearing entries from the map. | ||
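
The sketch below shows one simple way to satisfy the no-lost-updates requirement: pending updates are drained and unreferenced records are deleted inside the same critical section. It deliberately uses one coarse mutex, which is easy to reason about but is not the low-contention design recommended elsewhere in this document. All names (`record`, `acquire`, `add`, `release`, `collect`) are hypothetical.

```go
package main

import (
	"fmt"
	"sync"
)

// record is a hypothetical active record. refs counts un-released
// handles; pending and updated track uncollected updates.
type record struct {
	refs    int
	pending int64
	updated bool
}

type meter struct {
	mu      sync.Mutex
	records map[string]*record // key stands in for (Instrument, LabelSet)
}

func newMeter() *meter { return &meter{records: make(map[string]*record)} }

// acquire finds or creates the record and pins it with a reference.
func (m *meter) acquire(key string) *record {
	m.mu.Lock()
	defer m.mu.Unlock()
	rec, ok := m.records[key]
	if !ok {
		rec = &record{}
		m.records[key] = rec
	}
	rec.refs++
	return rec
}

// add applies an update through a pinned record.
func (m *meter) add(rec *record, v int64) {
	m.mu.Lock()
	defer m.mu.Unlock()
	rec.pending += v
	rec.updated = true
}

// release drops the reference taken by acquire.
func (m *meter) release(rec *record) {
	m.mu.Lock()
	defer m.mu.Unlock()
	rec.refs--
}

// collect drains pending updates and, in the same critical section,
// removes records that no handle refers to.
func (m *meter) collect() map[string]int64 {
	m.mu.Lock()
	defer m.mu.Unlock()
	out := make(map[string]int64)
	for key, rec := range m.records {
		if rec.updated {
			out[key] = rec.pending
			rec.pending, rec.updated = 0, false
		}
		if rec.refs == 0 {
			delete(m.records, key)
		}
	}
	return out
}

func main() {
	m := newMeter()
	h := m.acquire("requests|route=/a")
	m.add(h, 4)
	m.release(h) // short-lived handle: released before collection
	fmt.Println(m.collect(), len(m.records))
}
```

A record that is still pinned by a user-held handle survives collection; only its pending value is reset.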
|
||
The Meter implementation provides a `Collect()` method to initiate | ||
collection, which MUST prevent concurrent collection. During the | ||
collection pass, the Meter implementation checkpoints each active | ||
Aggregator and passes it to the Batcher for processing. | ||
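
A collection pass of this shape might look like the sketch below. A dedicated lock serializes calls to `Collect()`, while updates continue through atomic operations. The `process` callback stands in for the Batcher, and the set of active aggregators is assumed to be fixed so that the map itself needs no protection. None of these names are taken from the prototype.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// aggregator is a hypothetical sum aggregator with a separate checkpoint.
type aggregator struct {
	current    int64 // written concurrently by updates
	checkpoint int64 // written only during collection
}

func (a *aggregator) Update(v int64) { atomic.AddInt64(&a.current, v) }

// Checkpoint atomically moves the current value aside and resets it.
func (a *aggregator) Checkpoint() { a.checkpoint = atomic.SwapInt64(&a.current, 0) }

type meter struct {
	collectLock sync.Mutex // prevents concurrent collection
	active      map[string]*aggregator
}

// Collect checkpoints every active aggregator, hands each checkpoint to
// process, and returns the number of records visited.
func (m *meter) Collect(process func(key string, delta int64)) int {
	m.collectLock.Lock()
	defer m.collectLock.Unlock()
	n := 0
	for key, agg := range m.active {
		agg.Checkpoint()
		process(key, agg.checkpoint)
		n++
	}
	return n
}

func main() {
	m := &meter{active: map[string]*aggregator{"a": {}, "b": {}}}
	m.active["a"].Update(2)
	m.active["b"].Update(3)
	total := int64(0)
	n := m.Collect(func(_ string, delta int64) { total += delta })
	fmt.Println(n, total) // prints "2 5"
}
```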
|
||
This document does not specify how to coordinate synchronization | ||
between user-facing metric updates and metric collection activity; | |
however, Meter implementations SHOULD make efforts to avoid lock | |
contention by holding locks only briefly or using lock-free | ||
techniques. | ||
|
||
## Aggregator implementations | ||
|
||
The Aggregator interface supports combining multiple metric events | |
into a single aggregated state. Different concrete Aggregator types | ||
provide different functionality and levels of concurrent performance. | ||
|
||
Aggregators support `Update()`, `Checkpoint()`, `Merge()`, and | ||
`Clone()` operations. `Update()` is called directly from the Meter in | ||
response to a metric event, and may be called concurrently. | ||
`Update()` is also passed the user's telemetry context, which allows | |
it to access the current trace context and distributed correlations; | |
however, none of the built-in aggregators use this information. | |
|
||
The `Checkpoint()`, `Merge()`, and `Clone()` operations are called in | ||
the collection code path to (atomically) save the current aggregator | ||
state, to combine two aggregator states, and to produce a copy for | ||
maintaining state outside of the Meter implementation. | ||
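
To make the four operations concrete, here is a hedged sketch of an interface and a sum aggregator that implements it. The method signatures are assumptions chosen for illustration (the prototype's actual signatures may differ), and `sumAggregator` is a hypothetical type.

```go
package main

import (
	"context"
	"fmt"
	"sync/atomic"
)

// Aggregator is a hypothetical rendering of the four operations.
type Aggregator interface {
	Update(ctx context.Context, value int64)
	Checkpoint()
	Merge(other Aggregator)
	Clone() Aggregator
}

type sumAggregator struct {
	current    int64 // updated concurrently by Update
	checkpoint int64 // touched only on the collection path
}

// Update may be called concurrently; the context is accepted but unused.
func (s *sumAggregator) Update(_ context.Context, value int64) {
	atomic.AddInt64(&s.current, value)
}

// Checkpoint atomically saves the current state and resets it.
func (s *sumAggregator) Checkpoint() {
	s.checkpoint = atomic.SwapInt64(&s.current, 0)
}

// Merge combines another checkpointed sum into this one.
func (s *sumAggregator) Merge(other Aggregator) {
	if o, ok := other.(*sumAggregator); ok {
		s.checkpoint += o.checkpoint
	}
}

// Clone copies the checkpointed state for use outside the Meter.
func (s *sumAggregator) Clone() Aggregator {
	return &sumAggregator{checkpoint: s.checkpoint}
}

func main() {
	ctx := context.Background()
	a, b := &sumAggregator{}, &sumAggregator{}
	a.Update(ctx, 3)
	b.Update(ctx, 4)
	a.Checkpoint()
	b.Checkpoint()
	merged := a.Clone()
	merged.Merge(b)
	fmt.Println(merged.(*sumAggregator).checkpoint) // prints 7
}
```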
|
||
The Metric SDK comes with six built-in Aggregator types, two of which | ||
are standard for use with counters and gauges. | ||
|
||
1. Counter: This aggregator maintains a Sum using only a single word of memory. | ||
1. Gauge: This aggregator maintains a pair containing the last value and its timestamp. | ||
|
||
Four aggregators are intended for use with measure metrics. | ||
|
||
1. MinMaxSumCount: This aggregator computes the min, max, sum, and count using only four words of memory. | ||
1. Sketch: This aggregator computes an approximate data structure that can estimate quantiles. Example algorithms include GK-Sketch, Q-Digest, T-Digest, DDSketch, and HDR-Histogram. The choice of algorithm should be made based on available libraries in each language. | ||
1. Histogram: This aggregator computes a histogram with pre-determined boundaries. This may be used to estimate quantiles, but is generally intended for cases where a histogram will be exported directly. | ||
1. Exact: This aggregator computes an array of all values, supporting exact quantile computations in the exporter. | ||
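
As an example of the simplest of these four, the sketch below keeps min, max, sum, and count in four fields. It guards them with a mutex for clarity, which adds memory beyond the four words mentioned above; a production aggregator would likely use atomic operations instead. The type and method names are hypothetical.

```go
package main

import (
	"fmt"
	"sync"
)

// minMaxSumCount is a hypothetical mutex-guarded aggregator for measure
// instruments.
type minMaxSumCount struct {
	mu    sync.Mutex
	min   float64
	max   float64
	sum   float64
	count int64
}

// Update folds one measurement into the four running values.
func (a *minMaxSumCount) Update(v float64) {
	a.mu.Lock()
	defer a.mu.Unlock()
	if a.count == 0 || v < a.min {
		a.min = v
	}
	if a.count == 0 || v > a.max {
		a.max = v
	}
	a.sum += v
	a.count++
}

// Values returns a consistent snapshot of the four values.
func (a *minMaxSumCount) Values() (lo, hi, sum float64, count int64) {
	a.mu.Lock()
	defer a.mu.Unlock()
	return a.min, a.max, a.sum, a.count
}

func main() {
	agg := &minMaxSumCount{}
	for _, v := range []float64{3, 1, 4} {
		agg.Update(v)
	}
	lo, hi, sum, count := agg.Values()
	fmt.Println(lo, hi, sum, count) // prints "1 4 8 3"
}
```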
|
||
## Batcher implementation | ||
|
||
The Batcher acts as the primary source of configuration for exporting | ||
metrics from the SDK. The two kinds of configuration are: | ||
|
||
1. Given a metric instrument, choose which concrete Aggregator type to apply for in-process aggregation. | ||
1. Given a metric instrument, choose which dimensions to export by (i.e., the "grouping" function). | ||
|
||
The first choice--which concrete Aggregator type to apply--is made | ||
whenever the Meter implementation encounters a new (Instrument, | ||
LabelSet) pair. Each concrete type of Aggregator will perform a | ||
different function. Aggregators for counter and gauge instruments are | ||
relatively straightforward, but many concrete Aggregators are possible | ||
for measure metric instruments. The Batcher has an opportunity to | ||
disable instruments at this point simply by returning a `nil` | ||
Aggregator. | ||
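
The aggregator-selection choice can be sketched as a single method on the Batcher. Everything here is an assumption made for illustration: the `AggregatorFor` name, the `descriptor` fields, and the placeholder `namedAggregator` type that records only which kind of Aggregator was chosen.

```go
package main

import "fmt"

type kind int

const (
	counterKind kind = iota
	gaugeKind
	measureKind
)

// descriptor is a hypothetical description of a metric instrument.
type descriptor struct {
	name string
	kind kind
}

// aggregator is a placeholder interface; only the chosen type matters here.
type aggregator interface{ Kind() string }

type namedAggregator string

func (n namedAggregator) Kind() string { return string(n) }

// batcher holds the configuration: here, only a set of disabled instruments.
type batcher struct {
	disabled map[string]bool
}

// AggregatorFor picks a concrete aggregator for a new instrument, or
// returns nil to disable the instrument.
func (b batcher) AggregatorFor(d descriptor) aggregator {
	if b.disabled[d.name] {
		return nil
	}
	switch d.kind {
	case counterKind:
		return namedAggregator("counter")
	case gaugeKind:
		return namedAggregator("gauge")
	default:
		return namedAggregator("minmaxsumcount")
	}
}

func main() {
	b := batcher{disabled: map[string]bool{"debug.latency": true}}
	for _, d := range []descriptor{
		{"requests", counterKind},
		{"latency", measureKind},
		{"debug.latency", measureKind},
	} {
		if agg := b.AggregatorFor(d); agg != nil {
			fmt.Println(d.name, "->", agg.Kind())
		} else {
			fmt.Println(d.name, "-> disabled")
		}
	}
}
```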
|
||
The second choice--which dimensions to export by--affects how the | |
Batcher processes records emitted by the Meter implementation during | |
collection. During collection, the Meter implementation emits to the | |
Batcher an Export Record for each metric instrument with pending updates. | |
|
||
The Export Record consists of a Descriptor (a description of the | ||
instrument), a LabelSet (the set of labels), and a checkpointed | ||
Aggregator. The checkpointed Aggregator passed from the Meter | ||
implementation to the Batcher contains a delta summarizing all events | ||
that happened since the prior collection pass. | ||
|
||
During the collection pass, the Batcher receives a full set of | ||
checkpointed Aggregators corresponding to each (Instrument, LabelSet) | ||
pair with an active record managed by the Meter implementation. | ||
According to its own configuration, the Batcher at this point | ||
determines which dimensions to aggregate for export; it computes a | ||
checkpoint of (possibly) reduced-dimension Export Records ready for | ||
export. | ||
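
Dimensionality reduction during collection amounts to regrouping records by a subset of label keys and merging the ones that collide. The sketch below assumes sum-valued records and a string-encoded group key; `labelSet`, `exportRecord`, `groupKey`, and `reduce` are hypothetical names.

```go
package main

import (
	"fmt"
	"strings"
)

type labelSet map[string]string

// exportRecord is a simplified Export Record with a checkpointed sum.
type exportRecord struct {
	instrument string
	labels     labelSet
	sum        int64
}

// groupKey encodes only the chosen label keys, in the order given.
func groupKey(labels labelSet, keys []string) string {
	parts := make([]string, 0, len(keys))
	for _, k := range keys {
		parts = append(parts, k+"="+labels[k])
	}
	return strings.Join(parts, ",")
}

// reduce merges records that share an instrument and group key.
func reduce(records []exportRecord, keys []string) map[string]int64 {
	out := make(map[string]int64)
	for _, r := range records {
		out[r.instrument+"{"+groupKey(r.labels, keys)+"}"] += r.sum
	}
	return out
}

func main() {
	records := []exportRecord{
		{"requests", labelSet{"route": "/a", "host": "h1"}, 2},
		{"requests", labelSet{"route": "/a", "host": "h2"}, 3},
		{"requests", labelSet{"route": "/b", "host": "h1"}, 5},
	}
	reduced := reduce(records, []string{"route"})
	// The two /a records merge once the host dimension is dropped.
	fmt.Println(reduced["requests{route=/a}"], reduced["requests{route=/b}"]) // prints "5 5"
}
```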
|
||
Batcher implementations support the option of being stateless or | ||
stateful. Stateless Batchers compute checkpoints which describe the | ||
updates of a single collection period (i.e., deltas). Stateful | ||
Batchers compute checkpoints covering the entire process lifetime; these may | |
be useful for simple exporters but are prone to consuming a large and | ||
ever-growing amount of memory, depending on LabelSet cardinality. | ||
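
The difference between the two modes can be reduced to whether state is cleared after each collection. In this sketch the method names `Process`, `Checkpoint`, and `FinishedCollection` are assumptions, and values are plain sums.

```go
package main

import "fmt"

// batcher is a hypothetical Batcher that is stateless or stateful
// depending on a flag.
type batcher struct {
	stateful bool
	state    map[string]int64
}

func newBatcher(stateful bool) *batcher {
	return &batcher{stateful: stateful, state: make(map[string]int64)}
}

// Process merges one checkpointed delta into the batcher's state.
func (b *batcher) Process(key string, delta int64) { b.state[key] += delta }

// Checkpoint returns a copy of the records ready for export.
func (b *batcher) Checkpoint() map[string]int64 {
	out := make(map[string]int64, len(b.state))
	for k, v := range b.state {
		out[k] = v
	}
	return out
}

// FinishedCollection forgets state when stateless; a stateful batcher
// keeps it, so memory grows with label set cardinality.
func (b *batcher) FinishedCollection() {
	if !b.stateful {
		b.state = make(map[string]int64)
	}
}

// collectTwice runs two collection periods with deltas 2 and 3.
func collectTwice(b *batcher) (first, second int64) {
	b.Process("requests", 2)
	first = b.Checkpoint()["requests"]
	b.FinishedCollection()
	b.Process("requests", 3)
	second = b.Checkpoint()["requests"]
	b.FinishedCollection()
	return first, second
}

func main() {
	fmt.Println(collectTwice(newBatcher(false))) // stateless: prints "2 3"
	fmt.Println(collectTwice(newBatcher(true)))  // stateful: prints "2 5"
}
```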
|
||
Two standard Batcher implementations are provided. | ||
|
||
1. The "defaultkeys" Batcher reduces the export dimensions of each | ||
metric instrument to the Recommended keys declared with the | ||
instrument. | ||
1. The "ungrouped" Batcher exports metric instruments at full | ||
dimensionality; each LabelSet is exported without reducing dimensions. | ||
|
||
## Controller implementation | ||
|
||
A controller is needed to coordinate the decision to begin collection. | ||
Controllers generally are responsible for binding the Meter | ||
implementation, the Batcher, and the Exporter. | ||
|
||
Once the decision has been made, the controller's job is to call | ||
`Collect()` on the Meter implementation, then read the checkpoint from | ||
the Batcher, then invoke the Exporter. | ||
|
||
One standard "push" controller is provided, which triggers collection | ||
using a fixed period. The controller is responsible for flushing | ||
metric events prior to shutting down the process. | ||
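
A push controller along these lines could be sketched as a loop that waits on a tick channel and a stop channel. The `pipeline` struct and its function fields are hypothetical stand-ins for the Meter implementation, Batcher, and Exporter. The demo drives ticks by hand so that it is deterministic, where a real controller would presumably pass `time.NewTicker(period).C`.

```go
package main

import (
	"fmt"
	"sync/atomic"
	"time"
)

// pipeline bundles the three stages a controller binds together.
type pipeline struct {
	collect    func()       // Meter implementation: Collect()
	checkpoint func() int64 // Batcher: read the finished checkpoint
	export     func(int64)  // Exporter
}

// collectOnce runs the sequence: collect, read checkpoint, export.
func (p pipeline) collectOnce() {
	p.collect()
	p.export(p.checkpoint())
}

// run collects on every tick and flushes once more when stopped.
func run(p pipeline, ticks <-chan time.Time, stop <-chan struct{}, done chan<- struct{}) {
	defer close(done)
	for {
		select {
		case <-ticks:
			p.collectOnce()
		case <-stop:
			p.collectOnce() // final flush before shutdown
			return
		}
	}
}

// runDemo records three updates across two ticks and a shutdown.
func runDemo() (exports int, total int64) {
	var pending, checkpoint int64
	p := pipeline{
		collect:    func() { checkpoint = atomic.SwapInt64(&pending, 0) },
		checkpoint: func() int64 { return checkpoint },
		export:     func(v int64) { exports++; total += v },
	}
	ticks := make(chan time.Time) // manual ticks instead of a time.Ticker
	stop := make(chan struct{})
	done := make(chan struct{})
	go run(p, ticks, stop, done)

	atomic.AddInt64(&pending, 2)
	ticks <- time.Now()
	atomic.AddInt64(&pending, 5)
	ticks <- time.Now()
	atomic.AddInt64(&pending, 1)
	close(stop)
	<-done // wait for the final flush
	return exports, total
}

func main() {
	n, total := runDemo()
	fmt.Println(n, total) // prints "3 8"
}
```

The final flush is what guarantees that the last update is exported even though no tick follows it.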
|
||
Metric exporters that wish to pull metric updates are likely to | ||
integrate a controller directly into the exporter itself. | ||
|
||
## Exporter implementations | ||
|
||
The exporter is called with a checkpoint of finished Export Records. | ||
Most configuration decisions have been made before the exporter is | ||
invoked, including which instruments are enabled, which concrete | ||
aggregator types to use, and which dimensions to aggregate by. | |
|
||
There is very little left for the exporter to do other than format the | ||
metric updates into the desired format and send them on their way. | ||
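
With those decisions already made, an exporter can be little more than a formatting loop. The sketch below renders sum-valued records as text lines; the `exportRecord` fields and the `name{labels} value` line format are assumptions for illustration, not a format defined by this document.

```go
package main

import (
	"fmt"
	"sort"
)

// exportRecord is a simplified, finished Export Record.
type exportRecord struct {
	name   string
	labels string // pre-encoded label set
	sum    int64
}

// formatRecords renders each record as one line, sorted for stable output.
func formatRecords(records []exportRecord) []string {
	lines := make([]string, 0, len(records))
	for _, r := range records {
		lines = append(lines, fmt.Sprintf("%s{%s} %d", r.name, r.labels, r.sum))
	}
	sort.Strings(lines)
	return lines
}

func main() {
	for _, line := range formatRecords([]exportRecord{
		{"requests", "route=/b", 5},
		{"requests", "route=/a", 3},
	}) {
		fmt.Println(line)
	}
}
```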
|
||
## Multiple exporter support | ||
|
||
The metric export pipeline specified here does not include explicit | ||
support for multiple export pipelines. In principle, any one of the | ||
interfaces here could be satisfied by a multiplexing implementation, | ||
but in practice, it will be costly to run multiple Batchers or | ||
Aggregators in parallel. | ||
|
||
If multiple exporters are required, therefore, it is best if they can | ||
share a single Batcher configuration. |
I don't think we made any commitment to support a vendor's proprietary protocol in OpenTelemetry. We should probably consider removing this.
We also already have Stackdriver in the go OTel tracing directories... iirc the decision was that it was okay for now but would need to migrate out of otel before 1.0?
I personally would lobby for sooner, but there are examples of proprietary exporters already in there...
I anticipated this sort of remark, and wrote the following in the original PR:
open-telemetry/opentelemetry-go#326 (comment)
If an exporter were produced that called a vendor's library (or used a vendor's types) directly, I would not try to put this in the otel repo, but do see the dogstats variation on statsd as being widely supported outside of datadog (e.g., in Veneur).
I will say that I'm only weakly committed to keeping this code in the otel repo. I'm aware of one effort in Go to create a direct export using the Datadog-Go client library, that's certainly not welcome in the otel repo.
(FYI @jbarciauskas) My position is that DataDog ought to publish a specification and declare Dogstatsd an open protocol, which would let us resolve this question.