Optimize LabelValues in metric proto #36

tigrannajaryan · 2019-10-31T23:21:27Z

Optimize LabelValues in metric proto

LabelValues is now a string array.

Benchmarking results are below (Baseline is current master, Proposed is after this commit).
This change reduces CPU usage by up to 40% for one-data-point timeseries encoding and reduces
memory consumption by about 25%.

===== Encoded sizes
Encoding                       Uncompressed  Improved        Compressed  Improved
Baseline/Metric/Histogram       13569 bytes  [1.000], gziped  774 bytes  [1.000]
Proposed/Metric/Histogram       13159 bytes  [1.031], gziped  781 bytes  [0.991]

Encoding                       Uncompressed  Improved        Compressed  Improved
Baseline/Metric/MixOne          48530 bytes  [1.000], gziped 1671 bytes  [1.000]
Proposed/Metric/MixOne          45720 bytes  [1.061], gziped 1677 bytes  [0.996]

Encoding                       Uncompressed  Improved        Compressed  Improved
Baseline/Metric/MixSeries       97867 bytes  [1.000], gziped 6620 bytes  [1.000]
Proposed/Metric/MixSeries       95067 bytes  [1.029], gziped 6587 bytes  [1.005]

goos: darwin
goarch: amd64
pkg: github.com/tigrannajaryan/exp-otelproto/encodings
BenchmarkEncode/Baseline/Metric/Int64-8        	      34	 156113692 ns/op
BenchmarkEncode/Proposed/Metric/Int64-8        	      67	  91936291 ns/op

BenchmarkEncode/Baseline/Metric/Summary-8      	     120	  50891750 ns/op
BenchmarkEncode/Proposed/Metric/Summary-8      	     141	  42172658 ns/op

BenchmarkEncode/Baseline/Metric/Histogram-8    	      91	  64816532 ns/op
BenchmarkEncode/Proposed/Metric/Histogram-8    	     100	  56837230 ns/op

BenchmarkEncode/Baseline/Metric/HistogramSeries-8         	      37	 159643499 ns/op
BenchmarkEncode/Proposed/Metric/HistogramSeries-8         	      40	 149983727 ns/op

BenchmarkEncode/Baseline/Metric/Mix-8                     	      20	 287582195 ns/op
BenchmarkEncode/Proposed/Metric/Mix-8                     	      31	 196378683 ns/op

BenchmarkEncode/Baseline/Metric/MixSeries-8               	       8	 681122815 ns/op
BenchmarkEncode/Proposed/Metric/MixSeries-8               	      10	 540018386 ns/op

BenchmarkDecode/Baseline/Metric/Int64-8                   	      18	 319038450 ns/op	206696040 B/op	 5724000 allocs/op
BenchmarkDecode/Proposed/Metric/Int64-8                   	      22	 265034276 ns/op	154696039 B/op	 4724000 allocs/op

BenchmarkDecode/Baseline/Metric/Summary-8                 	      52	 117972148 ns/op	79496034 B/op	 2024000 allocs/op
BenchmarkDecode/Proposed/Metric/Summary-8                 	      60	 104489249 ns/op	69096035 B/op	 1824000 allocs/op

BenchmarkDecode/Baseline/Metric/Histogram-8               	      36	 161440606 ns/op	104296028 B/op	 2624000 allocs/op
BenchmarkDecode/Proposed/Metric/Histogram-8               	      40	 147001580 ns/op	93896032 B/op	 2424000 allocs/op

BenchmarkDecode/Baseline/Metric/HistogramSeries-8         	      16	 345726946 ns/op	233896053 B/op	 5324000 allocs/op
BenchmarkDecode/Proposed/Metric/HistogramSeries-8         	      16	 333473715 ns/op	223496045 B/op	 5124000 allocs/op

BenchmarkDecode/Baseline/Metric/Mix-8                     	       9	 611361046 ns/op	391240035 B/op	10326000 allocs/op
BenchmarkDecode/Proposed/Metric/Mix-8                     	      10	 531026674 ns/op	318440033 B/op	 8926000 allocs/op

BenchmarkDecode/Baseline/Metric/MixSeries-8               	       5	1179081837 ns/op	776840057 B/op	18026000 allocs/op
BenchmarkDecode/Proposed/Metric/MixSeries-8               	       5	1077597483 ns/op	704040035 B/op	16626000 allocs/op
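
The headline percentages can be cross-checked against the raw numbers above. The following is a minimal sketch, assuming the Int64 case is the one-data-point timeseries the description refers to; `reduction` is a hypothetical helper introduced only for this illustration and is not part of the benchmark code.

```go
package main

import "fmt"

// reduction returns the fractional decrease from baseline to proposed.
// Hypothetical helper for this illustration only.
func reduction(baseline, proposed float64) float64 {
	return (baseline - proposed) / baseline
}

func main() {
	// BenchmarkEncode/{Baseline,Proposed}/Metric/Int64-8, ns/op (copied from the output above).
	cpu := reduction(156113692, 91936291)
	// BenchmarkDecode/{Baseline,Proposed}/Metric/Int64-8, B/op (copied from the output above).
	mem := reduction(206696040, 154696039)

	fmt.Printf("encode time reduction:   %.1f%%\n", cpu*100)
	fmt.Printf("decode memory reduction: %.1f%%\n", mem*100)
}
```

With these inputs the two values come out at roughly 41% and 25%, which is consistent with the summary at the top of the description.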

opentelemetry/proto/metrics/v1/metrics.proto

jmacd · 2019-11-01T15:04:25Z

I think it's important to have an explicit unspecified value.

In our metrics summit in late August we spent a lot of time discussing unspecified label values as they relate to metric handles. This was before LabelSets were introduced, but I see these as equivalent. The question boiled down to this -- should the SDK be required to inspect the labels from the distributed context any time a LabelSet/Handle is used that did not specify a value. We declared that no, the value is explicitly unspecified if the LabelSet/Handle did not declare it.

I added caveats that (a) users can explicitly inject distributed context labels if they want, (b) SDKs could decide to do something automatic about distributed context labels.

SergeyKanzhelev · 2019-11-01T18:50:09Z

> We declared that no, the value is explicitly unspecified if the LabelSet/Handle did not declare it.

From a consumption standpoint, what's the difference between an unspecified and an empty value? Would one want to create a stacked chart where an unspecified label and an empty label are two different values? If empty is that important for this scenario, perhaps one can report it with the actual word "EMPTY"?

If there are some notes I can read through to understand it, please let me know.

jmacd · 2019-11-01T19:03:36Z

If we specified that empty strings were invalid values, then we could get away with using the empty string to encode an unspecified value. I have mixed feelings here, because it's so much easier to use an empty string than to distinguish unspecified values. But then some users who have legitimate empty string values will be forced to use "EMPTY", which is also a valid value, and then they're left with ambiguity.

tigrannajaryan · 2019-11-01T19:18:11Z

I feel the empty/unspecified string question is a discussion that needs to happen in the realm of the Specification. If there is a decision not to support unspecified or empty strings, we can then modify the proto accordingly. For now the proto and this PR mirror what the Spec requires.

tigrannajaryan · 2019-11-06T15:26:28Z

@SergeyKanzhelev @jmacd can we move forward with this change? It does not introduce new concepts, but merely follows what the spec requires today.

I created a separate issue to clarify the need for "unspecified" labels here: open-telemetry/opentelemetry-specification#345

If the conclusion eventually is that we don't need the unspecified labels I will happily modify this proto to remove them in a future PR. For now I believe this PR is good as is.

jmacd · 2019-11-06T17:48:05Z

In hindsight, I'm not sure we should have written that into the specification. It was related to the concept of required aggregation keys, which was softened in the final spec to recommended aggregation keys. If a system is expecting precisely a fixed number of dimensions and those dimensions are not included in the export format, they are effectively unspecified.

The reason we discussed explicitly unspecified values was to declare that the implementation would not be required to scan through the context for distributed correlation labels to satisfy the missing values.

@bogdandrutu I'd be happy to strike this from the specification.

SergeyKanzhelev

I'm fine with this change, but declaring empty strings and unspecified values to be the same would be a better optimization and easier to understand

opentelemetry/proto/metrics/v1/metrics.proto

tigrannajaryan · 2019-11-06T17:58:25Z

> I'm fine with this change, but declaring empty strings and unspecified values to be the same would be a better optimization and easier to understand

@SergeyKanzhelev I agree and if spec decides to do that I will make the corresponding change here.

jmacd · 2019-11-06T18:02:05Z

@SergeyKanzhelev I didn't say empty string--that I would reject. I said leaving out a key:value entirely (i.e., no key, no value) is effectively saying unspecified.

tigrannajaryan · 2019-11-06T18:09:08Z

> @SergeyKanzhelev I didn't say empty string--that I would reject. I said leaving out a key:value entirely (i.e., no key, no value) is effectively saying unspecified.

Agree. That is already possible now, by omitting the key from MetricDescriptor.label_keys. The only downside is that when exporting metrics you will have to create distinct instances of MetricDescriptor for each combination of specified labels (as opposed to having a single MetricDescriptor for all instances of one metric that have different label values). It is doable, and my gut feeling is that it is not a big deal since it is going to be rare.
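
The export-side cost described above can be sketched as grouping data points by the set of label keys they actually specify, which yields one descriptor per distinct key set. This is only an illustration: `Descriptor`, `Series`, `Group` and `GroupByKeys` are hypothetical, simplified stand-ins and not the generated proto types.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// Descriptor and Series are simplified, hypothetical stand-ins for
// MetricDescriptor and a timeseries with positional label values.
type Descriptor struct {
	Name      string
	LabelKeys []string
}

type Series struct {
	LabelValues []string // positionally matches Descriptor.LabelKeys
}

type Group struct {
	Descriptor Descriptor
	Series     []Series
}

// GroupByKeys creates one Descriptor per distinct set of specified label keys.
// The map key is the sorted key names joined by commas, which assumes that
// label keys themselves contain no commas.
func GroupByKeys(name string, points []map[string]string) map[string]*Group {
	groups := make(map[string]*Group)
	for _, labels := range points {
		keys := make([]string, 0, len(labels))
		for k := range labels {
			keys = append(keys, k)
		}
		sort.Strings(keys)
		sig := strings.Join(keys, ",")

		g, ok := groups[sig]
		if !ok {
			g = &Group{Descriptor: Descriptor{Name: name, LabelKeys: keys}}
			groups[sig] = g
		}
		vals := make([]string, len(keys))
		for i, k := range keys {
			vals[i] = labels[k]
		}
		g.Series = append(g.Series, Series{LabelValues: vals})
	}
	return groups
}

func main() {
	groups := GroupByKeys("http_requests", []map[string]string{
		{"method": "GET", "status": "200"},
		{"method": "POST", "status": "500"},
		{"method": "GET"}, // "status" is unspecified: the key is simply omitted
	})
	for sig, g := range groups {
		fmt.Printf("keys=[%s] series=%d\n", sig, len(g.Series))
	}
}
```

In this sketch the three points produce two descriptors, one for `method,status` and one for `method` alone, which is the extra bookkeeping the comment refers to.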

jmacd · 2019-11-06T18:22:28Z

I could imagine a solution where, instead of label values being repeated strings, they would be repeated integers referring to a dictionary of string values. This would support compression, where the same string is used more than once, and it would allow us to have an explicit value like -1 to indicate unspecified.
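
A minimal sketch of that idea in Go, for illustration only. `Dictionary`, `Intern`, `Encode`, `Lookup` and the `Unspecified` constant are hypothetical names; nothing like this exists in the proto as proposed in this PR, and a nil pointer is used here merely as one way to represent an unspecified value on the input side.

```go
package main

import "fmt"

// Unspecified is the sentinel index suggested above for a label with no value.
const Unspecified int32 = -1

// Dictionary interns label value strings and hands out stable integer indexes.
type Dictionary struct {
	values []string
	index  map[string]int32
}

func NewDictionary() *Dictionary {
	return &Dictionary{index: make(map[string]int32)}
}

// Intern returns the index of s, adding it to the dictionary on first use.
func (d *Dictionary) Intern(s string) int32 {
	if i, ok := d.index[s]; ok {
		return i
	}
	i := int32(len(d.values))
	d.values = append(d.values, s)
	d.index[s] = i
	return i
}

// Encode maps label values to dictionary indexes; a nil pointer means "unspecified".
func (d *Dictionary) Encode(vals []*string) []int32 {
	out := make([]int32, len(vals))
	for i, v := range vals {
		if v == nil {
			out[i] = Unspecified
			continue
		}
		out[i] = d.Intern(*v)
	}
	return out
}

// Lookup resolves an index back to its string; ok is false for Unspecified
// or any other out-of-range index.
func (d *Dictionary) Lookup(i int32) (s string, ok bool) {
	if i < 0 || int(i) >= len(d.values) {
		return "", false
	}
	return d.values[i], true
}

func main() {
	d := NewDictionary()
	get, post := "GET", "POST"
	fmt.Println(d.Encode([]*string{&get, &post}))
	fmt.Println(d.Encode([]*string{&get, nil})) // second label left unspecified
}
```

Note that an empty string remains an ordinary dictionary entry in this scheme, so "empty" and "unspecified" stay distinguishable, at the cost of the extra indirection on every read.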

tigrannajaryan · 2019-11-06T19:05:53Z

How common is it to have labels with repeated values?

Also, compression-wise, if the concern is about wire size, the proto is most likely going to be gzipped, and that compresses repetitive strings quite efficiently. Performance-wise, this elimination of redundant labels could probably be a win, but for performance I am looking into an approach that looks promising, so I'd leave that aside for a moment.

Compressing the way you suggested is possible for other entities as well, but so far I have avoided going in that direction because I think it makes working with the data slightly more complicated, since you need an additional indirection. It may be worth exploring; I can do some experimentation later if time permits.

jmacd · 2019-11-06T19:08:16Z

I am starting to agree that we should allow empty strings. Let's do that and see what trouble ensues. Should we specify that the empty string is not a valid attribute/label value?

SergeyKanzhelev · 2019-11-06T19:29:19Z

@jmacd given that I'm not very deep into metrics, I like the idea of trying to equate empty strings with unspecified. If there turn out to be legitimate use cases for the opposite, we can always add an indexes array saying which empty strings are empty and which are unspecified, similar to what's proposed in this PR.
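
One possible reading of the "indexes array" fallback, sketched in Go. The `TimeSeries` type and its `UnspecifiedIndexes` field are hypothetical and only illustrate the shape of the idea; they are not taken from the proto.

```go
package main

import "fmt"

// TimeSeries is a simplified, hypothetical stand-in for the proto message.
type TimeSeries struct {
	// LabelValues holds one string per label key; unspecified values are
	// stored as empty strings.
	LabelValues []string
	// UnspecifiedIndexes lists the positions in LabelValues whose empty
	// string means "unspecified" rather than a genuine empty value.
	UnspecifiedIndexes []int32
}

// IsUnspecified reports whether the label value at position i was never set.
func (t TimeSeries) IsUnspecified(i int) bool {
	for _, idx := range t.UnspecifiedIndexes {
		if int(idx) == i {
			return true
		}
	}
	return false
}

func main() {
	ts := TimeSeries{
		LabelValues:        []string{"GET", "", ""},
		UnspecifiedIndexes: []int32{2}, // position 1 is a real empty string
	}
	for i, v := range ts.LabelValues {
		fmt.Printf("%d: %q unspecified=%v\n", i, v, ts.IsUnspecified(i))
	}
}
```

Consumers that do not care about the distinction can ignore the side array entirely and treat every empty string the same way.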

bogdandrutu · 2019-11-07T14:35:00Z

Hard to follow this discussion. Can we summarize what is the current status? I personally prefer to first decide if we need to support unspecified or we can do what OpenMetrics does and treat unspecified as empty.

tigrannajaryan · 2019-11-07T18:43:35Z

> Hard to follow this discussion. Can we summarize what is the current status? I personally prefer to first decide if we need to support unspecified or we can do what OpenMetrics does and treat unspecified as empty.

@bogdandrutu to summarize: I believe we can move forward and merge this PR, since it is a straight improvement over the current state and does not change any semantics. It is simply a performance optimization without functional changes.

We can separately discuss if we want to eliminate "unspecified label" concept as part of open-telemetry/opentelemetry-specification#345 that I created for that discussion.

Depending on the decision resulting from that discussion I will make any necessary changes in this proto repository.

SergeyKanzhelev · 2019-11-07T22:11:32Z

@tigrannajaryan I think there is a difference between unspecified values in the API and unspecified values in the schema (that is my take). It seems there is little objection to having "empty means unspecified" in the schema, so this PR can be improved further.

tigrannajaryan · 2019-11-08T00:32:45Z

@SergeyKanzhelev @jmacd I removed the ability to have unspecified value from this PR. Please have another look.

opentelemetry/proto/metrics/v1/metrics.proto


SergeyKanzhelev · 2019-11-09T20:09:00Z

@jmacd I'm merging this just to move forward. If you feel that we need to revisit this decision - please file an issue.

jmacd · 2019-11-11T16:21:08Z

LGTM

tigrannajaryan requested review from AloisReitbauer, bogdandrutu, c24t, carlosalberto, iredelmeier, SergeyKanzhelev, songy23, tedsuo and yurishkuro as code owners October 31, 2019 23:21

SergeyKanzhelev reviewed Oct 31, 2019

SergeyKanzhelev reviewed Oct 31, 2019

tigrannajaryan force-pushed the feature/tigran/metric_labelvalues branch from beea350 to c27d863 November 6, 2019 15:21

tigrannajaryan mentioned this pull request Nov 6, 2019

Clarify the need to have "unspecified" values of metric labels open-telemetry/opentelemetry-specification#345 (Closed)

SergeyKanzhelev approved these changes Nov 6, 2019

SergeyKanzhelev reviewed Nov 6, 2019

tigrannajaryan force-pushed the feature/tigran/metric_labelvalues branch from c27d863 to c5e7aef November 8, 2019 00:31

bogdandrutu approved these changes Nov 8, 2019

tigrannajaryan force-pushed the feature/tigran/metric_labelvalues branch from c5e7aef to 2463a6e November 8, 2019 15:06

bogdandrutu approved these changes Nov 8, 2019

tigrannajaryan force-pushed the feature/tigran/metric_labelvalues branch from 2463a6e to 81003a1 November 8, 2019 15:09

SergeyKanzhelev approved these changes Nov 8, 2019

tigrannajaryan force-pushed the feature/tigran/metric_labelvalues branch from 165c523 to c91401f November 8, 2019 17:21

Merge branch 'master' into feature/tigran/metric_labelvalues (3767ccf)

SergeyKanzhelev merged commit 5578102 into open-telemetry:master Nov 9, 2019

tigrannajaryan deleted the feature/tigran/metric_labelvalues branch November 11, 2019 16:48
