-
Notifications
You must be signed in to change notification settings - Fork 171
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
data model proto #126
data model proto #126
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There's probably more, but here's what I spotted that need to be made consistent with the text format
proto/openmetrics_data_model.proto
Outdated
// Required. | ||
int64 seconds = 1; // Seconds since the epoch. Negative values are permitted | ||
|
||
uint32 nanoseconds = 2; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What are the semantics of this when seconds is negative?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nanoseconds_offset
perhaps is a better concept? And if its signed instead, perhaps that's more permissive and allows you to simply always just blindly add these two together:
unix_nanos = (timestamp.seconds * SECOND_NANOSECONDS) + timestamp.nanoseconds_offset
Thoughts?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Don't quite understand. We can add even when nanoseconds is unsigned
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As discussed we'll just use the google protobuf timestamp type here.
message Point { | ||
oneof value { | ||
double double_value = 1; | ||
int64 int_value = 2; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm, did we agree uint64 here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would it be a little strange that double value can be negative but int value cannot be?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Any reason to restrict it to unsigned? The proto doc has int64
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Any reason this cannot be negative for a gauge?
proto/openmetrics_data_model.proto
Outdated
double sum = 1; | ||
|
||
// Required. | ||
int64 count = 2; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
uint64?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
proto/openmetrics_data_model.proto
Outdated
message BucketCount { | ||
// Required. | ||
// Count is the number of values for a bucket of the histogram. | ||
int64 count = 1; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
uint64?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
proto/openmetrics_data_model.proto
Outdated
repeated Quantile quantile = 3; | ||
message Quantile { | ||
// Required. | ||
// Must be in the interval (0.0, 1.0]. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this should be fully closed, why allow 1 but not 0?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
how do you calculate the 0th percentile from a sample of values? i don't think subtracting delta from the lowest value in the sample is mathematically sound.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same question for 100th percentile. It's either both or neither.
proto/openmetrics_data_model.proto
Outdated
// A bucket has a 0 lower bound and an inclusive upper bound for the | ||
// values that are counted for that bucket. That is, buckets are | ||
// cumulative. The upper bound of successive buckets must be increasing. | ||
// There is an implicit overflow bucket that extends up to +infinity. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
and must match the value of _count
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
these are just the bucket boundaries. I've added a comment later about the sum of bucket counts.
|
||
// Value for CUMULATIVE_HISTOGRAM or GAUGE_HISTOGRAM point. | ||
message HistogramValue { | ||
// Required. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Neither of these should be required
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why not?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some systems don't have them (MySQL)
proto/openmetrics_data_model.proto
Outdated
double sum = 1; | ||
int64 count = 2; | ||
|
||
repeated Quantile quantile = 3; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It should be clear that this is optional
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
are you saying the quantile's are optional?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes
proto/openmetrics_data_model.proto
Outdated
InfoValue info_value = 5; | ||
SummaryValue summary_value = 6; | ||
} | ||
// Required for COUNTER, SUMMARY or CUMULATIVE_HISTOGRAM type. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Optional
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We had agreed this is required from a model perspective.
Are you saying optional just for backward compatibility with the old text format? The proto model does not deal with that compatibility. It defines the baseline OpenMetrics model and a client that does not send start_timestamp is not OpenMetrics compliant.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, we had not agreed this. It's optional, as many many systems out there do not support this.
proto/openmetrics_data_model.proto
Outdated
// interval. | ||
Timestamp start_timestamp = 7; | ||
|
||
// If not specified, the timestamp will be decided by the backend. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Make clear this is optional, and discouraged
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we need to resolve the pull/push debate before we say "discouraged". IMO the same data model suffices for push in which case timestamp should be encouraged.
For now I am fine with saying "discouraged for pull and encouraged for push".
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wouldn't say encouraged for push, it depends on what's on the other side. The other side can always add its own.
proto/openmetrics_data_model.proto
Outdated
|
||
// TODO: Format? | ||
// STATE_SETs and INFO types must not have a unit. | ||
string unit = 2; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should this perhaps instead be an enum?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There's too many potential values, it's a string to allow for that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we want any restriction on this string, or just UTF-8?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It'd have to be the same limits as metric names.
proto/openmetrics_data_model.proto
Outdated
message Linear { | ||
// Required. | ||
// Must be greater than 0. | ||
int32 num_explicit_buckets = 1; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
uint32 perhaps since there's no such thing as negative buckets?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
proto/openmetrics_data_model.proto
Outdated
message Exponential { | ||
// Required. | ||
// Must be greater than 0. | ||
int32 num_exponential_buckets = 1; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
uint32 perhaps here too since there's no such thing as negative buckets?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
proto/openmetrics_data_model.proto
Outdated
// The quantiles can be reset at arbitrary unknown times. | ||
|
||
double sum = 1; | ||
int64 count = 2; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
uint64 since there can't be negative sampled values?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
proto/openmetrics_data_model.proto
Outdated
string value = 2; | ||
} | ||
|
||
message Timestamp { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why not using the default timestamp from proto3? "google/protobuf/timestamp.proto"
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it'd be better to keep self-contained
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
At least we can copy the definition from there and avoid any arguments about nanoseconds being int vs uint.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
do you know why it is int32 in google/protobuf/timestamp.proto even though it says "Non-negative fractions of a second at nanosecond resolution."?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Two reasons that I know:
- This needs to be compatible with languages that do not have unsigned (like Java).
- It makes diff calculation easier (you do the diff initially then normalize the nanos).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think either is a problem in practice, it's not going to have values where that'll be an issue.
proto/openmetrics_data_model.proto
Outdated
|
||
// One or more timeseries for a single metric, where each timeseries has | ||
// one or more points. | ||
message MetricPoints { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why not just Metric?
message Point { | ||
oneof value { | ||
double double_value = 1; | ||
int64 int_value = 2; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Any reason this cannot be negative for a gauge?
// (and indicated by comments). | ||
|
||
// The top-level message sent on the wire. | ||
message MetricSet { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Probably a better name is MetricList. Set semantics are a bit unclear here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's a set, you aren't allowed have duplicates.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Then probably document that, to make it clear why is a set and what is the "hash" function.
proto/openmetrics_data_model.proto
Outdated
|
||
enum Type { | ||
UNKNOWN = 0; // double or int valued point | ||
GAUGE = 1; // double or int valued point |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In order to avoid confusion and have backends to support both double/int for the same metric what about making the type include the type of the valued point. Something like GAUGE_DOUBLE?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We discussed this at length, this is the way we're going.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You mean the current state or what I proposed?
proto/openmetrics_data_model.proto
Outdated
} | ||
|
||
// A single timeseries. | ||
message Timeseries { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't see anywhere something like MonitoredResource
in Stackdriver or __meta__
labels (in Prometheus, hope I used the correct name). Is this intentional? How are these reported?
Couple of use-cases where these are necessary and cannot be determine even if this data model is focused on a pull mechanism which would allow the backend to annotate these meta labels in general:
- Task (metric producer) -> Proxy -> Metric Backend. In this case the meta labels should be included in the metric because the backend does not know about the metric producer and cannot associate the right meta labels.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is what Info metrics are for.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice, but I have some problems understanding some of the things here, maybe worth some clarification:
- I don't understand how does the correlation between Info metrics and the other metrics happen. Should we have one info metric per MetricSet? Can I have in the same MetricSet metrics from multiple producers (in case of a proxy if I monitor multiple tasks)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we are allowing multiple info metrics, e.g. build_info. It is up to the receiver to decide what to do with them. But they will all apply to the whole MetricSet, so one shouldn't mix multiple producers in the same MetricSet.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, there's no limit on how many info metrics you can have.
// timeseries, value of INFO metrics, and exemplars in Histograms. | ||
message Label { | ||
// Required. | ||
string name = 1; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we want to support a label description? Human readable description of this metric?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We discussed this previously, and no.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
roger that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This comment should be marked as resolved.
proto/openmetrics_data_model.proto
Outdated
|
||
// Additional information about the example value // (e.g. trace id). | ||
// The sum of lengths of all the strings in all the labels must not | ||
// exceed 64 UTF-8 characters. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
64 characters for all the labels here seems unreasonable. trace-id has 32-hex characters and span-id has 16-hex characters only these values will represent almost the entire limit.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is some concern about allowing this to be a dumping ground for stuff.
With UTF-8, the current limit lets someone use 64*4 = 256 bytes.
How about we state the limit in the sum of bytes needed for names and values so that it doesn't penalize users who are restricting to ascii.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The spec in general is UTF-8, I'd rather not switch to bytes here.
It sounds to me like you've 12 characters left over for the label name, so you're inside the limit.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@sumeer I am not suggesting to not have a limit, but the limit is too small.
@brian-brazil correct but if I use "trace-id" and "span-id" (32 + 8 + 16 + 7 = 63) as keys then I have one character left so no other exemplar. Also we would like to have a trace-options which will suggest if the trace is sample or not (2-hex characters).
I know that the solution can be to modify the w3c standard (impossible) or to use "shorter" keys like "tid" and "sid" and "to" but those are not that clear. I really think that the limit should be more realistic and we can go with 256 characters for this.
|
||
// A name-value pair. These are used in multiple places: identifying | ||
// timeseries, value of INFO metrics, and exemplars in Histograms. | ||
message Label { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Because label is key, value you can use Labels as map<String,String>.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This comment should be ignored and marked as resolved.
bool enabled = 1; | ||
|
||
// Required. | ||
string name = 2; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is this not a regular label? Recommended key can be state_name
the label value is the state
then this becomes a single boolean.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Then it'd not be a first-class type.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am fine not having this as a first class. Probably on the gauge supporting boolean is enough
proto/openmetrics_data_model.proto
Outdated
// exponential sequence, or each bucket can be specified explicitly. | ||
// `BucketOptions` does not include the number of values in each bucket. | ||
// | ||
// A bucket has a 0 lower bound and an inclusive upper bound for the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should say "inclusive 0 lower bound" for clarity
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Buckets for values <0 don't seem possible with this data model. Is that intended?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Per our discussions it is, noone could come up with a realistic use case.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My usual tech use-case is Spam Assassin scores.
However, OpenMetrics is supposed to be generally applicable. Observing negative values is a very normal thing for histograms in general. It is almost zero cost to support it. Kicking the support out would be a serious blow to the applicability of the format.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Current handling of negative bounds by Prometheus is defined here: https://prometheus.io/docs/prometheus/latest/querying/functions/#histogram_quantile
…ents and add explicit advice (#142) * Use google.protobuf.Timestamp for timestamps and add clarity to certain comments * Address that unknown, gauge and counter should all use double point values * Add optional/required to fields missing it * Mark summary quantiles as optional * Make linear buckets able to use negative observations * Call out that points are optional * Remove mention of 0 lower bound to allow negative buckets
* Update protobuf types to match latest OpenMetrics RFC data model
|
||
// A name-value pair. These are used in multiple places: identifying | ||
// timeseries, value of INFO metrics, and exemplars in Histograms. | ||
message Label { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This comment should be ignored and marked as resolved.
proto/openmetrics_data_model.proto
Outdated
// Optional. | ||
// If there is more than one point, all must have a timestamp field | ||
// and they must be in increasing order of timestamp. | ||
// If there is only one point and has no timestamp field, the receiver | ||
// will assign a timestamp. | ||
repeated Point points = 2; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The amount of cases you have multiple points vs one point for the same labels is extremely small, I would suggest to optimize for the case where only one point per labels is sent and remove repeated (avoids an allocation or two depends on the languages) with the downside of repeating the labels for the case where multiple points are needed.
// timeseries, value of INFO metrics, and exemplars in Histograms. | ||
message Label { | ||
// Required. | ||
string name = 1; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This comment should be marked as resolved.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, after merging latest changes now merging this to master
Noice
…On Thu, Nov 5, 2020, 17:13 Rob Skillington ***@***.***> wrote:
Merged #126 <#126>
into master.
—
You are receiving this because you authored the thread.
Reply to this email directly, view it on GitHub
<#126 (comment)>,
or unsubscribe
<https://github.com/notifications/unsubscribe-auth/AAFYII7Z6WP2KINK2CSP553SOLFLBANCNFSM4GQGH5TA>
.
|
No description provided.