-
Notifications
You must be signed in to change notification settings - Fork 4.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Allow to store rates instead of cumulative counters when using Prometheus #15141
Conversation
893fc6d
to
00d1481
Compare
…heus Cumulative counters from Prometheus introduce some challenges when handled in Elasticsearch. For instace: * Aggregations become difficult, as derivative needs to be calculated first, then the actual aggregation. All this taking the different time series into account when grouping. * Rollups don't support this kind of construct. This PR introduces a new `rate_counters` parameter to the Prometheus collector. It will make the metricset store rates calculated between fetches for all fields that contain a cumulative counter. This includes: * metrics of type Counter * sum and count fields from Summaries and Histograms * Histogram buckets
00d1481
to
6996e93
Compare
Co-Authored-By: Chris Mark <[email protected]>
@@ -59,7 +62,7 @@ func getPromEventsFromMetricFamily(mf *dto.MetricFamily) []PromEvent { | |||
if !math.IsNaN(counter.GetValue()) && !math.IsInf(counter.GetValue(), 0) { | |||
events = append(events, PromEvent{ | |||
data: common.MapStr{ | |||
name: counter.GetValue(), | |||
name: rateFloat64(ccache, name, labels, counter.GetValue()), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: this is only returning rate if ccache is not nil.
while I think the code is understandable, in my opinionated self I would have a check after the if
above that does
value = counter.GetValue()
if ccache != nil {
value = rateFloat64(ccache, name, labels, value)
}
to make clear that we are not rating always.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I tried to avoid that as this is done for many values in this block. That said I agree this can be misleading. What about renaming rateFloat64
to something like getCounterOrRate
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
or RateIfEnabled 😇
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I noticed after continuing reading.
The name change is ok from my POV, but not very needed. I think the code is very self-explanatory.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code-wise it looks good to me and I see it as a quick win in the metrics explorer. I only have some concerns regarding the approach.
In general I prefer to store cumulative values, I think that they keep "more information", and from them you can later calculate rates or whatever you need. Having fixed rates allows less calculations (though usually they are directly the values that you more commonly need).
Actually in the Prometheus screenshot it can be seen that the rate is calculated in query time (at least rate
appears in the promql expression).
Also having the rate as something optional can lead to misleading results, imagine that a set of machines have different configuration for rate_counters
that another set of machines, or different modules (or light modules) have different configurations. In general the problem would be that we would be storing values of different nature in the same fields.
The solution for that would be to store rates on different fields, so you know when you are operating with rates and when with raw values, but I guess that for the Prometheus case we want to keep the original names.
So, should we consider storing the rates in different fields? And if so, could we store both the rates and the raw cumulative values when rate_counters
is true?
@exekias also remember to add a changelog entry too when this is ready. |
hey @jsoriano so far we have been picking most representative cumulative values and showing derivatives on them. That is pretty informative, but not expected percentiles. When using this rate counters on multiple instances configured with different periods, I guess that elasticsearch will use buckets to "normalize" data on a time window. Yes, there are still issues if that time window is smaller than the largest period configured at one of the metricsets, that instance would be contributing their rates only at some kibana buckets ... but that's also an issue today, by design all metricsets should push data at least once per visualization bucket time window. Is there a case where both cumulative and rates are needed for storing side by side? |
Thanks for the comments everyone! I thought about this as a quick win, but you raised some good concerns. In general I prefer when we are opinionated instead of allowing different behaviors depending on a config parameter.
I'm going to explore this idea before moving forward. Will come back with updates! |
Thanks @odacremolbap.
I am not so worried about different periods, but about different values for Talking about the time windows, I also find more intuitive to work with cummulative values for variable windows. Imagine for example a counter of requests, with a graph with a time window of a week, with points every some hours, the operations for the cummulative value is the same as for any other time window, sum/avg and derivative. But for pre-calculated rates you have to somehow sum all the buckets you have for each point in the graph, and any kind of downsampling will miss requests. |
I'm finally back on this, this would be my new proposal (also taking #14843 into account): Extend the existing mapping to add these dynamic fields:
Summaries would be stored using the rest of existing types (gauge and counter). These mappings don't collide with the existing ones, which allows both old and new methos to coexist for a while. We can allow to use these with a feature map, then make it the default by 8.0. WDYT? |
Ok, I think we can go on with this proposal, as it is behind a feature flag. I like we keep the original counter values apart of the rates 👍 |
closing in favor of #17061, which implements this in the agreed way |
Cumulative counters from Prometheus introduce some challenges when
handled in Elasticsearch. Aggregations become difficult, as derivative needs to be
calculated first, then the actual aggregation. All this taking the
different time series into account when grouping.
This PR introduces a new
rate_counters
parameter to the Prometheuscollector. It will make the metricset store rates calculated between
fetches for all fields that contain a cumulative counter. This includes:
This PR is part of #14843