Allow to store rates instead of cumulative counters when using Prometheus #15141

exekias · 2019-12-17T13:04:53Z

Cumulative counters from Prometheus introduce some challenges when
handled in Elasticsearch. Aggregations become difficult, because a derivative
needs to be calculated first and the actual aggregation only afterwards, all
while taking the different time series into account when grouping.

This PR introduces a new rate_counters parameter to the Prometheus
collector. It will make the metricset store rates calculated between
fetches for all fields that contain a cumulative counter. This includes:

metrics of type Counter
sum and count fields from Summaries and Histograms
Histogram buckets
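
To make "rates calculated between fetches" concrete, here is a minimal sketch of a cache that remembers the last value per time series and returns the increase since the previous fetch. The names `counterCache` and `rate` are hypothetical, and the behaviour on the first fetch and on counter resets (returning 0) is an assumption; the actual helper in metricbeat/helper/prometheus/counter.go may differ. This thread does not say whether the stored value is the plain increase or is divided by the fetch period, so the sketch uses the plain increase.

```go
package main

import "fmt"

// counterCache is a hypothetical stand-in for the cache used by the
// collector. It keeps the last observed value of every time series.
type counterCache struct {
	last map[string]float64
}

func newCounterCache() *counterCache {
	return &counterCache{last: make(map[string]float64)}
}

// rate stores value for the series identified by key and returns the
// increase since the previous fetch. The first observation of a series
// and any decrease (assumed to be a counter reset) yield 0.
func (c *counterCache) rate(key string, value float64) float64 {
	prev, seen := c.last[key]
	c.last[key] = value
	if !seen || value < prev {
		return 0
	}
	return value - prev
}

func main() {
	cache := newCounterCache()
	// The key combines metric name and labels so that each time series
	// is tracked separately.
	key := `http_requests_total{code="200"}`

	fmt.Println(cache.rate(key, 100)) // first fetch, nothing to compare: 0
	fmt.Println(cache.rate(key, 160)) // 160 - 100 = 60
	fmt.Println(cache.rate(key, 20))  // value went down, treated as reset: 0
}
```

Dividing the returned increase by the fetch period in seconds would turn it into a per-second rate.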

This PR is part of #14843

…heus

Cumulative counters from Prometheus introduce some challenges when handled in Elasticsearch. For instance:

* Aggregations become difficult, as derivative needs to be calculated first, then the actual aggregation. All this taking the different time series into account when grouping.
* Rollups don't support this kind of construct.

This PR introduces a new `rate_counters` parameter to the Prometheus collector. It will make the metricset store rates calculated between fetches for all fields that contain a cumulative counter. This includes:

* metrics of type Counter
* sum and count fields from Summaries and Histograms
* Histogram buckets

metricbeat/helper/prometheus/counter.go

Co-Authored-By: Chris Mark <[email protected]>

odacremolbap · 2019-12-18T10:40:10Z

metricbeat/module/prometheus/collector/data.go

@@ -59,7 +62,7 @@ func getPromEventsFromMetricFamily(mf *dto.MetricFamily) []PromEvent {
 			if !math.IsNaN(counter.GetValue()) && !math.IsInf(counter.GetValue(), 0) {
 				events = append(events, PromEvent{
 					data: common.MapStr{
-						name: counter.GetValue(),
+						name: rateFloat64(ccache, name, labels, counter.GetValue()),


nit: this is only returning rate if ccache is not nil.

While I think the code is understandable, my opinionated self would add a check after the if above that does:

value = counter.GetValue()
if ccache != nil {
	value = rateFloat64(ccache, name, labels, value)
}

to make clear that we are not always computing a rate.

I tried to avoid that as this is done for many values in this block. That said I agree this can be misleading. What about renaming rateFloat64 to something like getCounterOrRate?

or RateIfEnabled 😇

I noticed after continuing reading.

The name change is ok from my POV, but not very needed. I think the code is very self-explanatory.
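
For readers following the naming discussion, this is a small sketch of the behaviour being debated: a helper that passes the raw value through when no cache is configured and returns the increase otherwise. `rateIfEnabled` is one of the names suggested above; the simplified signature (no labels argument) and the `counterCache` type are assumptions for illustration, not the code under review.

```go
package main

import "fmt"

// counterCache is a hypothetical minimal cache keyed by series name.
type counterCache struct {
	last map[string]float64
}

// rateIfEnabled returns value unchanged when cache is nil, which is how
// "rates disabled" is modelled here. Otherwise it returns the increase
// since the previous call for the same key (0 on first sight or reset).
func rateIfEnabled(cache *counterCache, key string, value float64) float64 {
	if cache == nil {
		return value
	}
	prev, seen := cache.last[key]
	cache.last[key] = value
	if !seen || value < prev {
		return 0
	}
	return value - prev
}

func main() {
	// Rates disabled: the cumulative value is stored as-is.
	fmt.Println(rateIfEnabled(nil, "requests_total", 42)) // 42

	// Rates enabled: only the increase between fetches is stored.
	cache := &counterCache{last: map[string]float64{}}
	fmt.Println(rateIfEnabled(cache, "requests_total", 42)) // 0
	fmt.Println(rateIfEnabled(cache, "requests_total", 50)) // 8
}
```

Keeping the nil check inside the helper avoids repeating it at every call site, at the cost of a name that has to convey both behaviours.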

jsoriano

Code-wise it looks good to me and I see it as a quick win in the metrics explorer. I only have some concerns regarding the approach.

In general I prefer to store cumulative values. I think that they keep "more information", and from them you can later calculate rates or whatever you need. Having fixed rates allows fewer calculations (though they are usually the values that you most commonly need).
Actually, in the Prometheus screenshot it can be seen that the rate is calculated at query time (at least rate appears in the PromQL expression).

Also, having the rate as something optional can lead to misleading results. Imagine that one set of machines has a different configuration for rate_counters than another set of machines, or that different modules (or light modules) have different configurations. In general the problem would be that we would be storing values of different nature in the same fields.
The solution for that would be to store rates in different fields, so you know when you are operating with rates and when with raw values, but I guess that for the Prometheus case we want to keep the original names.

So, should we consider storing the rates in different fields? And if so, could we store both the rates and the raw cumulative values when rate_counters is true?

ChrsMark · 2019-12-18T11:16:32Z

@exekias also remember to add a changelog entry too when this is ready.

odacremolbap · 2019-12-18T11:25:15Z

So, should we consider storing the rates in different fields? And if so, could we store both the rates and the raw cumulative values when rate_counters is true?

hey @jsoriano
The way we store cumulative values in Elasticsearch makes it hard to calculate rates to be shown in graphs. If such functionality lands in Elastic, I guess cumulatives would be very welcome.

So far we have been picking the most representative cumulative values and showing derivatives of them. That is pretty informative, but it does not give the expected percentiles.

When using these rate counters on multiple instances configured with different periods, I guess that Elasticsearch will use buckets to "normalize" data on a time window. Yes, there are still issues if that time window is smaller than the largest period configured for one of the metricsets: that instance would be contributing its rates only to some Kibana buckets. But that is also an issue today; by design all metricsets should push data at least once per visualization bucket time window.

Is there a case where both cumulative and rates are needed for storing side by side?
I guess there is, as long as you want to keep cumulatives for doing your own calculations without metricbeat pre-cooking them (is anyone doing that?). The code to allow that would probably be an add-on outside of this PR, allowing the same metric to be registered twice with different field names, one as a rate and the other as a cumulative value.

exekias · 2019-12-18T11:42:22Z

Thanks for the comments everyone! I thought about this as a quick win, but you raised some good concerns. In general I prefer when we are opinionated instead of allowing different behaviors depending on a config parameter.

So, should we consider storing the rates in different fields? And if so, could we store both the rates and the raw cumulative values when rate_counters is true?

I'm going to explore this idea before moving forward. Will come back with updates!

jsoriano · 2019-12-18T11:50:39Z

Thanks @odacremolbap.

When using this rate counters on multiple instances configured with different periods, I guess that elasticsearch will use buckets to "normalize" data on a time window.

I am not so worried about different periods, but about different values for rate_counters, i.e. some machines and/or modules using rates, and others using the raw (cumulative) value. In both cases they would be stored in the same fields, but they should be aggregated in different ways given their different nature.

Talking about time windows, I also find it more intuitive to work with cumulative values for variable windows. Imagine for example a counter of requests, shown in a graph with a time window of a week and a point every few hours. The operations on the cumulative value are the same as for any other time window: sum/avg and derivative. But with pre-calculated rates you have to somehow sum all the buckets you have for each point in the graph, and any kind of downsampling will miss requests.
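
The downsampling point can be checked with a few numbers. The sketch below uses made-up sample values and hypothetical helper names, and it models downsampling as simply keeping every second point. Under that assumption the total taken from the cumulative series survives, while the total summed from pre-calculated increases does not.

```go
package main

import "fmt"

// deltasOf converts cumulative samples into per-fetch increases,
// which is what a rate-only store would keep.
func deltasOf(cumulative []float64) []float64 {
	out := make([]float64, 0, len(cumulative))
	for i := 1; i < len(cumulative); i++ {
		out = append(out, cumulative[i]-cumulative[i-1])
	}
	return out
}

// everySecond models a naive downsampling that keeps indices 0, 2, 4, ...
func everySecond(xs []float64) []float64 {
	out := make([]float64, 0, (len(xs)+1)/2)
	for i := 0; i < len(xs); i += 2 {
		out = append(out, xs[i])
	}
	return out
}

// totalFromCumulative is the increase over the window: last minus first.
func totalFromCumulative(xs []float64) float64 {
	if len(xs) == 0 {
		return 0
	}
	return xs[len(xs)-1] - xs[0]
}

// sum adds up all values, which is how a total is rebuilt from increases.
func sum(xs []float64) float64 {
	total := 0.0
	for _, x := range xs {
		total += x
	}
	return total
}

func main() {
	// Illustrative cumulative request counter over seven fetches.
	cumulative := []float64{0, 10, 25, 30, 50, 60, 90}
	deltas := deltasOf(cumulative) // 10, 15, 5, 20, 10, 30

	fmt.Println(totalFromCumulative(cumulative))              // 90
	fmt.Println(sum(deltas))                                  // 90
	fmt.Println(totalFromCumulative(everySecond(cumulative))) // 90
	fmt.Println(sum(everySecond(deltas)))                     // 10 + 5 + 10 = 25
}
```

A downsampling that sums the dropped increases into the kept ones would preserve the total, but that requires the store to know the field holds increases rather than arbitrary values.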

exekias · 2020-02-03T15:40:44Z

I'm finally back on this, this would be my new proposal (also taking #14843 into account):

Extend the existing mapping to add these dynamic fields:

prometheus.*.gauge (double) to store gauges
prometheus.*.histogram (histogram) to store histograms using the new Elasticsearch type
prometheus.*.counter (double) to store counters
(Optional) prometheus.*.rate (double) to store counter rates

Summaries would be stored using the rest of existing types (gauge and counter).

These mappings don't collide with the existing ones, which allows both old and new methods to coexist for a while. We can allow using these behind a feature flag, then make it the default by 8.0.
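
As a rough illustration of the proposed layout, the sketch below builds the field names for a gauge and for a counter with an optional rate. It assumes the `*` in the mappings stands for the metric name; `fieldsFor`, `metricKind`, and the sample values are hypothetical, and histograms are left out because they depend on the new Elasticsearch type.

```go
package main

import "fmt"

// metricKind distinguishes the two scalar kinds covered by this sketch.
type metricKind int

const (
	kindGauge metricKind = iota
	kindCounter
)

// fieldsFor returns the fields that one sample would produce under the
// proposed naming. increase is nil when rates are not being stored.
func fieldsFor(name string, kind metricKind, value float64, increase *float64) map[string]float64 {
	fields := map[string]float64{}
	switch kind {
	case kindGauge:
		fields["prometheus."+name+".gauge"] = value
	case kindCounter:
		// The cumulative value is always kept under its own field.
		fields["prometheus."+name+".counter"] = value
		if increase != nil {
			// The rate goes to a separate field, so the two are never mixed.
			fields["prometheus."+name+".rate"] = *increase
		}
	}
	return fields
}

func main() {
	inc := 60.0
	fmt.Println(fieldsFor("http_requests_total", kindCounter, 160, &inc))
	fmt.Println(fieldsFor("http_requests_total", kindCounter, 160, nil))
	fmt.Println(fieldsFor("go_goroutines", kindGauge, 12, nil))
}
```

Because the type is part of the field name, a query can tell from the field alone whether it is operating on a raw counter or on a rate, which addresses the concern about mixing both in one field.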

WDYT?

jsoriano · 2020-02-04T17:43:14Z

Ok, I think we can go ahead with this proposal, as it is behind a feature flag. I like that we keep the original counter values apart from the rates 👍

exekias · 2020-03-23T15:59:26Z

closing in favor of #17061, which implements this in the agreed way

exekias added the enhancement, review, needs_backport and Team:Integrations labels Dec 17, 2019

exekias requested a review from a team as a code owner December 17, 2019 13:04

exekias changed the title ~~Allow to store rates instead of cumulative counters when using Promet…~~ Allow to store rates instead of cumulative counters when using Prometheus Dec 17, 2019

exekias force-pushed the prometheus-rate-counters branch from 893fc6d to 00d1481 Compare December 17, 2019 13:58

exekias force-pushed the prometheus-rate-counters branch from 00d1481 to 6996e93 Compare December 17, 2019 14:13

ChrsMark reviewed Dec 18, 2019

Update metricbeat/helper/prometheus/counter.go

8928ea0

Co-Authored-By: Chris Mark <[email protected]>

odacremolbap reviewed Dec 18, 2019

jsoriano reviewed Dec 18, 2019

andresrc added the [zube]: Old-Inbox label Jan 27, 2020

exekias closed this Mar 23, 2020

zube bot added [zube]: Done and removed [zube]: Old-Inbox labels Mar 23, 2020

andresrc removed the [zube]: Done label Mar 24, 2020
