
Allow to store rates instead of cumulative counters when using Prometheus #15141

Closed · exekias wants to merge 2 commits

Conversation

@exekias (Contributor) commented on Dec 17, 2019

Cumulative counters from Prometheus introduce some challenges when
handled in Elasticsearch. Aggregations become difficult: a derivative has to be
calculated first, then the actual aggregation, all while taking the
different time series into account when grouping.

This PR introduces a new `rate_counters` parameter to the Prometheus
collector. It will make the metricset store rates calculated between
fetches for all fields that contain a cumulative counter (see the sketch after the list below). This includes:

  • metrics of type Counter
  • sum and count fields from Summaries and Histograms
  • Histogram buckets
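
As an editorial illustration of the rate-between-fetches idea described above, here is a minimal, self-contained Go sketch. The `counterCache` type and `rateFor` helper are hypothetical names used only for this example; they are not the PR's actual implementation, and for simplicity the "rate" is the raw increase between fetches rather than a per-second value.

```go
package main

import "fmt"

// counterCache is a simplified stand-in for the cache the PR description
// talks about: it remembers the last observed value of each cumulative
// counter so the next fetch can be turned into an increase (delta).
type counterCache struct {
	last map[string]float64
}

// rateFor returns the increase of a cumulative counter since the previous
// fetch. The first observation and counter resets (value going down) yield 0.
func (c *counterCache) rateFor(key string, value float64) float64 {
	prev, seen := c.last[key]
	c.last[key] = value
	if !seen || value < prev {
		return 0
	}
	return value - prev
}

func main() {
	cache := &counterCache{last: map[string]float64{}}
	samples := []float64{100, 130, 190, 10} // the last sample simulates a counter reset
	for _, v := range samples {
		fmt.Println(cache.rateFor(`http_requests_total{code="200"}`, v))
	}
	// Output: 0, 30, 60, 0
}
```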

(Screenshots: the resulting stored values, compared with a Prometheus graph that computes the rate at query time.)

This PR is part of #14843

@exekias exekias added the enhancement, review, needs_backport, and Team:Integrations labels on Dec 17, 2019
@exekias exekias requested a review from a team as a code owner December 17, 2019 13:04
@exekias exekias changed the title from "Allow to store rates instead of cumulative counters when using Promet…" to "Allow to store rates instead of cumulative counters when using Prometheus" on Dec 17, 2019
@exekias exekias force-pushed the prometheus-rate-counters branch from 893fc6d to 00d1481 on December 17, 2019 13:58
Commit message:

Allow to store rates instead of cumulative counters when using Prometheus

Cumulative counters from Prometheus introduce some challenges when
handled in Elasticsearch. For instance:

 * Aggregations become difficult, as a derivative needs to be
 calculated first, then the actual aggregation, all while taking the
 different time series into account when grouping.
 * Rollups don't support this kind of construct.

This PR introduces a new `rate_counters` parameter to the Prometheus
collector. It will make the metricset store rates calculated between
fetches for all fields that contain a cumulative counter. This includes:

 * metrics of type Counter
 * sum and count fields from Summaries and Histograms
 * Histogram buckets
@exekias exekias force-pushed the prometheus-rate-counters branch from 00d1481 to 6996e93 on December 17, 2019 14:13
```diff
@@ -59,7 +62,7 @@ func getPromEventsFromMetricFamily(mf *dto.MetricFamily) []PromEvent {
 		if !math.IsNaN(counter.GetValue()) && !math.IsInf(counter.GetValue(), 0) {
 			events = append(events, PromEvent{
 				data: common.MapStr{
-					name: counter.GetValue(),
+					name: rateFloat64(ccache, name, labels, counter.GetValue()),
```
Contributor (inline review comment):

nit: this is only returning the rate if `ccache` is not nil.

While I think the code is understandable, my opinionated self would add a check after the `if` above:

```go
value := counter.GetValue()
if ccache != nil {
	value = rateFloat64(ccache, name, labels, value)
}
```

to make it clear that we are not always computing a rate.

@exekias (author) replied on Dec 18, 2019:

I tried to avoid that, as this is done for many values in this block. That said, I agree this can be misleading. What about renaming `rateFloat64` to something like `getCounterOrRate`?

@exekias (author):

or `RateIfEnabled` 😇

Contributor:

I noticed after continuing reading.

The name change is OK from my POV, but not strictly needed. I think the code is quite self-explanatory.
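
To make the thread's suggestion concrete, here is a sketch of what a `RateIfEnabled`-style helper could look like, reusing the illustrative `counterCache` type from the earlier sketch. This is an assumption for illustration only, not code from this PR.

```go
// RateIfEnabled folds the nil check into the helper itself, so call sites do
// not need their own guard. Sketch only; the real helper and its signature in
// the PR may differ.
func RateIfEnabled(cache *counterCache, key string, value float64) float64 {
	if cache == nil {
		return value // rate_counters disabled: keep the cumulative counter as-is
	}
	return cache.rateFor(key, value)
}
```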

@jsoriano (Member) left a review comment:

Code-wise it looks good to me, and I see it as a quick win in the Metrics Explorer. I only have some concerns regarding the approach.

In general I prefer to store cumulative values. I think they keep "more information", and from them you can later calculate rates or whatever you need. Having fixed rates allows fewer calculations (though usually they are exactly the values you most commonly need).
Actually, in the Prometheus screenshot you can see that the rate is calculated at query time (at least rate appears in the PromQL expression).

Also, having the rate as something optional can lead to misleading results: imagine that one set of machines has a different `rate_counters` configuration than another set, or that different modules (or light modules) have different configurations. In general the problem would be that we would be storing values of a different nature in the same fields.
The solution for that would be to store rates in different fields, so you know when you are operating with rates and when with raw values, but I guess that for the Prometheus case we want to keep the original names.

So, should we consider storing the rates in different fields? And if so, could we store both the rates and the raw cumulative values when `rate_counters` is true?

@ChrsMark (Member):

@exekias also remember to add a changelog entry when this is ready.

@odacremolbap (Contributor):

> So, should we consider storing the rates in different fields? And if so, could we store both the rates and the raw cumulative values when `rate_counters` is true?

Hey @jsoriano,
the way we store cumulative values in Elasticsearch makes it hard to calculate rates to represent in graphs. If such functionality lands in Elasticsearch, I guess cumulative values would be very welcome.

So far we have been picking the most representative cumulative values and showing derivatives on them. That is pretty informative, but it doesn't give the expected percentiles.

When using these rate counters on multiple instances configured with different periods, I guess that Elasticsearch will use buckets to "normalize" data over a time window. Yes, there are still issues if that time window is smaller than the largest period configured in one of the metricsets; that instance would contribute its rates only to some Kibana buckets... but that's also an issue today: by design, all metricsets should push data at least once per visualization bucket time window.

Is there a case where both cumulative values and rates need to be stored side by side?
I guess there is, as long as you want to keep the cumulative values for doing your own calculations without Metricbeat pre-cooking them (is anyone doing that?). The code to allow that would probably be an add-on outside of this PR, allowing the same metric to be registered twice with different field names, one as a rate and the other cumulative.

@exekias (author) commented on Dec 18, 2019:

Thanks for the comments everyone! I thought about this as a quick win, but you raised some good concerns. In general I prefer when we are opinionated instead of allowing different behaviors depending on a config parameter.

> So, should we consider storing the rates in different fields? And if so, could we store both the rates and the raw cumulative values when `rate_counters` is true?

I'm going to explore this idea before moving forward. Will come back with updates!

@jsoriano (Member):

Thanks @odacremolbap.

> When using these rate counters on multiple instances configured with different periods, I guess that Elasticsearch will use buckets to "normalize" data over a time window.

I am not so worried about different periods, but about different values for `rate_counters`, i.e. some machines and/or modules using rates, and others using the raw (cumulative) value. In both cases they would be stored in the same fields, but they should be aggregated in different ways given their different nature.

Talking about time windows, I also find it more intuitive to work with cumulative values for variable windows. Imagine, for example, a counter of requests shown in a graph with a time window of a week and points every few hours: the operations on the cumulative value are the same as for any other time window, sum/avg and derivative. But for pre-calculated rates you have to somehow sum all the buckets you have for each point in the graph, and any kind of downsampling will miss requests.

@exekias (author) commented on Feb 3, 2020:

I'm finally back on this. This would be my new proposal (also taking #14843 into account):

Extend the existing mapping to add these dynamic fields:

  • prometheus.*.gauge (double) to store gauges
  • prometheus.*.histogram (histogram) to store histograms using the new Elasticsearch type
  • prometheus.*.counter (double) to store counters
  • (Optional) prometheus.*.rate (double) to store counter rates

Summaries would be stored using the other existing types (gauge and counter).

These mappings don't collide with the existing ones, which allows both the old and new methods to coexist for a while. We can allow the use of these behind a feature flag, then make them the default in 8.0.

WDYT?
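
As a purely illustrative sketch of how a single scraped counter could be laid out under the proposed field names: the concrete values, the label layout, and the use of plain maps (instead of Beats' `common.MapStr`) are assumptions made for this example only.

```go
package main

import (
	"encoding/json"
	"fmt"
)

func main() {
	// Hypothetical event shape under the proposed mapping: the raw cumulative
	// value goes under prometheus.<metric>.counter and, optionally, the
	// increase since the previous fetch under prometheus.<metric>.rate.
	event := map[string]interface{}{
		"prometheus": map[string]interface{}{
			"labels": map[string]interface{}{"code": "200"},
			"http_requests_total": map[string]interface{}{
				"counter": 190.0, // raw cumulative value as scraped
				"rate":    60.0,  // optional: increase since the previous fetch
			},
		},
	}
	out, _ := json.MarshalIndent(event, "", "  ")
	fmt.Println(string(out))
}
```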

@jsoriano (Member) commented on Feb 4, 2020:

OK, I think we can go on with this proposal, as it is behind a feature flag. I like that we keep the original counter values apart from the rates 👍

@exekias (author) commented on Mar 23, 2020:

Closing in favor of #17061, which implements this in the agreed way.
