Improve prometheus metric mapping to internal metrics / line protocol #4415

danielnelson · 2018-07-12T05:26:52Z

Updated April 5, 2019. Check the edited ^^ history for old revisions.

Proposal

Example Input:

# HELP http_requests_total The total number of HTTP requests.
# TYPE http_requests_total counter
http_requests_total{method="post",code="200"} 1027 1395066363000
http_requests_total{method="post",code="400"}    3 1395066363000

# A histogram, which has a pretty complex representation in the text format:
# HELP http_request_duration_seconds A histogram of the request duration.
# TYPE http_request_duration_seconds histogram
http_request_duration_seconds_bucket{le="0.05"} 24054
http_request_duration_seconds_bucket{le="0.1"} 33444
http_request_duration_seconds_bucket{le="0.2"} 100392
http_request_duration_seconds_bucket{le="0.5"} 129389
http_request_duration_seconds_bucket{le="1"} 133988
http_request_duration_seconds_bucket{le="+Inf"} 144320
http_request_duration_seconds_sum 53423
http_request_duration_seconds_count 144320

# Finally a summary, which has a complex representation, too:
# HELP rpc_duration_seconds A summary of the RPC duration in seconds.
# TYPE rpc_duration_seconds summary
rpc_duration_seconds{quantile="0.01"} 3102
rpc_duration_seconds{quantile="0.05"} 3272
rpc_duration_seconds{quantile="0.5"} 4773
rpc_duration_seconds{quantile="0.9"} 9001
rpc_duration_seconds{quantile="0.99"} 76656
rpc_duration_seconds_sum 1.7560473e+07
rpc_duration_seconds_count 2693

Current behavior:

rpc_duration_seconds 0.01=3102,0.05=3272,0.5=4773,0.9=9001,0.99=76656,count=2693,sum=17560473
http_request_duration_seconds sum=53423,0.05=24054,0.1=33444,0.2=100392,0.5=129389,1=133988,+Inf=144320,count=144320
http_requests_total,code=400,method=post counter=3
http_requests_total,code=200,method=post counter=1027

Desired behavior:

prometheus rpc_duration_seconds_count=2693,rpc_duration_seconds_sum=17560473
prometheus,quantile=0.01 rpc_duration_seconds=3102
prometheus,quantile=0.05 rpc_duration_seconds=3272
prometheus,quantile=0.5 rpc_duration_seconds=4773
prometheus,quantile=0.9 rpc_duration_seconds=9001
prometheus,quantile=0.99 rpc_duration_seconds=76656

prometheus http_request_duration_seconds_sum=53423,http_request_duration_seconds_count=144320
prometheus,le=0.05 http_request_duration_seconds_bucket=24054
prometheus,le=0.1 http_request_duration_seconds_bucket=33444
prometheus,le=0.2 http_request_duration_seconds_bucket=100392
prometheus,le=0.5 http_request_duration_seconds_bucket=129389
prometheus,le=1.0 http_request_duration_seconds_bucket=133988
prometheus,le=+Inf http_request_duration_seconds_bucket=144320

prometheus,code=400,method=post http_requests_total=3
prometheus,code=200,method=post http_requests_total=1027

Use case:

This will make histograms from prometheus use the same format as the histogram aggregator and also help with roundtripping back to the prometheus output.

See #4354

The text was updated successfully, but these errors were encountered:

danielnelson · 2019-02-21T23:41:50Z

Update proposal with a method for splitting/joining metric names.

@trevorwhitney I believe if we use the transformations above in the prometheus input and output, it will allow us to add the histogram TYPE in the histogram aggregator and have the output be usable by Prometheus.

danielnelson · 2019-02-23T01:03:27Z

What about points with a simple metric name (without underscores)?

something{foo="bar} 42

pianohacker · 2019-03-13T22:36:25Z

What about points with a simple metric name (without underscores)?
something{foo="bar} 42

This seems like it could be solved by the current approach in the prometheus output plugin: https://github.com/influxdata/telegraf/blob/master/plugins/outputs/prometheus_client/prometheus_client.go#L552-L556

We'd lose roundtripping of metrics ending in value, but I think that's a reasonable price to pay.

vishiy · 2019-04-05T23:58:56Z

Can't we just keep the prometheus timeseries as is and have prometheus tags/buckets as tags? Renaming them as they move thru telegraf would change the originality of the metric and would require users to understand this transformation to map to their original prom metric.

# TYPE cpu_usage_user gauge
cpu_usage_user{cpu="cpu0"} 1.4112903225816156

Expected behavior:
prometheus,cpu=cpu0,url=blah cpu_usage_user=1.4112903225816156 1505776751000000000

danielnelson · 2019-04-06T03:27:13Z

I think @vishiy's method is the better approach:

It elegantly handles the simple metric case.
Doesn't have the potential to creating conflicting metrics with common Telegraf plugins
No handling of the metric name means we can't guess wrong on how to split it.
Same names will reduce friction when working with both Prometheus and InfluxDB.

For users who want metric name splitting, it would be possible to handle this in a separate processor plugin.

I have updated the original post to reflect this style.

vishiy · 2019-04-25T02:27:24Z

@danielnelson - Sent a PR out for this (prometheus input changes)
#5767

Menthalion · 2019-04-29T11:06:43Z

@vishiy @danielnelson The problem with bucket upper bounds as tags is that functions as histogramQuantile that would be needed to calculate quantiles over bucket histograms take a (numeric) field as upper bound expression, not a set of tags. How would you work around that / is there a way to convert tag ('le') values to typed (numeric + Inf) values in a (synthetic) field ?

Without functionality like that it's still impossible to use efficient Prometheus histogram queries for dashboards.

danielnelson · 2019-04-29T19:56:39Z

Thanks for pointing this out @Menthalion. This isn't actually a new method for encoding histograms, it is already in use in some places of Telegraf such as the histogram aggregator. @nathanielc Is there something we can do to make these compatible with Flux functions?

juliusv · 2019-05-15T08:35:03Z

Hi everyone, I'm currently building the PromQL->Flux transpiler for InfluxDB and already touched on this topic in influxdata/influxdb#13893. For any PromQL-ish use case, we don't want to do any transformations that are special to the metric type (which is a very loose concept on the Prometheus side to begin with) or metric name. The mapping rules (in Flux / InfluxDBv2 terminology) just have to be dumb like this:

metric name -> _measurement column value
label name -> column name (but escape _value -> ~_value etc.)
label value -> column value (always string type)
sample value -> _value column value

The reason is that metric types are also not tracked or special-cased in Prometheus's own storage, so everything is just stored as flat labels and sample values. E.g. the only place where an le label is interpreted specially is when you feed a set of series to the histogram_quantile() PromQL function. Sending data out of Prometheus (most commonly via its remote write feature) also does not preserve metric type metadata, and the le label does not necessarily have to belong to a histogram or contain numeric values in all cases. So making it float-typed and treated separately would be wrong in some cases, and in the other cases it'd just introduce a lot of extra complexity for the PromQL use case. Same for the quantile label. Arbitrarily splitting metric names on underscores and doing guesswork on them is also always going to be unreliable, as the names are based on convention on the Prometheus side, but could really be anything.

In my transpiler work I also introduced a single field f64 to signal that the row stores the Prometheus sample value (and as there needs to some field), but it doesn't really matter what that field is named. Now there's the suggestion here to store the Prometheus metric name as a field name instead of in the _measurement column (and then that f64 field wouldn't be needed). I think this would be fine for the PromQL use case as well if people prefer it. It should only require minor changes to the transpiler.

danielnelson · 2019-05-15T20:11:17Z

Hi Julius. Don't worry, this proposal has no special handling based on prom type, and any ideas of splitting the name have also been dropped (one can always do this in a processor instead).

I feel like our handling of le here is also the consistent with Prometheus and will provide good interop, as well as working well on with InfluxDB. I strongly think we should continue to use tags for histogram buckets in general, as it doesn't require a field glob and parsing bucket bounds out of field name like the alternative I have seen does. I would like to see the Flux histogramQuantile be adapted to work with this style.

The _measurement and _value concepts are query side only, on write we need to use line protocol which has more structure than just columns. I am strongly in favor of using metric_name -> fieldkey because it provides:

Better support for InfluxQL
Better support for Kapacitor
Better support for Telegraf processors and aggregators.
More performant behavior within Telegraf and when sending to InfluxDB.
Most importantly, consistent naming with other Telegraf metrics with respect to what the measurement and field names represent.

juliusv · 2019-05-16T08:37:31Z

@danielnelson Thanks! If I understand you correctly you are agreeing with the simple mapping I proposed, plus the variant where you store the metric name in such a way that it would appear as the value of the _field column in Flux. That sounds good.

And the le and quantile labels will become normal string-typed columns? The current histogramQuantile() in Flux expects float-typed le values, but I made a copy of that function and changed it to expect string-typed le values and also behave differently in many other small ways so it behaves exactly like PromQL's histogram_quantile() function. See juliusv/flux@ac8e300. Whether to drop the existing universe.histogramQuantile() or keep it with different behavior is yet to be decided, but for my work I needed something to work exactly like PromQL.

danielnelson · 2019-05-16T21:01:42Z

If I understand you correctly you are agreeing with the simple mapping I proposed, plus the variant where you store the metric name in such a way that it would appear as the value of the _field column in Flux.

I don't actually know how the columns are named in flux, but the field name in the database would be the prometheus metric_name, for example: http_request_duration_seconds_bucket. Check the "desired behavior" in the top comment for examples of how it will look.

And the le and quantile labels will become normal string-typed columns?

Yes, I think they must be written as tags in order for this style of encoding to work and all tags are string typed.

juliusv · 2019-05-16T21:08:16Z

That sounds good then, thanks.

all tags are string typed.

Ah ok, wasn't 100% sure of this in the line protocol, since tags translate into columns in Flux, but columns can also have non-string types (float, int, string, bool). Good!

danielnelson · 2019-11-22T02:07:23Z

Closed in #5767

This updates the metric_version of the prometheus input to version 2, which brings the telegraf mapping more inline with how prometheus natively outputs metrics, especially histograms and summaries. See influxdata/telegraf#4415 for more information.

* feat(telefraf): change the prometheus->telegraf metrics mapping (#113) This updates the metric_version of the prometheus input to version 2, which brings the telegraf mapping more inline with how prometheus natively outputs metrics, especially histograms and summaries. See influxdata/telegraf#4415 for more information. * feat(dashboard): Update dashboards for Grafana 7.1 (#135) * feat(dashboard): Update dashboards for Grafana 7.1 * fix(docker): Set specific version of grafana for docker-compose * feat: Rename Telegraf to Sentinel Drone (#131) * feat: Add visor to Makefile; CI build (#149) * feat: add visor to makefile * Update Makefile Co-authored-by: Oli Evans <[email protected]> * chore: Remove sentinel- prefix in CLI usage * chore: Add rust deps to CI build env * chore: Add latest visor Co-authored-by: Oli Evans <[email protected]> Co-authored-by: Mike Greenberg <[email protected]> * dep: Update to [email protected] (#155) * feat(grafana): Update dashboards for latest Visor schema (#156) * chore: Fixup Makefile and docker-compose; Update submodules (#157) * chore: Remove redis docker-compose svcs; Create jaeger config * chore: Update docker-compose timescale svc to pg12 * chore: visor-indexer and -processor make directives use new subcommand * deps: Update to [email protected] and [email protected] * deps: Update to [email protected] * chore: Remove chainwatch services; Fix visor services * fix(docs): Correct documentation for visor and naming * Apply suggestions: Service desc edits; log output adj in service Co-authored-by: Adrian Lanzafame <[email protected]> * Update dashboards chain_power -> chain_powers Co-authored-by: Adrian Lanzafame <[email protected]> Co-authored-by: Adrian Lanzafame <[email protected]> Co-authored-by: Ian Davis <[email protected]> Co-authored-by: Oli Evans <[email protected]>

danielnelson added feature request Requests for new plugin and for new features to existing plugins area/prometheus labels Jul 12, 2018

danielnelson mentioned this issue Jul 20, 2018

Add option to map measurement names to field name into prometheus input #3407

Closed

3 tasks

danielnelson added the area/agent label Nov 12, 2018

russorat added this to the 1.11.0 milestone Feb 15, 2019

danielnelson mentioned this issue Feb 19, 2019

[Feature Request] Add histogram aggregator compatible with Prometheus #5435

Closed

danielnelson changed the title ~~Improve prometheus mapping for histogram/summary~~ Improve prometheus metric mapping to internal metrics / line protocol Feb 21, 2019

danielnelson mentioned this issue Mar 11, 2019

[outputs/stackdriver] Allow to group metrics to bypass MetricDescriptor quota (500) #5567

Closed

3 tasks

danielnelson mentioned this issue Apr 5, 2019

Prometheus Input plugin - field name = 'gauge' #5689

Closed

vishiy mentioned this issue Apr 25, 2019

prometheus metric_version = 2 & make scrape url as a configurable tag #5767

Merged

danielnelson modified the milestones: 1.11.0, 1.12.0 Jun 5, 2019

danielnelson mentioned this issue Jun 5, 2019

Add Prometheus remote write output plugin. #5942

Closed

3 tasks

danielnelson mentioned this issue Jun 14, 2019

Merge MySQL metrics inside 1 metric with label #5987

Closed

juliusv mentioned this issue Jul 16, 2019

PromQL transpiler: allow data model configurability influxdata/flux#1529

Closed

danielnelson modified the milestones: 1.12.0, 1.13.0 Aug 19, 2019

danielnelson mentioned this issue Sep 25, 2019

Need a more flexable histogram / tdigest #6440

Closed

danielnelson mentioned this issue Nov 22, 2019

Add prometheus serializer and use it in prometheus output #6703

Merged

3 tasks

danielnelson closed this as completed Nov 22, 2019

lanzafame mentioned this issue Sep 3, 2020

change the prometheus->telegraf metrics mapping filecoin-project/sentinel#113

Merged

1 task

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Improve prometheus metric mapping to internal metrics / line protocol #4415

Improve prometheus metric mapping to internal metrics / line protocol #4415

danielnelson commented Jul 12, 2018 •

edited

Loading

danielnelson commented Feb 21, 2019

danielnelson commented Feb 23, 2019

pianohacker commented Mar 13, 2019 •

edited

Loading

vishiy commented Apr 5, 2019

danielnelson commented Apr 6, 2019 •

edited

Loading

vishiy commented Apr 25, 2019

Menthalion commented Apr 29, 2019 •

edited

Loading

danielnelson commented Apr 29, 2019

juliusv commented May 15, 2019

danielnelson commented May 15, 2019

juliusv commented May 16, 2019

danielnelson commented May 16, 2019

juliusv commented May 16, 2019

danielnelson commented Nov 22, 2019

Improve prometheus metric mapping to internal metrics / line protocol #4415

Improve prometheus metric mapping to internal metrics / line protocol #4415

Comments

danielnelson commented Jul 12, 2018 • edited Loading

Proposal

Current behavior:

Desired behavior:

Use case:

danielnelson commented Feb 21, 2019

danielnelson commented Feb 23, 2019

pianohacker commented Mar 13, 2019 • edited Loading

vishiy commented Apr 5, 2019

danielnelson commented Apr 6, 2019 • edited Loading

vishiy commented Apr 25, 2019

Menthalion commented Apr 29, 2019 • edited Loading

danielnelson commented Apr 29, 2019

juliusv commented May 15, 2019

danielnelson commented May 15, 2019

juliusv commented May 16, 2019

danielnelson commented May 16, 2019

juliusv commented May 16, 2019

danielnelson commented Nov 22, 2019

danielnelson commented Jul 12, 2018 •

edited

Loading

pianohacker commented Mar 13, 2019 •

edited

Loading

danielnelson commented Apr 6, 2019 •

edited

Loading

Menthalion commented Apr 29, 2019 •

edited

Loading