Prometheus Exporter metrics with different tags should have only one HELP and TYPE comment line #5465

yingchen0706v · 2022-05-17T06:44:21Z

Bug Report

The exported Prometheus metrics with the same name but different tags have duplicate HELP and TYPE comment lines.
According to https://prometheus.io/docs/instrumenting/exposition_formats/#text-format-details, only one HELP line and one TYPE line are allowed for any given metric name.

To Reproduce

Rubular link if applicable: NA
Example log message if applicable: NA
Steps to reproduce the problem: set up a Fluent Bit configuration with several different outputs, as shown in the Configuration section below.

Expected behavior
Only one TYPE line and one HELP line should be generated for each Fluent Bit output metric, but there are duplicates, as shown in the screenshot.

Screenshots

As highlighted in the screenshot, there are duplicate TYPE/HELP comment lines.

Your Environment

Version used: 1.9.3
Configuration:

[OUTPUT]
Name http
Alias confiant
Match bids
...
[OUTPUT]
Name s3
Alias s3
Match bids
region {{ .Values.s3RegionForBids }}
bucket {{ .Values.s3BucketForBids }}
...
[OUTPUT]
Name prometheus_exporter
Alias exporter
match internal_metrics
...
Environment name and version (e.g. Kubernetes? What version?): K8S
Server type and version: EKS
Operating System and version: x86_64 Linux 5.4
Filters and plugins: no filters, output plugin as in configuration.

Additional context
It causes issues when we try to feed these metrics into our monitoring system because, according to https://prometheus.io/docs/instrumenting/exposition_formats/#text-format-details, only one HELP/TYPE line is allowed for any given metric.
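
To check a scrape for this condition, a minimal Python sketch along the following lines may help. It only counts `# HELP` and `# TYPE` comment lines per metric name and is not a full exposition-format parser. The function name `duplicate_metadata` and the `sample` text are illustrative assumptions; the sample lines mirror the shape of the exporter output reported in this issue.

```python
import re
from collections import Counter

# Matches "# HELP <name> ..." and "# TYPE <name> ..." comment lines.
_META = re.compile(r"^#\s+(HELP|TYPE)\s+(\S+)")


def duplicate_metadata(text):
    """Return {(kind, metric_name): count} for HELP/TYPE lines seen more than once."""
    seen = Counter()
    for line in text.splitlines():
        match = _META.match(line.strip())
        if match:
            seen[(match.group(1), match.group(2))] += 1
    return {key: count for key, count in seen.items() if count > 1}


# Hypothetical sample in the same shape as the duplicated exporter output.
sample = """\
# HELP fluentbit_input_bytes_total Number of input bytes.
# TYPE fluentbit_input_bytes_total counter
fluentbit_input_bytes_total{name="dummy.0"} 3146
# HELP fluentbit_input_bytes_total Number of input bytes.
# TYPE fluentbit_input_bytes_total counter
fluentbit_input_bytes_total{name="dummy.1"} 3146
"""

if __name__ == "__main__":
    for (kind, name), count in sorted(duplicate_metadata(sample).items()):
        print(f"{kind} for {name} appears {count} times")
```

An empty result means no metric name has repeated HELP or TYPE lines; it does not prove the text is otherwise valid.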

patrick-stephens · 2022-05-17T07:19:34Z

I cannot seem to reproduce this on 1.9.3 with this config as a test case:

[SERVICE]
  Http_server On

[INPUT]
  name dummy
  tag dummy1

[INPUT]
  name dummy
  tag dummy2
  
[OUTPUT]
  name stdout
  match dummy1

[OUTPUT]
  name stdout
  match dummy2

[OUTPUT]
  Name http
  match nothing

Run up the container and curl the output:

$ docker run --rm -d -p 2020:2020 -v $PWD/fluent-bit.conf:/fluent-bit/etc/fluent-bit.conf fluent/fluent-bit:1.9.3
$ curl -s http://127.0.0.1:2020/api/v1/metrics/prometheus
# HELP fluentbit_input_bytes_total Number of input bytes.
# TYPE fluentbit_input_bytes_total counter
fluentbit_input_bytes_total{name="dummy.0"} 468 1652771931407
fluentbit_input_bytes_total{name="dummy.1"} 468 1652771931407
# HELP fluentbit_input_records_total Number of input records.
# TYPE fluentbit_input_records_total counter
fluentbit_input_records_total{name="dummy.0"} 18 1652771931407
fluentbit_input_records_total{name="dummy.1"} 18 1652771931407
# HELP fluentbit_output_dropped_records_total Number of dropped records.
# TYPE fluentbit_output_dropped_records_total counter
fluentbit_output_dropped_records_total{name="http.2"} 0 1652771931407
fluentbit_output_dropped_records_total{name="stdout.0"} 0 1652771931407
fluentbit_output_dropped_records_total{name="stdout.1"} 0 1652771931407
# HELP fluentbit_output_errors_total Number of output errors.
# TYPE fluentbit_output_errors_total counter
fluentbit_output_errors_total{name="http.2"} 0 1652771931407
fluentbit_output_errors_total{name="stdout.0"} 0 1652771931407
fluentbit_output_errors_total{name="stdout.1"} 0 1652771931407
# HELP fluentbit_output_proc_bytes_total Number of processed output bytes.
# TYPE fluentbit_output_proc_bytes_total counter
fluentbit_output_proc_bytes_total{name="http.2"} 0 1652771931407
fluentbit_output_proc_bytes_total{name="stdout.0"} 416 1652771931407
fluentbit_output_proc_bytes_total{name="stdout.1"} 416 1652771931407
# HELP fluentbit_output_proc_records_total Number of processed output records.
# TYPE fluentbit_output_proc_records_total counter
fluentbit_output_proc_records_total{name="http.2"} 0 1652771931407
fluentbit_output_proc_records_total{name="stdout.0"} 16 1652771931407
fluentbit_output_proc_records_total{name="stdout.1"} 16 1652771931407
# HELP fluentbit_output_retried_records_total Number of retried records.
# TYPE fluentbit_output_retried_records_total counter
fluentbit_output_retried_records_total{name="http.2"} 0 1652771931407
fluentbit_output_retried_records_total{name="stdout.0"} 0 1652771931407
fluentbit_output_retried_records_total{name="stdout.1"} 0 1652771931407
# HELP fluentbit_output_retries_failed_total Number of abandoned batches because the maximum number of re-tries was reached.
# TYPE fluentbit_output_retries_failed_total counter
fluentbit_output_retries_failed_total{name="http.2"} 0 1652771931407
fluentbit_output_retries_failed_total{name="stdout.0"} 0 1652771931407
fluentbit_output_retries_failed_total{name="stdout.1"} 0 1652771931407
# HELP fluentbit_output_retries_total Number of output retries.
# TYPE fluentbit_output_retries_total counter
fluentbit_output_retries_total{name="http.2"} 0 1652771931407
fluentbit_output_retries_total{name="stdout.0"} 0 1652771931407
fluentbit_output_retries_total{name="stdout.1"} 0 1652771931407
# HELP fluentbit_uptime Number of seconds that Fluent Bit has been running.
# TYPE fluentbit_uptime counter
fluentbit_uptime 18
# HELP process_start_time_seconds Start time of the process since unix epoch in seconds.
# TYPE process_start_time_seconds gauge
process_start_time_seconds 1652771913
# HELP fluentbit_build_info Build version information.
# TYPE fluentbit_build_info gauge
fluentbit_build_info{version="1.9.3",edition="Community"} 1

patrick-stephens · 2022-05-17T07:34:49Z

Ah, this seems to be an issue with the Prometheus Exporter itself: if we use it with the recent Fluent Bit metrics input plugin, it generates invalid output:

[SERVICE]
  Http_server On

[INPUT]
  name dummy
  tag dummy1

[INPUT]
  name dummy
  tag dummy2
  
[OUTPUT]
  name stdout
  match dummy1

[OUTPUT]
  name stdout
  match dummy2

[OUTPUT]
  Name http
  match nothing

[INPUT]
  name            fluentbit_metrics
  tag             internal_metrics

[OUTPUT]
  name            prometheus_exporter
  match           internal_metrics
  port            2021

Run it and check the output to see the problem. Make sure to expose port 2021 as well:

$ docker run --rm -d -p 2020:2020 -p 2021:2021 -v $PWD/fluent-bit.conf:/fluent-bit/etc/fluent-bit.conf fluent/fluent-bit:1.9.3
$ curl -s http://127.0.0.1:2021/metrics
# HELP fluentbit_uptime Number of seconds that Fluent Bit has been running.
# TYPE fluentbit_uptime counter
fluentbit_uptime{hostname="653a853d661c"} 121
# HELP fluentbit_input_bytes_total Number of input bytes.
# TYPE fluentbit_input_bytes_total counter
fluentbit_input_bytes_total{name="dummy.0"} 3146
# HELP fluentbit_input_records_total Number of input records.
# TYPE fluentbit_input_records_total counter
fluentbit_input_records_total{name="dummy.0"} 121
# HELP fluentbit_input_bytes_total Number of input bytes.
# TYPE fluentbit_input_bytes_total counter
fluentbit_input_bytes_total{name="dummy.1"} 3146
# HELP fluentbit_input_records_total Number of input records.
# TYPE fluentbit_input_records_total counter
fluentbit_input_records_total{name="dummy.1"} 121
# HELP fluentbit_input_bytes_total Number of input bytes.
# TYPE fluentbit_input_bytes_total counter
fluentbit_input_bytes_total{name="fluentbit_metrics.2"} 520260
# HELP fluentbit_input_records_total Number of input records.
# TYPE fluentbit_input_records_total counter
fluentbit_input_records_total{name="fluentbit_metrics.2"} 60
# HELP fluentbit_input_metrics_scrapes_total Number of total metrics scrapes
# TYPE fluentbit_input_metrics_scrapes_total counter
fluentbit_input_metrics_scrapes_total{name="fluentbit_metrics.2"} 61
# HELP fluentbit_output_proc_records_total Number of processed output records.
# TYPE fluentbit_output_proc_records_total counter
fluentbit_output_proc_records_total{name="stdout.0"} 120
# HELP fluentbit_output_proc_bytes_total Number of processed output bytes.
# TYPE fluentbit_output_proc_bytes_total counter
fluentbit_output_proc_bytes_total{name="stdout.0"} 3120
# HELP fluentbit_output_errors_total Number of output errors.
# TYPE fluentbit_output_errors_total counter
fluentbit_output_errors_total{name="stdout.0"} 0
# HELP fluentbit_output_retries_total Number of output retries.
# TYPE fluentbit_output_retries_total counter
fluentbit_output_retries_total{name="stdout.0"} 0
# HELP fluentbit_output_retries_failed_total Number of abandoned batches because the maximum number of re-tries was reached.
# TYPE fluentbit_output_retries_failed_total counter
fluentbit_output_retries_failed_total{name="stdout.0"} 0
# HELP fluentbit_output_dropped_records_total Number of dropped records.
# TYPE fluentbit_output_dropped_records_total counter
fluentbit_output_dropped_records_total{name="stdout.0"} 0
# HELP fluentbit_output_retried_records_total Number of retried records.
# TYPE fluentbit_output_retried_records_total counter
fluentbit_output_retried_records_total{name="stdout.0"} 0
# HELP fluentbit_output_proc_records_total Number of processed output records.
# TYPE fluentbit_output_proc_records_total counter
fluentbit_output_proc_records_total{name="stdout.1"} 120
# HELP fluentbit_output_proc_bytes_total Number of processed output bytes.
# TYPE fluentbit_output_proc_bytes_total counter
fluentbit_output_proc_bytes_total{name="stdout.1"} 3120
# HELP fluentbit_output_errors_total Number of output errors.
# TYPE fluentbit_output_errors_total counter
fluentbit_output_errors_total{name="stdout.1"} 0
# HELP fluentbit_output_retries_total Number of output retries.
# TYPE fluentbit_output_retries_total counter
fluentbit_output_retries_total{name="stdout.1"} 0
# HELP fluentbit_output_retries_failed_total Number of abandoned batches because the maximum number of re-tries was reached.
# TYPE fluentbit_output_retries_failed_total counter
fluentbit_output_retries_failed_total{name="stdout.1"} 0
# HELP fluentbit_output_dropped_records_total Number of dropped records.
# TYPE fluentbit_output_dropped_records_total counter
fluentbit_output_dropped_records_total{name="stdout.1"} 0
# HELP fluentbit_output_retried_records_total Number of retried records.
# TYPE fluentbit_output_retried_records_total counter
fluentbit_output_retried_records_total{name="stdout.1"} 0
# HELP fluentbit_output_proc_records_total Number of processed output records.
# TYPE fluentbit_output_proc_records_total counter
fluentbit_output_proc_records_total{name="http.2"} 0
# HELP fluentbit_output_proc_bytes_total Number of processed output bytes.
# TYPE fluentbit_output_proc_bytes_total counter
fluentbit_output_proc_bytes_total{name="http.2"} 0
# HELP fluentbit_output_errors_total Number of output errors.
# TYPE fluentbit_output_errors_total counter
fluentbit_output_errors_total{name="http.2"} 0
# HELP fluentbit_output_retries_total Number of output retries.
# TYPE fluentbit_output_retries_total counter
fluentbit_output_retries_total{name="http.2"} 0
# HELP fluentbit_output_retries_failed_total Number of abandoned batches because the maximum number of re-tries was reached.
# TYPE fluentbit_output_retries_failed_total counter
fluentbit_output_retries_failed_total{name="http.2"} 0
# HELP fluentbit_output_dropped_records_total Number of dropped records.
# TYPE fluentbit_output_dropped_records_total counter
fluentbit_output_dropped_records_total{name="http.2"} 0
# HELP fluentbit_output_retried_records_total Number of retried records.
# TYPE fluentbit_output_retried_records_total counter
fluentbit_output_retried_records_total{name="http.2"} 0
# HELP fluentbit_output_proc_records_total Number of processed output records.
# TYPE fluentbit_output_proc_records_total counter
fluentbit_output_proc_records_total{name="prometheus_exporter.3"} 60
# HELP fluentbit_output_proc_bytes_total Number of processed output bytes.
# TYPE fluentbit_output_proc_bytes_total counter
fluentbit_output_proc_bytes_total{name="prometheus_exporter.3"} 520260
# HELP fluentbit_output_errors_total Number of output errors.
# TYPE fluentbit_output_errors_total counter
fluentbit_output_errors_total{name="prometheus_exporter.3"} 0
# HELP fluentbit_output_retries_total Number of output retries.
# TYPE fluentbit_output_retries_total counter
fluentbit_output_retries_total{name="prometheus_exporter.3"} 0
# HELP fluentbit_output_retries_failed_total Number of abandoned batches because the maximum number of re-tries was reached.
# TYPE fluentbit_output_retries_failed_total counter
fluentbit_output_retries_failed_total{name="prometheus_exporter.3"} 0
# HELP fluentbit_output_dropped_records_total Number of dropped records.
# TYPE fluentbit_output_dropped_records_total counter
fluentbit_output_dropped_records_total{name="prometheus_exporter.3"} 0
# HELP fluentbit_output_retried_records_total Number of retried records.
# TYPE fluentbit_output_retried_records_total counter
fluentbit_output_retried_records_total{name="prometheus_exporter.3"} 0
# HELP fluentbit_process_start_time_seconds Start time of the process since unix epoch in seconds.
# TYPE fluentbit_process_start_time_seconds gauge
fluentbit_process_start_time_seconds{hostname="653a853d661c"} 1652772686
# HELP fluentbit_build_info Build version information.
# TYPE fluentbit_build_info gauge
fluentbit_build_info{hostname="653a853d661c",version="1.9.3",os="linux"} 1652772686

The webserver output is fine:

$ curl -s http://127.0.0.1:2020/api/v1/metrics/prometheus
# HELP fluentbit_input_bytes_total Number of input bytes.
# TYPE fluentbit_input_bytes_total counter
fluentbit_input_bytes_total{name="dummy.0"} 3926 1652772837034
fluentbit_input_bytes_total{name="dummy.1"} 3926 1652772837034
fluentbit_input_bytes_total{name="fluentbit_metrics.2"} 650325 1652772837034
# HELP fluentbit_input_records_total Number of input records.
# TYPE fluentbit_input_records_total counter
fluentbit_input_records_total{name="dummy.0"} 151 1652772837034
fluentbit_input_records_total{name="dummy.1"} 151 1652772837034
fluentbit_input_records_total{name="fluentbit_metrics.2"} 75 1652772837034
# HELP fluentbit_output_dropped_records_total Number of dropped records.
# TYPE fluentbit_output_dropped_records_total counter
fluentbit_output_dropped_records_total{name="http.2"} 0 1652772837034
fluentbit_output_dropped_records_total{name="prometheus_exporter.3"} 0 1652772837034
fluentbit_output_dropped_records_total{name="stdout.0"} 0 1652772837034
fluentbit_output_dropped_records_total{name="stdout.1"} 0 1652772837034
# HELP fluentbit_output_errors_total Number of output errors.
# TYPE fluentbit_output_errors_total counter
fluentbit_output_errors_total{name="http.2"} 0 1652772837034
fluentbit_output_errors_total{name="prometheus_exporter.3"} 0 1652772837034
fluentbit_output_errors_total{name="stdout.0"} 0 1652772837034
fluentbit_output_errors_total{name="stdout.1"} 0 1652772837034
# HELP fluentbit_output_proc_bytes_total Number of processed output bytes.
# TYPE fluentbit_output_proc_bytes_total counter
fluentbit_output_proc_bytes_total{name="http.2"} 0 1652772837034
fluentbit_output_proc_bytes_total{name="prometheus_exporter.3"} 641654 1652772837034
fluentbit_output_proc_bytes_total{name="stdout.0"} 3874 1652772837034
fluentbit_output_proc_bytes_total{name="stdout.1"} 3874 1652772837034
# HELP fluentbit_output_proc_records_total Number of processed output records.
# TYPE fluentbit_output_proc_records_total counter
fluentbit_output_proc_records_total{name="http.2"} 0 1652772837034
fluentbit_output_proc_records_total{name="prometheus_exporter.3"} 74 1652772837034
fluentbit_output_proc_records_total{name="stdout.0"} 149 1652772837034
fluentbit_output_proc_records_total{name="stdout.1"} 149 1652772837034
# HELP fluentbit_output_retried_records_total Number of retried records.
# TYPE fluentbit_output_retried_records_total counter
fluentbit_output_retried_records_total{name="http.2"} 0 1652772837034
fluentbit_output_retried_records_total{name="prometheus_exporter.3"} 0 1652772837034
fluentbit_output_retried_records_total{name="stdout.0"} 0 1652772837034
fluentbit_output_retried_records_total{name="stdout.1"} 0 1652772837034
# HELP fluentbit_output_retries_failed_total Number of abandoned batches because the maximum number of re-tries was reached.
# TYPE fluentbit_output_retries_failed_total counter
fluentbit_output_retries_failed_total{name="http.2"} 0 1652772837034
fluentbit_output_retries_failed_total{name="prometheus_exporter.3"} 0 1652772837034
fluentbit_output_retries_failed_total{name="stdout.0"} 0 1652772837034
fluentbit_output_retries_failed_total{name="stdout.1"} 0 1652772837034
# HELP fluentbit_output_retries_total Number of output retries.
# TYPE fluentbit_output_retries_total counter
fluentbit_output_retries_total{name="http.2"} 0 1652772837034
fluentbit_output_retries_total{name="prometheus_exporter.3"} 0 1652772837034
fluentbit_output_retries_total{name="stdout.0"} 0 1652772837034
fluentbit_output_retries_total{name="stdout.1"} 0 1652772837034
# HELP fluentbit_uptime Number of seconds that Fluent Bit has been running.
# TYPE fluentbit_uptime counter
fluentbit_uptime 151
# HELP process_start_time_seconds Start time of the process since unix epoch in seconds.
# TYPE process_start_time_seconds gauge
process_start_time_seconds 1652772686
# HELP fluentbit_build_info Build version information.
# TYPE fluentbit_build_info gauge
fluentbit_build_info{version="1.9.3",edition="Community"} 1

yingchen0706v · 2022-05-17T08:18:46Z

@patrick-stephens it works with the default configuration, but the metrics are exported at the endpoint /api/v1/metrics/prometheus instead of /metrics. Is there a way to make it use /metrics?

patrick-stephens · 2022-05-17T08:20:18Z

I don't think so, as those routes are part of the web server. A scrape config should handle it fine though: you just need to configure the path on the Prometheus side so it doesn't use the default.

yingchen0706v · 2022-05-17T08:45:47Z

Thanks @patrick-stephens. I'll work around it with another solution and close the ticket for now. Thank you for your help.

patrick-stephens · 2022-05-23T10:58:50Z

I've re-opened this as it is a legitimate bug that will prevent use of the exporter. @leonardo-albertovich can you take a look?

The issue seems to be that the exporter output does not group related metrics together, whereas the web server output does.
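
To make "grouped" concrete, here is a rough Python sketch that rewrites ungrouped exposition text so each metric name gets a single HELP/TYPE pair followed by all of its samples. It is an illustration only, not the exporter's code. It assumes plain counters and gauges, where the sample name equals the metric name, so histogram or summary suffixes such as `_bucket` are not handled. The name `regroup` and the `ungrouped` sample are hypothetical.

```python
import re

# "# HELP <name> ..." or "# TYPE <name> ..." comment lines.
_META = re.compile(r"^#\s+(HELP|TYPE)\s+(\S+)")
# Leading metric name of a sample line.
_NAME = re.compile(r"^[a-zA-Z_:][a-zA-Z0-9_:]*")


def _new_family():
    return {"HELP": None, "TYPE": None, "samples": []}


def regroup(text):
    """Group samples by metric name and emit each HELP/TYPE line once."""
    families = {}  # dicts keep insertion order, so first-seen order is preserved
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        meta = _META.match(line)
        if meta:
            kind, name = meta.groups()
            family = families.setdefault(name, _new_family())
            if family[kind] is None:  # keep only the first HELP/TYPE per name
                family[kind] = line
            continue
        if line.startswith("#"):
            continue  # other comments are dropped in this sketch
        name_match = _NAME.match(line)
        if name_match is None:
            continue  # not a sample line this sketch understands
        families.setdefault(name_match.group(0), _new_family())["samples"].append(line)

    out = []
    for family in families.values():
        out.extend(l for l in (family["HELP"], family["TYPE"]) if l is not None)
        out.extend(family["samples"])
    return "\n".join(out) + "\n"


# Hypothetical input in the interleaved shape produced by the exporter.
ungrouped = """\
# HELP fluentbit_input_bytes_total Number of input bytes.
# TYPE fluentbit_input_bytes_total counter
fluentbit_input_bytes_total{name="dummy.0"} 3146
# HELP fluentbit_input_records_total Number of input records.
# TYPE fluentbit_input_records_total counter
fluentbit_input_records_total{name="dummy.0"} 121
# HELP fluentbit_input_bytes_total Number of input bytes.
# TYPE fluentbit_input_bytes_total counter
fluentbit_input_bytes_total{name="dummy.1"} 3146
"""

if __name__ == "__main__":
    print(regroup(ungrouped), end="")
```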

github-actions · 2022-08-22T02:18:20Z

This issue is stale because it has been open 90 days with no activity. Remove stale label or comment or this will be closed in 5 days. Maintainers can add the exempt-stale label.

a-thaler · 2023-08-15T09:25:19Z

It seems that Prometheus and VictoriaMetrics can handle that situation well; however, there are providers like Dynatrace which scrape only the first entry of every metric and drop the rest.

As the new mechanism using fluentbit_metrics as input seems to be the future-safe solution (storage metrics are available here in Prometheus format using the prometheus-exporter, and it is much more flexible), it would be great if the problem could be solved so that the new mechanism can be adopted more widely.

ccampo133 · 2023-08-25T17:56:28Z

Ran into this bug today and confirmed that it is due to the fluentbit_metrics input plugin. Strangely, as @a-thaler mentioned, Prometheus itself has no issue parsing this malformed metrics text (despite the format violating its own spec). The Prometheus Go parser however fails with an error, as expected (see: https://github.com/prometheus/common/blob/main/expfmt/text_parse.go#L500)

It would be great if the fluentbit_metrics plugin could be used with the Prometheus exporter output plugin and produce properly formatted output. This would at the very least allow me to add additional labels to the metrics, which the monitoring API does not allow. That being said, the monitoring API metrics endpoint (mentioned here #5465 (comment)) is a sufficient workaround for now, at least for my use case.

randvoorhies · 2024-07-25T23:42:24Z

I'm trying to pull in the fluentbit v2 metrics from fluentbit 3.1.3 into telegraf which uses the Go parser and am stuck because of this issue. It seems to only be with v2, which I need so that I can plot my storage buffer usages.

evgfitil · 2024-08-12T12:04:27Z

I’ve encountered this issue as well, specifically with the inability to use storage metrics available exclusively in the API/v2. Resolving this would significantly improve our monitoring capabilities. I hope this issue can be prioritized in the future.

bwplotka · 2024-08-14T17:16:06Z

👋🏽

Looks like more people want to use the new Prometheus endpoint but can't, due to a broken exposition format implementation. Any updates on this, or at least pointers on what the challenges are?

(Not that it helps here, but I'm a Prometheus maintainer, open to feedback on our side about how to make this easier for C codebases.)

braydonk · 2024-08-14T21:36:28Z

I looked into this today, here's what I found.

The problem

When all the metrics are collected from each plugin, the cmt_cat function is used to append an entire cmt context into the single one that will eventually get sent down the line. This is done because each plugin gets its own separate cmt context, because each plugin has the opportunity to register its own metrics. However, each input, filter, and output plugin also sets up a set of default metrics separately in their own contexts.
Let’s use fluentbit_input_records_total as an example. This metric is registered for every input plugin using the tag in the name label. The registration happens independently in each cmt context for every new input plugin. This counter’s map contains one metric, the counter for this name label for this input plugin. When this context is collected, each counter gets appended to the context.
Imagine there are 3 input plugins, and each one has its own metric context with a registered input_records_total. The problem is that Fluent Bit does not recognize that the full cmt context these metrics are added to already contains an input_records_total, so they are registered as 3 different metrics. Once this reaches the step that encodes individual metrics, a HELP and TYPE banner is produced for each one separately, because cmetrics doesn’t consider them to be the same metric. What we would actually like is for the overall cmetrics payload to contain one metric representing input_records_total, with 3 entries in its map, one for each of the 3 input plugins.
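
The behaviour described above can be imitated with a toy model in Python. This is not cmetrics code and uses none of its API: `Family`, `naive_cat`, and `encode` are hypothetical stand-ins, and every metric is assumed to be a counter with a single `name` label.

```python
from dataclasses import dataclass, field


@dataclass
class Family:
    """Toy stand-in for one registered metric: name, help text, label -> value map."""
    name: str
    help: str
    samples: dict = field(default_factory=dict)


def naive_cat(dst, src):
    """Append every family from src to dst without checking for an existing name."""
    dst.extend(Family(f.name, f.help, dict(f.samples)) for f in src)


def encode(families):
    """Emit one HELP/TYPE banner per family entry, then that entry's samples."""
    lines = []
    for fam in families:
        lines.append(f"# HELP {fam.name} {fam.help}")
        lines.append(f"# TYPE {fam.name} counter")
        for label, value in fam.samples.items():
            lines.append(f'{fam.name}{{name="{label}"}} {value}')
    return "\n".join(lines)


# Three input plugins, each with its own context holding the same default metric.
contexts = [
    [Family("fluentbit_input_records_total", "Number of input records.", {f"cpu.{i}": i + 1})]
    for i in range(3)
]

combined = []
for ctx in contexts:
    naive_cat(combined, ctx)

output = encode(combined)

if __name__ == "__main__":
    print(output)
```

Because the combined list ends up with three separate entries that share a name, the encoder emits three banners for what should be one metric.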

Solution

I began looking at this in an assistive capacity for another team; it isn't something that directly affects my work at this time. As such, it is unlikely I will be able to dedicate the time to develop and shepherd a fix myself. However, I've outlined what I think would be the two best possible ways to resolve this which someone else could take on.

Proposal 1: Shared metrics context for each plugin type

One path forward that I see is for all input plugins to share one metric context. This would be the same for filter and output plugins. In this case, the shared metrics context would be wrapped in a struct that also includes the addresses for each of the shared metrics, and when this shared context is passed into the initialization procedure of a new plugin instance, it simply records new values in the existing metrics.
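
As a rough illustration of the idea (a toy Python model, not the proof of concept and not cmetrics code), a shared context could look like the sketch below. The class `SharedInputMetrics` and its methods are hypothetical names, all metrics are assumed to be counters, and thread safety is deliberately not modelled.

```python
class SharedInputMetrics:
    """Toy stand-in for one metrics context shared by all input plugin instances."""

    def __init__(self):
        # Each default metric is registered exactly once:
        # metric name -> (help text, {instance name: value})
        self.families = {
            "fluentbit_input_bytes_total": ("Number of input bytes.", {}),
            "fluentbit_input_records_total": ("Number of input records.", {}),
        }

    def attach(self, instance_name):
        """Called when a new plugin instance starts: add a value slot per metric."""
        for _help_text, samples in self.families.values():
            samples.setdefault(instance_name, 0)

    def add(self, family, instance_name, amount):
        """Record a new value in the existing metric for this instance."""
        self.families[family][1][instance_name] += amount

    def encode(self):
        """Emit one HELP/TYPE banner per metric, followed by every instance's sample."""
        lines = []
        for name, (help_text, samples) in self.families.items():
            lines.append(f"# HELP {name} {help_text}")
            lines.append(f"# TYPE {name} counter")
            for instance_name, value in samples.items():
                lines.append(f'{name}{{name="{instance_name}"}} {value}')
        return "\n".join(lines)


shared = SharedInputMetrics()
for i in range(3):
    shared.attach(f"cpu.{i}")
shared.add("fluentbit_input_records_total", "cpu.0", 4)
shared.add("fluentbit_input_records_total", "cpu.1", 3)
shared.add("fluentbit_input_records_total", "cpu.2", 3)
text = shared.encode()

if __name__ == "__main__":
    print(text)
```

Since the metric exists only once, grouping falls out of the data structure and no merge step is needed at collection time.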

I wrote a proof of concept for this just for input plugins: #9231
Much of the code is a mess, but if you pull it down and build it, then use the following config:

[SERVICE]
    HTTP_Server  On
    HTTP_Listen  0.0.0.0
    HTTP_PORT    2020

[INPUT]
    Name cpu

[INPUT]
    Name cpu

[INPUT]
    Name cpu

[OUTPUT]
    Name  stdout
    Match *

You will see the resulting metrics being correctly grouped as the Prometheus Exposition Format specifies.

braydonk@bk:~/Documents/test_flb$ curl localhost:2020/api/v2/metrics/prometheus
# HELP fluentbit_uptime Number of seconds that Fluent Bit has been running.
# TYPE fluentbit_uptime counter
fluentbit_uptime{hostname="bk.c.googlers.com"} 3
# HELP fluentbit_input_bytes_total Number of input bytes.
# TYPE fluentbit_input_bytes_total counter
fluentbit_input_bytes_total{name="cpu.0"} 6628
fluentbit_input_bytes_total{name="cpu.1"} 4971
fluentbit_input_bytes_total{name="cpu.2"} 4971
# HELP fluentbit_input_records_total Number of input records.
# TYPE fluentbit_input_records_total counter
fluentbit_input_records_total{name="cpu.0"} 4
fluentbit_input_records_total{name="cpu.1"} 3
fluentbit_input_records_total{name="cpu.2"} 3

If this approach makes sense, I would hand it off to @shuaich, my coworker from the team trying to tackle this problem. I think the implementation is straightforward enough that I could guide it without much difficulty.

The only open question is thread safety; if using threaded input plugins, I'm not sure if the cmt context is designed for thread safety. Seems to work fine for threaded output plugins so I'm guessing it's okay, but never tried with threaded input plugins.

Proposal 2: Adjust `cmt_cat` to account for metrics that already exist

I'm not sure this is a great path forward, but I'll include it here. The other way I see to accomplish this is for cmt_cat to account for metrics that already exist in the destination context. i.e. if I'm appending a context that contains fluentbit_input_records_total, cmt_cat would need to recognize that fluentbit_input_records_total already exists, and instead of copying the entire metric add it as a value to the existing metric's cmt_map.
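
In toy form, the merge-aware behaviour might look like the Python sketch below. It is not the cmetrics implementation and uses none of its API: contexts are modelled as plain dicts keyed by metric name, `merging_cat` is a hypothetical name, and the type check is an assumption about one of the nuances a real version would need to handle.

```python
def merging_cat(dst, src):
    """Merge src into dst, folding samples into a metric that already exists.

    Both arguments map metric name -> {"help": str, "type": str, "samples": dict}.
    """
    for name, family in src.items():
        existing = dst.get(name)
        if existing is None:
            # First time this name is seen: copy the whole metric.
            dst[name] = {
                "help": family["help"],
                "type": family["type"],
                "samples": dict(family["samples"]),
            }
        elif existing["type"] != family["type"]:
            # Assumed policy for this sketch: refuse to merge mismatched types.
            raise ValueError(f"type mismatch for metric {name}")
        else:
            # Name already exists: add the values to the existing metric's map.
            existing["samples"].update(family["samples"])


# Three per-plugin contexts, each with its own copy of the same default metric.
plugin_contexts = [
    {
        "fluentbit_input_records_total": {
            "help": "Number of input records.",
            "type": "counter",
            "samples": {f"cpu.{i}": i + 1},
        }
    }
    for i in range(3)
]

merged = {}
for ctx in plugin_contexts:
    merging_cat(merged, ctx)

if __name__ == "__main__":
    print(merged)
```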

This would be a much harder implementation. I think this should only be considered if the thread safety of cmt isn't solid enough for Proposal 1.

If this were the direction chosen, I'd recommend that a Fluent Bit maintainer take it on, as it is not straightforward and involves nuances deep in the library code that would be hard for a typical community contributor to work out.

braydonk · 2024-08-14T21:41:06Z

CC @edsiper @leonardo-albertovich to look over my proposals

bbkfhq · 2024-08-27T17:57:05Z

I'm also affected by this issue. The presence of duplicate "TYPE" lines breaks Telegraf's parsing.

decoding response failed: text format parsing error in line 10: second HELP line for metric name "fluentbit_input_bytes_total"

edsiper · 2024-09-05T18:47:15Z

I've pushed a draft PR to CMetrics to fix this: fluent/cmetrics#222

For testing purposes, I created a test branch of Fluent Bit here:

Folks, would you mind giving the test branch a try? Any help is appreciated.

lecaros · 2024-09-06T21:35:03Z

Hi @edsiper
I was able to reproduce the issue with Telegraf and Fluent Bit 3.1.7.

podman run --rm -v $(PWD)/telegraf.config:/etc/telegraf/telegraf.conf:ro --entrypoint=telegraf telegraf
2024-09-06T21:21:48Z I! Loading config: /etc/telegraf/telegraf.conf
2024-09-06T21:21:48Z I! Starting Telegraf 1.31.3 brought to you by InfluxData the makers of InfluxDB
2024-09-06T21:21:48Z I! Available plugins: 234 inputs, 9 aggregators, 32 processors, 26 parsers, 60 outputs, 6 secret-stores
2024-09-06T21:21:48Z I! Loaded inputs: prometheus
2024-09-06T21:21:48Z I! Loaded aggregators: 
2024-09-06T21:21:48Z I! Loaded processors: 
2024-09-06T21:21:48Z I! Loaded secretstores: 
2024-09-06T21:21:48Z I! Loaded outputs: exec
2024-09-06T21:21:48Z I! Tags enabled: host=8e2205bd85d8
2024-09-06T21:21:48Z I! [agent] Config: Interval:10s, Quiet:false, Hostname:"8e2205bd85d8", Flush Interval:10s
2024-09-06T21:21:48Z I! [inputs.prometheus] Using the label selector:  and field selector: 
2024-09-06T21:21:50Z E! [inputs.prometheus] Error in plugin: error reading metrics for "http://192.168.100.61:2020/api/v2/metrics/prometheus": decoding response failed: text format parsing error in line 10: second HELP line for metric name "fluentbit_input_bytes_total"
2024-09-06T21:22:00Z E! [inputs.prometheus] Error in plugin: error reading metrics for "http://192.168.100.61:2020/api/v2/metrics/prometheus": decoding response failed: text format parsing error in line 10: second HELP line for metric name "fluentbit_input_bytes_total"
^C2024-09-06T21:22:03Z I! [agent] Hang on, flushing any cached metrics before shutdown
2024-09-06T21:22:03Z I! [agent] Stopping running outputs

I've also used the branch from #9360 to validate the fix.

 podman run --rm -v $(PWD)/telegraf.config:/etc/telegraf/telegraf.conf:ro --entrypoint=telegraf telegraf
2024-09-06T21:28:04Z I! Loading config: /etc/telegraf/telegraf.conf
2024-09-06T21:28:04Z I! Starting Telegraf 1.31.3 brought to you by InfluxData the makers of InfluxDB
2024-09-06T21:28:04Z I! Available plugins: 234 inputs, 9 aggregators, 32 processors, 26 parsers, 60 outputs, 6 secret-stores
2024-09-06T21:28:04Z I! Loaded inputs: prometheus
2024-09-06T21:28:04Z I! Loaded aggregators: 
2024-09-06T21:28:04Z I! Loaded processors: 
2024-09-06T21:28:04Z I! Loaded secretstores: 
2024-09-06T21:28:04Z I! Loaded outputs: file
2024-09-06T21:28:04Z I! Tags enabled: host=519cbe732e08
2024-09-06T21:28:04Z I! [agent] Config: Interval:10s, Quiet:false, Hostname:"519cbe732e08", Flush Interval:10s
2024-09-06T21:28:04Z I! [inputs.prometheus] Using the label selector:  and field selector: 
fluentbit_uptime,host=519cbe732e08,hostname=chronolap.local,url=http://192.168.100.61:2020/api/v2/metrics/prometheus counter=152 1725658090000000000
fluentbit_output_proc_records_total,host=519cbe732e08,name=stdout.0,url=http://192.168.100.61:2020/api/v2/metrics/prometheus counter=1207 1725658090000000000
fluentbit_storage_fs_chunks_down,host=519cbe732e08,url=http://192.168.100.61:2020/api/v2/metrics/prometheus gauge=0 1725658090000000000
fluentbit_input_bytes_total,host=519cbe732e08,name=dummy.0,url=http://192.168.100.61:2020/api/v2/metrics/prometheus counter=27324 1725658090000000000
fluentbit_input_bytes_total,host=519cbe732e08,name=dummy.1,url=http://192.168.100.61:2020/api/v2/metrics/prometheus counter=16416 1725658090000000000

Given that @shuaich already tested the Prometheus Golang scraper, I'd say the fix works.
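
For anyone who wants to check an endpoint without running Telegraf, here is a minimal sketch of the same check the parser error above reports: it scans exposition text and lists every metric name that has more than one HELP or TYPE line. The function name `duplicate_metadata` and the sample payload (including its help text) are assumptions made up for illustration; this is not the Telegraf or Prometheus parser, only an approximation of the rule that each metric name gets at most one HELP and one TYPE line.

```python
import re
from collections import Counter

# Matches "# HELP <name> ..." and "# TYPE <name> ..." comment lines.
_META = re.compile(r"^#\s*(HELP|TYPE)\s+(\S+)")


def duplicate_metadata(text):
    """Return sorted (kind, metric_name) pairs that occur more than once."""
    counts = Counter()
    for line in text.splitlines():
        match = _META.match(line.strip())
        if match:
            counts[(match.group(1), match.group(2))] += 1
    return sorted(key for key, n in counts.items() if n > 1)


# Hypothetical payload imitating the pre-fix output; the help text is invented.
sample = """\
# HELP fluentbit_input_bytes_total Number of input bytes.
# TYPE fluentbit_input_bytes_total counter
fluentbit_input_bytes_total{name="dummy.0"} 27324
# HELP fluentbit_input_bytes_total Number of input bytes.
# TYPE fluentbit_input_bytes_total counter
fluentbit_input_bytes_total{name="dummy.1"} 16416
"""

if __name__ == "__main__":
    for kind, name in duplicate_metadata(sample):
        print(f"duplicate {kind} line for {name}")
```

To run it against a live instance, the body returned by the metrics endpoint could be passed to `duplicate_metadata` in place of `sample`; an empty result means no metric name carries repeated HELP or TYPE lines.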

edsiper · 2024-09-16T16:38:59Z

Fixed with #9392 (master) and #9393 (3.1).

yingchen0706v added the status: waiting-for-triage label May 17, 2022

patrick-stephens added waiting-for-user Waiting for more information, tests or requested changes and removed status: waiting-for-triage labels May 17, 2022

patrick-stephens changed the title ~~Prometheus metrics with different tags should have only one HELP and TYPE comment line~~ Prometheus Exporter metrics with different tags should have only one HELP and TYPE comment line May 17, 2022

patrick-stephens added bug and removed waiting-for-user Waiting for more information, tests or requested changes labels May 17, 2022

yingchen0706v closed this as completed May 17, 2022

patrick-stephens reopened this May 23, 2022

patrick-stephens mentioned this issue Jun 17, 2022

metrics: add prometheus format support for storage metrics #5334

Closed

3 tasks

github-actions bot added Stale and removed Stale labels Aug 22, 2022

github-actions bot added the Stale label Nov 22, 2022

patrick-stephens added enhancement and removed Stale labels Nov 22, 2022

a-thaler mentioned this issue Aug 15, 2023

New fluentbit metric exposure is not following prometheus specification leading to failed scrapes for other vendors kyma-project/kyma#17976

Closed

a-thaler mentioned this issue Jul 15, 2024

V1 metrics API for prometheus changed metric names in 3.1.0 #9086

Closed

patrick-stephens mentioned this issue Aug 13, 2024

/api/v2/metrics/prometheus endpoint doesn't follow prometheus 2.0 exposition format #9191

Closed

edsiper mentioned this issue Sep 5, 2024

cat: do not create new context on new metric being concatenated fluent/cmetrics#222

Merged

edsiper added this to the Fluent Bit v3.1.8 milestone Sep 16, 2024

edsiper closed this as completed Sep 16, 2024

edsiper modified the milestones: Fluent Bit v3.1.8, Fluent Bit v3.2.0 Sep 16, 2024

edsiper mentioned this issue Sep 16, 2024

lib: cmetrics: test cat fixes #9360

Closed
