
A few metrics show err-mimir-sample-out-of-order and don't get sent to Grafana Mimir #7763

Closed
helmut72 opened this issue Jul 27, 2023 · 3 comments

Comments

@helmut72

Bug Report

Describe the bug

With a single scraper, Fluent-bit works fine. Two scrapers also seem to work. With 3 or more scrapers, Fluent-bit doesn't send metrics to Mimir anymore.

To Reproduce

Example log message if applicable:

[2023/07/27 21:21:17] [debug] [upstream] KA connection #65 to 192.168.0.1:9009 is now available
[2023/07/27 21:21:17] [debug] [output:prometheus_remote_write:prometheus_remote_write.0] http_post result FLB_ERROR
[2023/07/27 21:21:17] [debug] [out flush] cb_destroy coro_id=232
[2023/07/27 21:21:17] [debug] [task] destroy task=0x7ff028e81040 (task_id=4)
[2023/07/27 21:21:26] [debug] [http_client] not using http_proxy for header
[2023/07/27 21:21:26] [debug] [http_client] not using http_proxy for header
[2023/07/27 21:21:26] [debug] [http_client] not using http_proxy for header
[2023/07/27 21:21:26] [debug] [http_client] not using http_proxy for header
[2023/07/27 21:21:26] [debug] [http_client] not using http_proxy for header
[2023/07/27 21:21:26] [debug] [input chunk] update output instances with new chunk size diff=24769, records=0, input=prometheus_scrape.3
[2023/07/27 21:21:26] [debug] [input chunk] update output instances with new chunk size diff=19869, records=0, input=prometheus_scrape.0
[2023/07/27 21:21:26] [debug] [input chunk] update output instances with new chunk size diff=9679, records=0, input=prometheus_scrape.2
[2023/07/27 21:21:26] [debug] [input coro] destroy coro_id=93
[2023/07/27 21:21:26] [debug] [input coro] destroy coro_id=93
[2023/07/27 21:21:26] [debug] [input coro] destroy coro_id=93
[2023/07/27 21:21:26] [debug] [input chunk] update output instances with new chunk size diff=69031, records=0, input=prometheus_scrape.4
[2023/07/27 21:21:26] [debug] [input coro] destroy coro_id=93
[2023/07/27 21:21:26] [debug] [input chunk] update output instances with new chunk size diff=78646, records=0, input=prometheus_scrape.1
[2023/07/27 21:21:26] [debug] [input coro] destroy coro_id=93
[2023/07/27 21:21:27] [debug] [task] created task=0x7ff028e81040 id=0 OK
[2023/07/27 21:21:27] [debug] [output:prometheus_remote_write:prometheus_remote_write.0] task_id=0 assigned to thread #1
[2023/07/27 21:21:27] [debug] [task] created task=0x7ff028e80e60 id=1 OK
[2023/07/27 21:21:27] [debug] [output:prometheus_remote_write:prometheus_remote_write.0] task_id=1 assigned to thread #0
[2023/07/27 21:21:27] [debug] [task] created task=0x7ff028e80f00 id=2 OK
[2023/07/27 21:21:27] [debug] [output:prometheus_remote_write:prometheus_remote_write.0] task_id=2 assigned to thread #1
[2023/07/27 21:21:27] [debug] [task] created task=0x7ff028e80fa0 id=3 OK
[2023/07/27 21:21:27] [debug] [output:prometheus_remote_write:prometheus_remote_write.0] task_id=3 assigned to thread #0
[2023/07/27 21:21:27] [debug] [task] created task=0x7ff028e81180 id=4 OK
[2023/07/27 21:21:27] [debug] [output:prometheus_remote_write:prometheus_remote_write.0] task_id=4 assigned to thread #1
[2023/07/27 21:21:27] [debug] [output:prometheus_remote_write:prometheus_remote_write.0] cmetrics msgpack size: 19869
[2023/07/27 21:21:27] [debug] [output:prometheus_remote_write:prometheus_remote_write.0] cmetrics msgpack size: 78646
[2023/07/27 21:21:27] [debug] [output:prometheus_remote_write:prometheus_remote_write.0] cmetric_id=0 decoded 0-19869 payload_size=49627
[2023/07/27 21:21:27] [debug] [output:prometheus_remote_write:prometheus_remote_write.0] final payload size: 49627
[2023/07/27 21:21:27] [debug] [upstream] KA connection #69 to 192.168.0.1:9009 has been assigned (recycled)
[2023/07/27 21:21:27] [debug] [http_client] not using http_proxy for header
[2023/07/27 21:21:27] [debug] [output:prometheus_remote_write:prometheus_remote_write.0] cmetrics msgpack size: 9679
[2023/07/27 21:21:27] [debug] [output:prometheus_remote_write:prometheus_remote_write.0] cmetric_id=0 decoded 0-9679 payload_size=5043
[2023/07/27 21:21:27] [debug] [output:prometheus_remote_write:prometheus_remote_write.0] final payload size: 5043
[2023/07/27 21:21:27] [debug] [upstream] KA connection #67 to 192.168.0.1:9009 has been assigned (recycled)
[2023/07/27 21:21:27] [debug] [http_client] not using http_proxy for header
[2023/07/27 21:21:27] [debug] [output:prometheus_remote_write:prometheus_remote_write.0] cmetrics msgpack size: 69031
[2023/07/27 21:21:27] [debug] [output:prometheus_remote_write:prometheus_remote_write.0] cmetric_id=0 decoded 0-78646 payload_size=56468
[2023/07/27 21:21:27] [debug] [output:prometheus_remote_write:prometheus_remote_write.0] final payload size: 56468
[2023/07/27 21:21:27] [debug] [upstream] KA connection #64 to 192.168.0.1:9009 has been assigned (recycled)
[2023/07/27 21:21:27] [debug] [http_client] not using http_proxy for header
[2023/07/27 21:21:27] [debug] [output:prometheus_remote_write:prometheus_remote_write.0] cmetrics msgpack size: 24769
[2023/07/27 21:21:27] [debug] [output:prometheus_remote_write:prometheus_remote_write.0] cmetric_id=0 decoded 0-24769 payload_size=8001
[2023/07/27 21:21:27] [debug] [output:prometheus_remote_write:prometheus_remote_write.0] final payload size: 8001
[2023/07/27 21:21:27] [debug] [upstream] KA connection #68 to 192.168.0.1:9009 has been assigned (recycled)
[2023/07/27 21:21:27] [debug] [http_client] not using http_proxy for header
[2023/07/27 21:21:27] [error] [output:prometheus_remote_write:prometheus_remote_write.0] 192.168.0.1:9009, HTTP status=400
failed pushing to ingester: user=anonymous: the sample has been rejected because another sample with a more recent timestamp has already been ingested and out-of-order samples are not allowed (err-mimir-sample-out-of-order). The affected sample has timestamp 2023-07-27T19:21:26.257Z and is from series {__name__="go_memstats_alloc_bytes_total"}

Fluent-bit conf:

[SERVICE]
    http_server  off
    log_level    debug
    #log_level    error

[INPUT]
    name            prometheus_scrape
    tag             prometheus.docker
    host            127.0.0.1
    port            9323

[INPUT]
    name            prometheus_scrape
    tag             prometheus.node
    host            127.0.0.1
    port            9100

[INPUT]
    name            prometheus_scrape
    tag             prometheus.fail2ban
    host            127.0.0.1
    port            9191

[INPUT]
    name            prometheus_scrape
    tag             prometheus.sftpgo
    host            127.0.0.1
    port            10000

[INPUT]
    name            prometheus_scrape
    tag             prometheus.caddy
    host            127.0.0.1
    port            2019

[OUTPUT]
    name        prometheus_remote_write
    host        192.168.0.1
    match       prometheus.*
    uri         /api/v1/push
    port        9009
    tls         off

  • Steps to reproduce the problem:

One or two of these inputs work. The problem occurs after adding more inputs.

Expected behavior

Metrics from all configured scrapers should be delivered to Mimir.

Your Environment

  • Version used: 2.1.8
  • Configuration: see above
  • Server type and version: Intel NUC i3
  • Operating System and version: Ubuntu 22
  • Filters and plugins: see above

Or do I need to run a separate Fluent-bit instance for every single input because it's too much? Prometheus and Victoria-Agent handle these few scrapes fine, but I want a single shipper for both logs and metrics.

@patrick-stephens
Contributor

It looks like Mimir is rejecting it - Loki also used to require in-order writes but that was fixed a while ago by Grafana. If Mimir requires in-order writes then it may be that as you add more inputs there is too much context switching to maintain that order.

Does it work in threaded mode for inputs? I think threaded true is the setting; it dedicates a thread to each input.
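
For reference, a minimal sketch of how that might look in the classic config, assuming the threaded option is supported by the prometheus_scrape input in this version (worth checking against the docs):

[INPUT]
    name      prometheus_scrape
    tag       prometheus.node
    host      127.0.0.1
    port      9100
    # assumed option: run this input in its own dedicated thread
    threaded  true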

@helmut72
Author

helmut72 commented Jul 27, 2023

It looks like Mimir is rejecting it

Yes, but what's the cause? ;)

It looks like I found the cause. Prometheus and Victoria-Agent automatically add two labels from their configuration file, job and instance. Fluent-bit adds neither, so Mimir probably can't tell which go_memstats_alloc_bytes_total belongs to which target. I used the examples from the Fluent-bit documentation, and since I'm a Prometheus noob, I fell into this trap :D

After adding those same two labels I get no more errors, all my exporters work like they do with Prometheus and Victoria-Agent, and everything looks fine in Grafana too.

Because labels can only be added in the output, I need a separate output for every scrape input. For example:

[INPUT]
    name      prometheus_scrape
    tag       prometheus.node
    host      127.0.0.1
    port      9100

[OUTPUT]
    name      prometheus_remote_write
    host      192.168.0.1
    match     prometheus.node
    uri       /api/v1/push
    port      9009
    add_label job node
    add_label instance server.example.com
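
For illustration, repeating the same pattern for the Docker scraper would look roughly like this (the job and instance values are just example choices):

[INPUT]
    name      prometheus_scrape
    tag       prometheus.docker
    host      127.0.0.1
    port      9323

[OUTPUT]
    name      prometheus_remote_write
    host      192.168.0.1
    match     prometheus.docker
    uri       /api/v1/push
    port      9009
    # example values; each scraper gets its own job label
    add_label job docker
    add_label instance server.example.com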

Does it work in threaded mode for inputs? I think threaded true is the setting; it dedicates a thread to each input.

I can't find this option for the input plugin when I run:

docker run --rm -ti fluent/fluent-bit:latest -i prometheus_scrape --help

But I guess this issue can be closed. Thanks!

@patrick-stephens
Contributor

Ah, good to know. It might be worth adding the details to the docs to help.

I wonder if the new processor option in the yaml config might let you add the label you want on input.
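
Something like this, perhaps, in the YAML format; this is an untested sketch, the processor name (labels) and its options should be checked against the current docs, and it may need a newer release than 2.1.8:

pipeline:
  inputs:
    - name: prometheus_scrape
      tag: prometheus.node
      host: 127.0.0.1
      port: 9100
      processors:
        metrics:
          # assumed 'labels' processor: inserts labels on the scraped metrics at the input stage
          - name: labels
            insert: job node
          - name: labels
            insert: instance server.example.com
  outputs:
    - name: prometheus_remote_write
      host: 192.168.0.1
      match: prometheus.node
      uri: /api/v1/push
      port: 9009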
