
A few metrics show err-mimir-sample-out-of-order and don't get sent to Grafana Mimir #7763

Closed
helmut72 opened this issue Jul 27, 2023 · 3 comments

Comments

@helmut72

Bug Report

Describe the bug

With a single scraper, Fluent-bit works fine. Two scrapers also seem to work. With 3 or more scrapers, Fluent-bit doesn't send metrics to Mimir anymore.

To Reproduce

Example log message if applicable:

[2023/07/27 21:21:17] [debug] [upstream] KA connection #65 to 192.168.0.1:9009 is now available
[2023/07/27 21:21:17] [debug] [output:prometheus_remote_write:prometheus_remote_write.0] http_post result FLB_ERROR
[2023/07/27 21:21:17] [debug] [out flush] cb_destroy coro_id=232
[2023/07/27 21:21:17] [debug] [task] destroy task=0x7ff028e81040 (task_id=4)
[2023/07/27 21:21:26] [debug] [http_client] not using http_proxy for header
[2023/07/27 21:21:26] [debug] [http_client] not using http_proxy for header
[2023/07/27 21:21:26] [debug] [http_client] not using http_proxy for header
[2023/07/27 21:21:26] [debug] [http_client] not using http_proxy for header
[2023/07/27 21:21:26] [debug] [http_client] not using http_proxy for header
[2023/07/27 21:21:26] [debug] [input chunk] update output instances with new chunk size diff=24769, records=0, input=prometheus_scrape.3
[2023/07/27 21:21:26] [debug] [input chunk] update output instances with new chunk size diff=19869, records=0, input=prometheus_scrape.0
[2023/07/27 21:21:26] [debug] [input chunk] update output instances with new chunk size diff=9679, records=0, input=prometheus_scrape.2
[2023/07/27 21:21:26] [debug] [input coro] destroy coro_id=93
[2023/07/27 21:21:26] [debug] [input coro] destroy coro_id=93
[2023/07/27 21:21:26] [debug] [input coro] destroy coro_id=93
[2023/07/27 21:21:26] [debug] [input chunk] update output instances with new chunk size diff=69031, records=0, input=prometheus_scrape.4
[2023/07/27 21:21:26] [debug] [input coro] destroy coro_id=93
[2023/07/27 21:21:26] [debug] [input chunk] update output instances with new chunk size diff=78646, records=0, input=prometheus_scrape.1
[2023/07/27 21:21:26] [debug] [input coro] destroy coro_id=93
[2023/07/27 21:21:27] [debug] [task] created task=0x7ff028e81040 id=0 OK
[2023/07/27 21:21:27] [debug] [output:prometheus_remote_write:prometheus_remote_write.0] task_id=0 assigned to thread #1
[2023/07/27 21:21:27] [debug] [task] created task=0x7ff028e80e60 id=1 OK
[2023/07/27 21:21:27] [debug] [output:prometheus_remote_write:prometheus_remote_write.0] task_id=1 assigned to thread #0
[2023/07/27 21:21:27] [debug] [task] created task=0x7ff028e80f00 id=2 OK
[2023/07/27 21:21:27] [debug] [output:prometheus_remote_write:prometheus_remote_write.0] task_id=2 assigned to thread #1
[2023/07/27 21:21:27] [debug] [task] created task=0x7ff028e80fa0 id=3 OK
[2023/07/27 21:21:27] [debug] [output:prometheus_remote_write:prometheus_remote_write.0] task_id=3 assigned to thread #0
[2023/07/27 21:21:27] [debug] [task] created task=0x7ff028e81180 id=4 OK
[2023/07/27 21:21:27] [debug] [output:prometheus_remote_write:prometheus_remote_write.0] task_id=4 assigned to thread #1
[2023/07/27 21:21:27] [debug] [output:prometheus_remote_write:prometheus_remote_write.0] cmetrics msgpack size: 19869
[2023/07/27 21:21:27] [debug] [output:prometheus_remote_write:prometheus_remote_write.0] cmetrics msgpack size: 78646
[2023/07/27 21:21:27] [debug] [output:prometheus_remote_write:prometheus_remote_write.0] cmetric_id=0 decoded 0-19869 payload_size=49627
[2023/07/27 21:21:27] [debug] [output:prometheus_remote_write:prometheus_remote_write.0] final payload size: 49627
[2023/07/27 21:21:27] [debug] [upstream] KA connection #69 to 192.168.0.1:9009 has been assigned (recycled)
[2023/07/27 21:21:27] [debug] [http_client] not using http_proxy for header
[2023/07/27 21:21:27] [debug] [output:prometheus_remote_write:prometheus_remote_write.0] cmetrics msgpack size: 9679
[2023/07/27 21:21:27] [debug] [output:prometheus_remote_write:prometheus_remote_write.0] cmetric_id=0 decoded 0-9679 payload_size=5043
[2023/07/27 21:21:27] [debug] [output:prometheus_remote_write:prometheus_remote_write.0] final payload size: 5043
[2023/07/27 21:21:27] [debug] [upstream] KA connection #67 to 192.168.0.1:9009 has been assigned (recycled)
[2023/07/27 21:21:27] [debug] [http_client] not using http_proxy for header
[2023/07/27 21:21:27] [debug] [output:prometheus_remote_write:prometheus_remote_write.0] cmetrics msgpack size: 69031
[2023/07/27 21:21:27] [debug] [output:prometheus_remote_write:prometheus_remote_write.0] cmetric_id=0 decoded 0-78646 payload_size=56468
[2023/07/27 21:21:27] [debug] [output:prometheus_remote_write:prometheus_remote_write.0] final payload size: 56468
[2023/07/27 21:21:27] [debug] [upstream] KA connection #64 to 192.168.0.1:9009 has been assigned (recycled)
[2023/07/27 21:21:27] [debug] [http_client] not using http_proxy for header
[2023/07/27 21:21:27] [debug] [output:prometheus_remote_write:prometheus_remote_write.0] cmetrics msgpack size: 24769
[2023/07/27 21:21:27] [debug] [output:prometheus_remote_write:prometheus_remote_write.0] cmetric_id=0 decoded 0-24769 payload_size=8001
[2023/07/27 21:21:27] [debug] [output:prometheus_remote_write:prometheus_remote_write.0] final payload size: 8001
[2023/07/27 21:21:27] [debug] [upstream] KA connection #68 to 192.168.0.1:9009 has been assigned (recycled)
[2023/07/27 21:21:27] [debug] [http_client] not using http_proxy for header
[2023/07/27 21:21:27] [error] [output:prometheus_remote_write:prometheus_remote_write.0] 192.168.0.1:9009, HTTP status=400
failed pushing to ingester: user=anonymous: the sample has been rejected because another sample with a more recent timestamp has already been ingested and out-of-order samples are not allowed (err-mimir-sample-out-of-order). The affected sample has timestamp 2023-07-27T19:21:26.257Z and is from series {__name__="go_memstats_alloc_bytes_total"}

Fluent-bit conf:

[SERVICE]
    http_server  off
    log_level    debug
    #log_level    error

[INPUT]
    name            prometheus_scrape
    tag             prometheus.docker
    host            127.0.0.1
    port            9323

[INPUT]
    name            prometheus_scrape
    tag             prometheus.node
    host            127.0.0.1
    port            9100

[INPUT]
    name            prometheus_scrape
    tag             prometheus.fail2ban
    host            127.0.0.1
    port            9191

[INPUT]
    name            prometheus_scrape
    tag             prometheus.sftpgo
    host            127.0.0.1
    port            10000

[INPUT]
    name            prometheus_scrape
    tag             prometheus.caddy
    host            127.0.0.1
    port            2019

[OUTPUT]
    name        prometheus_remote_write
    host        192.168.0.1
    match       prometheus.*
    uri         /api/v1/push
    port        9009
    tls         off

  • Steps to reproduce the problem:

One or two of these inputs work. The problem occurs after adding more inputs.

Expected behavior

Metrics from all configured scrapers should be delivered to Mimir.

Your Environment

  • Version used: 2.1.8
  • Configuration: see above
  • Server type and version: Intel NUC i3
  • Operating System and version: Ubuntu 22
  • Filters and plugins: see above

Or do I need to run a separate Fluent-bit instance for every single input because it's too much? Prometheus and Victoria-Agent handle these few scrapes fine, but I want a single shipper for both logs and metrics.

@patrick-stephens
Contributor

It looks like Mimir is rejecting it - Loki also used to require in-order writes but that was fixed a while ago by Grafana. If Mimir requires in-order writes then it may be that as you add more inputs there is too much context switching to maintain that order.

Does it work in threaded mode for inputs? I think threaded true is the setting; it dedicates a thread to each input.
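
For reference, a minimal sketch of how that might look in the classic config, assuming the threaded option is supported by the prometheus_scrape input in this version (worth checking against the docs):

[INPUT]
    name      prometheus_scrape
    tag       prometheus.node
    host      127.0.0.1
    port      9100
    # assumed option: run this input in its own dedicated thread
    threaded  true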

@helmut72
Author

helmut72 commented Jul 27, 2023

It looks like Mimir is rejecting it

Yes, but what's the cause? ;)

It looks like I found the cause. Prometheus and Victoria-Agent automatically add two labels from their configuration file, job and instance. Fluent-bit adds neither, so Mimir probably can't tell which go_memstats_alloc_bytes_total belongs to which target. I used the examples from the Fluent-bit documentation, and since I'm a Prometheus noob, I fell into this trap :D

After adding those same two labels I get no more errors, all my exporters work like they do with Prometheus and Victoria-Agent, and everything looks fine in Grafana too.

Because labels can only be added in the output, I need a separate output for every scrape input. For example:

[INPUT]
    name      prometheus_scrape
    tag       prometheus.node
    host      127.0.0.1
    port      9100

[OUTPUT]
    name      prometheus_remote_write
    host      192.168.0.1
    match     prometheus.node
    uri       /api/v1/push
    port      9009
    add_label job node
    add_label instance server.example.com
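
For illustration, repeating the same pattern for the Docker scraper would look roughly like this (the job and instance values are just example choices):

[INPUT]
    name      prometheus_scrape
    tag       prometheus.docker
    host      127.0.0.1
    port      9323

[OUTPUT]
    name      prometheus_remote_write
    host      192.168.0.1
    match     prometheus.docker
    uri       /api/v1/push
    port      9009
    # example values; each scraper gets its own job label
    add_label job docker
    add_label instance server.example.com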

Does it work in threaded mode for inputs? I think threaded true is the setting; it dedicates a thread to each input.

I can't find this option for the input plugin when I run:

docker run --rm -ti fluent/fluent-bit:latest -i prometheus_scrape --help

But I guess this issue can be closed. Thanks!

@patrick-stephens
Contributor

Ah, good to know. It might be worth adding the details to the docs to help.

I wonder if the new processor option in the yaml config might let you add the label you want on input.
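
Something like this, perhaps, in the YAML format; this is an untested sketch, the processor name (labels) and its options should be checked against the current docs, and it may need a newer release than 2.1.8:

pipeline:
  inputs:
    - name: prometheus_scrape
      tag: prometheus.node
      host: 127.0.0.1
      port: 9100
      processors:
        metrics:
          # assumed 'labels' processor: inserts labels on the scraped metrics at the input stage
          - name: labels
            insert: job node
          - name: labels
            insert: instance server.example.com
  outputs:
    - name: prometheus_remote_write
      host: 192.168.0.1
      match: prometheus.node
      uri: /api/v1/push
      port: 9009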
