tail.0 paused (mem buf overlimit) #1903

Closed
Mosibi opened this issue Jan 23, 2020 · 15 comments

Mosibi commented Jan 23, 2020

Bug Report

Describe the bug
We sometimes notice that logging is no longer being processed. Putting Fluent Bit in debug logging revealed that it sometimes pauses the tail input with the message "tail.0 paused (mem buf overlimit)", then resumes when possible and shows the message "tail.0 resume (mem buf overlimit)".

In some situations, Fluent Bit does not resume ingesting logs at all and the 'resume' message is never shown, so our assumption is that the buffer somehow cannot be cleared.

Are there known situations where this can happen, and is it possible to get metrics that show the current usage of mem_buf_limit?

To Reproduce
We are trying to figure out how to reproduce it.

Expected behavior
When logging is paused because a buffer is over its limit, I want to see this in the regular log. Right now we had to put Fluent Bit in debug mode to see this message.

Screenshots

Your Environment

  • Version used: 1.3.5
  • Configuration:
  • Kubernetes 1.15
  • Operating System and version: RHEL 7.7
  • Filters and plugins:

PettitWesley (Contributor) commented Jan 25, 2020

@Mosibi Check out the recommendation here: #1768 (comment)

I'm not sure if increasing mem_buf_limit will help you, but it's worth a try.
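
For reference, Mem_Buf_Limit is a per-input setting, so it goes in the [INPUT] section of the tail plugin. A minimal sketch of raising it (the path, tag, and 64MB value are placeholders, not the reporter's actual configuration):

    [INPUT]
        Name              tail
        Path              /var/log/containers/*.log
        Tag               kube.*
        # Per-input memory limit: the tail input is paused whenever buffered
        # records reach this size and resumed once chunks are flushed.
        Mem_Buf_Limit     64MB
        Skip_Long_Lines   On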

Mosibi (Author) commented Jan 25, 2020

> @Mosibi Check out the recommendation here: #1768 (comment)
>
> I'm not sure if increasing mem_buf_limit will help you, but it's worth a try.

Thanks for this pointer!

Do you, or anybody else, know how I can see the current usage of the memory buffer? I would like, for example, to plot it in Grafana and alert when it is 100% in use.

And are there any thoughts on logging the 'overlimit' messages at warning level, or another level that is shown when fluent-bit is configured to log at level info? The 'overlimit' message can easily be missed when fluent-bit runs at debug level.

PettitWesley (Contributor) commented:

> Do you, or anybody else, know how I can see the current usage of the memory buffer? I would like, for example, to plot it in Grafana and alert when it is 100% in use.

AFAIK that is not possible. It's a very good idea though; do you want to open a separate issue for that feature request?

> And are there any thoughts on logging the 'overlimit' messages at warning level

I think this is also a good idea.
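
For the metrics question, one option worth checking: recent Fluent Bit releases expose runtime metrics through the built-in HTTP server. A minimal sketch, assuming the monitoring interface documented for those versions (listen address and port are placeholders):

    [SERVICE]
        # Built-in monitoring endpoint
        HTTP_Server      On
        HTTP_Listen      0.0.0.0
        HTTP_Port        2020
        # Also expose buffer/chunk metrics from the storage layer
        storage.metrics  on

With this enabled, Prometheus-format metrics should be available at /api/v1/metrics/prometheus and per-input buffer/chunk status at /api/v1/storage, which Grafana can scrape, plot, and alert on.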

edsiper (Member) commented Feb 5, 2020

FYI: #1909 has been merged

edsiper (Member) commented Feb 7, 2020

Fixed by #1909

edsiper closed this as completed Feb 7, 2020

abh commented May 7, 2020

How did changing the log level of the message fix fluent-bit getting stuck?

edsiper (Member) commented May 19, 2020

If Fluent Bit doesn't resume, it is because it is not able to flush the data (it respects your mem_buf_limit).

lalithvaka commented:

> Thanks for this pointer!

Hi @Mosibi, I am seeing warning messages like the following in our environment with Fluent Bit version 1.15.3. Our Mem_Buf_Limit is 5MB. Did increasing this buffer help fix your issues?

[2020/09/09 21:27:32] [ warn] [input] tail.0 paused (mem buf overlimit)
[2020/09/09 21:27:32] [ info] [input] tail.0 resume (mem buf overlimit)
[2020/09/09 21:27:33] [ warn] [input] tail.0 paused (mem buf overlimit)
[2020/09/09 21:27:33] [ info] [output:http:http.0] elm-logs-dev.appl.kp.org:31763, HTTP status=200

Mosibi (Author) commented Sep 10, 2020

> Hi @Mosibi, I am seeing warning messages like the following in our environment with Fluent Bit version 1.15.3. Our Mem_Buf_Limit is 5MB. Did increasing this buffer help fix your issues?

Yes it did; eventually I had it set to 512MB, just to have an upper limit we would not reach any time soon. At the moment I do not use fluent-bit: for unrelated reasons I had to move to fluentd in that specific environment.

mishrahrishikesh commented:

@Mosibi Thanks, it worked for me.

zoobab commented Jun 23, 2021

I hit a similar problem by creating some large log files (100MB) with Fluent Bit v1.7.9, released 4 days ago, and Fluent Bit would stop processing newer log files.

lackhoa commented Sep 16, 2021

> If Fluent Bit doesn't resume, it is because it is not able to flush the data (it respects your mem_buf_limit).

I don't think this was the reason at all in my case, since the tail input plugin was stuck for a day. The output backend (Loki) was still working just fine, the buffer limit was only 5MB, and there was no error message about a disconnection.
The OP's suggestion is a good one: let us monitor the size of the buffer so we can debug the issue.
Please reopen this issue!

krishnakc1 commented:

In my case, fluent-bit would not resume after it hit the mem buf limit warning followed by a pause. The pause is preceded by a "failed to flush chunk '1-1652117073.586923682.flb', retry in 8 seconds: task_id=34, input=tail.0 > output=es.0 (out_id=0)" error.
I run it on Kubernetes nodes and only a node or two always hit this error. Sometimes a restart helps, but mostly it does not. A 25 MB mem_buf_limit is already allowed. Increasing it may not help without finding what is causing the problem. Does anyone know where to start looking?
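
Since the pause follows failed flushes to Elasticsearch, the output side is one place to start. A hedged sketch, assuming the es output options documented for recent Fluent Bit releases (Host, Port, and Match are placeholders):

    [OUTPUT]
        Name          es
        Match         kube.*
        Host          your-es-host
        Port          9200
        # Print the Elasticsearch API error payload when a flush fails,
        # instead of only the generic "failed to flush chunk" message.
        Trace_Error   On
        # Retry failed chunks without a cap while the root cause is
        # investigated, instead of dropping them after a few attempts.
        Retry_Limit   False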

chenyg0911 commented:

Same problem when running inside k8s: fluent-bit pauses collecting logs. fluent-bit v1.9.8.

[2022/10/28 04:20:29] [ warn] [input] tail.0 paused (mem buf overlimit)
[2022/10/28 04:20:30] [ info] [input] tail.0 resume (mem buf overlimit)
[2022/10/28 04:20:30] [ warn] [input] tail.0 paused (mem buf overlimit)
[2022/10/28 04:20:31] [ info] [input] tail.0 resume (mem buf overlimit)
[2022/10/28 04:20:31] [ warn] [input] tail.0 paused (mem buf overlimit)

The default buffer is set to 5MB.
I've set it to 100MB; not sure if it will work fine.
Config:

    [INPUT]
        Name tail
        Path /var/log/containers/*.log
        multiline.parser docker, cri
        Tag kube.*
        Mem_Buf_Limit 100MB
        Skip_Long_Lines On
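
If raising Mem_Buf_Limit only delays the pauses, the buffering documentation also describes letting an input spill to disk instead of pausing. A sketch assuming filesystem buffering support in this version range (the storage.path below is a placeholder):

    [SERVICE]
        # Required for filesystem buffering: where chunks are persisted
        storage.path      /var/lib/fluent-bit/buffer
        storage.metrics   on

    [INPUT]
        Name              tail
        Path              /var/log/containers/*.log
        multiline.parser  docker, cri
        Tag               kube.*
        # With filesystem buffering, new data is written to disk chunks when
        # the memory limit is reached, rather than pausing the tail input.
        storage.type      filesystem
        Mem_Buf_Limit     100MB
        Skip_Long_Lines   On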
