CPU usage goes to 100% when using custom multiline parsing #9337
Comments
We are having the same issue. We are using a multiline parser for Java traces. In our case we are not using any Lua filter, just a custom multiline parser similar to the one the OP defined above, plus a grep filter to exclude certain logs. Our CPU spikes to 100% after 7-8 hours, memory also grows significantly, and then Fluent Bit stops sending logs to the output. When this happens we notice a lot of
This started happening when we upgraded the Fluent Bit version running on our EKS cluster from 1.9.9 to 3.1.7. For now we have reverted back to 1.9.9.
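For context, an exclude rule with the grep filter in the classic configuration format typically looks like this (the match tag and pattern are placeholders, not our actual settings):

```
[FILTER]
    Name     grep
    Match    kube.*
    # drop any record whose "log" field matches the pattern
    Exclude  log DEBUG
```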
I'm experiencing the same issue. I'm reading Docker logs in Kubernetes, sending them through a multiline filter, followed by a Kubernetes filter, and then routing them to a Kafka topic. The log volume isn't particularly high. The CPU spikes to 100% after about 30 minutes, although this varies depending on the log volume. The spike is accompanied by an increase in memory consumption, which roughly corresponds to the size of the tail buffers. On some nodes, Fluent Bit spikes to 100% CPU for a few minutes before recovering. However, on other nodes, the CPU spike persists, followed by a "could not enqueue records into the ring buffer" error message, which does not resolve on its own. Only restarting the pod seems to help. Additionally, in these cases, we observe a gap in the logs, particularly when the log files are rotated.
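A minimal sketch of that pipeline in the classic configuration format (the paths, tags, parser name, broker, and topic below are placeholders, not the reporter's actual settings):

```
[INPUT]
    Name              tail
    Path              /var/log/containers/*.log
    Tag               kube.*
    multiline.parser  docker, cri

[FILTER]
    Name                   multiline
    Match                  kube.*
    multiline.key_content  log
    multiline.parser       multiline-custom

[FILTER]
    Name   kubernetes
    Match  kube.*

[OUTPUT]
    Name     kafka
    Match    kube.*
    Brokers  kafka.example.com:9092
    Topics   app-logs
```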
I wanted to provide an update: I was able to resolve the issue by switching to the YAML configuration and using the multiline filter directly within the tail plugin. All my Lua filters are configured as processors for the tail input.
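For reference, that layout in the YAML configuration format attaches the multiline and Lua filters to the tail input as processors; the paths, parser name, and Lua function below are assumptions for illustration only:

```yaml
pipeline:
  inputs:
    - name: tail
      path: /var/log/containers/*.log
      tag: kube.*
      processors:
        logs:
          # multiline handled directly on the input instead of a standalone filter
          - name: multiline
            multiline.key_content: log
            multiline.parser: multiline-custom
          # Lua filter configured as a processor of the same input
          - name: lua
            script: /fluent-bit/scripts/filters.lua
            call: cb_filter
```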
This issue has happened just now, on the newest version with the patch and with the YAML configuration. It's not as frequent as before (once a week). |
Bug Report
Describe the bug
After enabling multiline parsing with Fluent Bit in an EKS cluster, CPU usage of the Fluent Bit pods climbs to 100% of the limit after some hours (a 100m limit, but 300m was tried as well). Without multiline parsing, CPU never goes above 8-9m. Multiline parsing works well for some hours, then the pods stop sending logs because the CPU is throttled; eventually, RAM usage reaches whatever limit is defined for the pod as well, and the pod is killed with an OOM error.
The way logs should be parsed is simple:
start_state: any line with a timestamp (that is, containing [ and ] somewhere in the log) must be considered a new log.
cont_state: any line without a timestamp (that is, not starting with [) must be considered part of the last log.
To Reproduce
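A minimal MULTILINE_PARSER definition implementing those two rules might look like the following sketch (the parser name and regexes are illustrative, not the original configuration):

```
[MULTILINE_PARSER]
    name          multiline-custom
    type          regex
    flush_timeout 1000
    #         state name      regex pattern    next state
    # a line starting with a bracketed timestamp opens a new record
    rule      "start_state"   "/^\[.*\]/"      "cont"
    # any line not starting with [ is appended to the previous record
    rule      "cont"          "/^[^\[].*/"     "cont"
```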
Expected behavior
Multiline parsing works fine, but eventually the pods stop collecting logs after reaching 100% of the CPU and/or memory limits.
Screenshots
Your Environment
K8s:
EKS 1.29 (AWS)
EKS node image:
Bottlerocket OS 1.20.1 (aws-k8s-1.29)
Fluentbit image:
cr.fluentbit.io/fluent/fluent-bit:3.0.4
Helm chart:
helm.sh/chart=fluent-bit-0.46.7
And the Lua script referenced in the Fluent Bit config, which doesn't seem to affect the issue:
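The script itself is not reproduced here. For context, a Fluent Bit Lua filter follows this callback shape; the function body below is only a placeholder:

```lua
-- Called for every record routed to the lua filter.
-- Return code: -1 drop the record, 0 keep it unmodified,
-- 1 keep it with modified timestamp and record, 2 keep it with modified record only.
function cb_filter(tag, timestamp, record)
    -- placeholder transformation: copy the tag into the record
    record["source_tag"] = tag
    return 1, timestamp, record
end
```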