Too many open files #5460
Comments
It looks like it is all socket comms, so are there a load of open connections in an error or wait state? What does the Fluent Bit log show, any connection issues?
This keeps showing up in the logs at the same time:
How can I see if they are in an error or wait state?
Netstat or similar to see what connections you have, e.g. https://transang.me/check-for-listening-ports-in-linux/. Just wondering if failed or failing connections are not being cleaned up.
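For readers following along, a minimal sketch of how those states could be checked. The process name `td-agent-bit` is taken from later in this thread, and exact flags may vary by distribution; this is an illustration, not part of the original comment:

```shell
# Show the agent's TCP sockets and their states (root may be needed for -p)
ss -tanp | grep td-agent

# Check specifically for sockets stuck in wait states
ss -tn state time-wait
ss -tn state close-wait

# net-tools equivalent
netstat -tanp | grep td-agent
```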
OK.. netstat shows td-agent-bit binding to the ports in our config:
I'm not very familiar with how fluentbit is handling these connections, but I have identified that it's the last input, port 5174, that is causing the problem. For some reason I get a bunch of open files per connection:
Is this common? If I break it down by unique client hostname, I get the following numbers:
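(Not from the original comment, but one way such per-host numbers could be produced. The process name and the `awk` column are assumptions; lsof's NAME field is usually column 9:)

```shell
# Count the agent's open TCP descriptors per remote endpoint
lsof -nP -iTCP -a -p "$(pgrep -x -d, td-agent-bit)" | awk 'NR>1 {print $9}' | sort | uniq -c | sort -rn
```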
I don't know if I'm barking up the wrong tree here.
@patrick-stephens do you have any idea what to do here?
Does netstat show any other connections in WAIT states, e.g. TIME_WAIT or CLOSE_WAIT?
Nope, they are all in the LISTEN state.
Which user is running the agent? Can you share the output of the following commands?

```shell
grep "Max open files" /proc/<fluent-agent-id>/limits
ulimit -Sn
ulimit -Hn
systemctl show -p DefaultLimitNOFILE
systemctl show <agent-service-name> | grep LimitNOFILE
```
You need to increase the ulimit for the fluent-bit service. Steps:
1. Copy the fluent-bit service file into the /etc/systemd/system directory.
2. Insert `LimitNOFILE=20000` (the number depends on your requirements) into the [Service] section of /etc/systemd/system/fluent-bit.service, as in the sketch below.
3. Restart the fluent-bit service.
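The unit-file excerpt in the comment above did not survive formatting; a minimal sketch of what step 2 might look like, with illustrative values, is:

```ini
# /etc/systemd/system/fluent-bit.service (only the relevant sections shown)
[Unit]
Description=Fluent Bit

[Service]
# Raise the per-process open-file limit for the agent; size it to your workload
LimitNOFILE=20000
```

followed by reloading systemd and restarting the service:

```shell
sudo systemctl daemon-reload
sudo systemctl restart fluent-bit
```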
Yes, thanks guys, that was it. I forgot about the file limit in the service itself. It has been running for 3 days now without issues, so it looks like it is working better now. I will close this.
Bug Report
My td-agent-bit is continuously using more and more file descriptors and eventually stops working. I have increased the file limits on the system:
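(The actual settings are not shown above. For reference only, system-wide and per-user limits are typically inspected in places like these; none of this reflects the reporter's configuration:)

```shell
sysctl fs.file-max                   # kernel-wide cap on open file handles
ulimit -Sn; ulimit -Hn               # soft/hard limits for the current shell
cat /etc/security/limits.conf        # per-user limits applied via PAM at login
```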
After a couple of hours of running the agent, it starts to produce these log messages over and over:
Even though the limits above aren't reached:
However, when it stops, the count is always somewhere around that (13000-14000). Is it normal for it to use this many files? I get the feeling that the number just keeps growing and old descriptors are never closed.
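A quick way to track that count for the running agent, assuming the process name `td-agent-bit`:

```shell
# Count the agent's open file descriptors
ls /proc/"$(pgrep -x td-agent-bit)"/fd | wc -l

# Alternative using lsof
lsof -p "$(pgrep -x td-agent-bit)" | wc -l
```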
To Reproduce
This is my config:
When the agent is stuck, I can get it running again by restarting the service. When it starts up again it is using far fewer file descriptors, but the number keeps increasing:
Expected behavior
td-agent-bit should not stop working.
Your Environment
Additional context
Is there some other file limit that I'm not aware of? Is it normal that it is using this many open files? It feels like they are not closed properly.