Sporadic log error after switching to cloudwatch_logs #135
Comments
I would ignore the SSL errors; I get those occasionally too. Fluent Bit unfortunately has a custom HTTP client, and I think other clients would just silently retry on these connection-dropped errors. I haven't seen evidence that occasional error messages like that are a sign of anything actually abnormal going on. The throughput exception, though, is something you should fix. Check the logs some more; there should be an info message on every flush that tells you how many events were sent in that batch. How many logs are you sending per Fluent Bit instance, i.e. what is your rough rate? How many Fluent Bit instances do you have, and do they send to unique log streams or the same log stream? Based on your config it looks like they might all send to the same log stream. CloudWatch has an ingestion limit per log stream of 5 requests per second.
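As an illustration of sending each task to its own log stream, here is a minimal sketch of a cloudwatch_logs output section. The region, group name, and prefix are placeholders, not taken from any config in this thread; with FireLens the tag is unique per container, so `log_stream_prefix` yields a distinct stream per task:

```
[OUTPUT]
    Name               cloudwatch_logs
    Match              *
    region             us-east-1
    # Illustrative group name, not from the original config
    log_group_name     /ecs/my-app
    # The record tag is appended to this prefix to form the stream name,
    # so each container/task writes to its own stream instead of sharing
    # one and hitting the 5 requests/second per-stream ingestion limit.
    log_stream_prefix  from-fluent-bit-
    auto_create_group  On
```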
Thanks for your quick response. Here are the info messages I found around that error log.
It looks like it works after the retry, and I see this behavior once every 20 minutes or so. I'm sending logs to various log streams based on … Any suggestion on how to improve this? Or should the log level be adjusted to warning? Thanks!
The main advice to prevent throttling issues is to send to a unique log stream per task. If you are already doing that, then there is one more recommendation with lesser impact that I can give: experiment with increasing the `Flush` interval.
https://docs.fluentbit.io/manual/administration/configuring-fluent-bit/configuration-file
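For illustration, a minimal sketch of where the `Flush` setting lives; the value is arbitrary, and the linked docs describe how to supply a custom configuration file:

```
[SERVICE]
    # Interval in seconds between flushes to the outputs. A larger value
    # batches more events per PutLogEvents call, lowering the request
    # rate against the per-stream limit.
    Flush 15
```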
Thanks. I'll play with the Flush parameter. For now I used … Does increasing the flush time period also increase the possibility of log loss?
@BowenRaymone It does slightly increase the risk, yes. If Fluent Bit suddenly quits, there will be more logs stored in the buffer: if it crashes, more will be lost; if it exits gracefully, there is a greater risk it won't be able to send everything in time. Realistically, graceful shutdown is the most common case, and in ECS that gives 30 seconds, which is realistically enough time to send a lot of logs, as shown in that post. And with the new …
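A related knob, not mentioned in the comment above but worth noting as an aside: the `Grace` setting in the `[SERVICE]` section controls how long Fluent Bit waits on graceful shutdown to flush what is still buffered. A sketch, with illustrative values that should stay within the ECS stop timeout:

```
[SERVICE]
    Flush 15
    # Seconds Fluent Bit waits on graceful shutdown before exiting;
    # raising it gives buffered logs more time to be delivered.
    Grace 30
```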
I see. That makes sense. I'll try increasing the flush time and see if it resolves the throttling issue. Thanks for your advice!
Hello! I was experiencing a very similar issue to yours, and I'd like to know if the flush time increase did the trick? Thanks.
Worked for me
Hi all, I'm facing a similar issue forwarding my application logs to CloudWatch. It does send logs, but after some time the connection between the container and CloudWatch is lost. Flush is already set to 10 in the fluentbit.conf file. Is there anything I should check?
[output:cloudwatch_logs:cloudwatch_logs.3] Sending log events to log stream from-us-east-1b-xray-indexer-service.log
Appreciate any help on this! Thanks!
@Sangeethavisa Please open a new issue: https://github.com/aws/aws-for-fluent-bit/blob/mainline/.github/ISSUE_TEMPLATE/issue.md
Also, please check out our debugging guide first: https://github.com/aws/aws-for-fluent-bit/blob/mainline/troubleshooting/debugging.md
Hi,
I recently switched to using cloudwatch_logs in my project, and then found that I sporadically get the following error in the FireLens container logs.
My hunch is that this doesn't matter, since my containers keep running and the output plugin will retry anyway. But I want to make sure and see whether I need to tune some parameters. Also, the fact that the log level is [error] is concerning to me.
For context, I get the image from
906394416424.dkr.ecr.us-east-1.amazonaws.com/aws-for-fluent-bit:latest
and my config looks like … Thanks!