From cee11a089f6e8325318d6840134b45994b373b41 Mon Sep 17 00:00:00 2001
From: Wesley Pettit
Date: Mon, 10 Apr 2023 23:20:34 -0700
Subject: [PATCH] debugging: small improvement to log loss and overlimit discussion

Signed-off-by: Wesley Pettit
---
 troubleshooting/debugging.md | 8 +++++++-
 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/troubleshooting/debugging.md b/troubleshooting/debugging.md
index 2f34e3a87..38b81d99b 100644
--- a/troubleshooting/debugging.md
+++ b/troubleshooting/debugging.md
@@ -103,7 +103,9 @@ Even if you see this message, you still have not lost logs yet. Since it will re
 [2022/02/16 20:11:36] [ warn] [engine] chunk '1-1645042288.260516436.flb' cannot be retried: task_id=0, input=tcp.3 > output=cloudwatch_logs.1
 ```
 
-When you see this message, you have lost logs. The other case that indicates log loss is when an input is paused, which is covered in the [overlimit error section](#overlimit-warnings).
+When you see this message, you have lost logs.
+
+The main other case that indicates log loss is when an input is paused, which is covered in the [overlimit error section](#overlimit-warnings).
 Please also review our [Log Loss Summary: Common Causes](#log-loss-summary-common-causes).
 
 #### Common Network Errors
@@ -201,6 +203,10 @@ The `storage buf overlimit` occurs when the number of in memory ("up") chunks ex
 
 The `mem buf overlimit` occurs when the input has exceeded the configured `Mem_Buf_Limit` and `storage.type memory` is configured.
 
+When the input is able to receive logs again, you will see one of the `resume` messages above.
+
+With some inputs, an overlimit warning indicates that you are losing logs- new logs will not be ingested. This is the case with most inputs that stream in data ([forward](https://docs.fluentbit.io/manual/pipeline/inputs/forward) and [TCP](https://docs.fluentbit.io/manual/pipeline/inputs/tcp), for example). If you use [ECS FireLens](https://aws.amazon.com/blogs/containers/under-the-hood-firelens-for-amazon-ecs-tasks/) with Fluent Bit, then the stdout/stderr log input is a forward input and an overlimit warning means that new logs will not be ingested and will be lost. The exception to this is the [tail](https://docs.fluentbit.io/manual/pipeline/inputs/tail) input, which can safely pause and resume without losing logs because it tracks its file offset. When it resumes, it can pick back up reading the file at the last offset (assuming the file was not deleted).
+
 #### invalid JSON message
 
 Users who followed [this tutorial](https://github.com/aws-samples/amazon-ecs-firelens-examples/tree/mainline/examples/fluent-bit/ecs-log-collection) or similar to send logs to the TCP input often see message like: