promtail-linux-amd64.gz 0.4.0 full of errors after upgrade #1199
Comments
Hey @Alexvianet, additional functionality was added to Loki to support additional limits. These mostly apply when running Loki in multi-tenant mode, but they are still enforced in any scenario. I believe you are seeing the difference between promtail 0.3 and 0.4 because of batching changes made in promtail 0.4, where it sends larger batches. Regardless, the limits are set in Loki, and the description of the configs can be found here. I believe you should be able to add something along the lines of the example below. I will update the release notes and changelog tomorrow to call better attention to this change, as I expect others will see this too; thanks for reporting! Let us know if changing this fixes the problem for you.
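A minimal sketch of such a limits override, assuming Loki's limits_config block. This is not the exact snippet from the original comment, and the field names and units changed across Loki versions, so check the documentation for your release:

```yaml
# Hypothetical illustration only -- not the snippet from the original comment.
# Releases around 0.4.0 used ingestion_rate / ingestion_burst_size; later
# releases renamed these to ingestion_rate_mb / ingestion_burst_size_mb.
limits_config:
  ingestion_rate: 16        # per-tenant ingestion rate limit (placeholder value; units vary by version)
  ingestion_burst_size: 32  # burst allowance above the rate limit (placeholder value)
```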
I have updated loki_config:
promtail_config:
After restarting promtail I get:
The rate limiting error is gone. It looks like promtail is trying to send old logs, which can happen if the position has not been saved correctly. Are you still seeing this persistently?
My /tmp/positions.yaml was not changed.
The errors show up after each promtail restart and then go silent:
Then I removed my /tmp/positions.yaml, restarted promtail, and still get:
It's possible this is normal behavior: if promtail re-sends a line (or a line older than the newest for a given stream), Loki will reject the line with that error. Deleting the positions file would definitely cause some files to be re-sent and trigger this error. I would, however, expect those errors to go away over time as promtail starts sending newer logs which Loki has not seen yet.
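For reference, the positions file mentioned above is where promtail persists how far it has read each file; a minimal sketch of the relevant promtail configuration (path taken from this thread, block name assumed from promtail's documentation):

```yaml
# If this file is deleted (or cannot be written), promtail re-reads logs from
# the beginning on restart, and Loki rejects the re-sent entries as
# out-of-order/duplicates -- producing the error spam described in this issue.
positions:
  filename: /tmp/positions.yaml
```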
I am getting new logs into Grafana, so the functionality seems to be working; the problem is only that promtail spams error logs after each restart. On v0.3.0 everything was fine. Loki is receiving newer logs it has not seen yet, but after a promtail restart nothing changes: every restart produces the same error spam. Also, promtail does not log any info about processing: in Grafana Explore (live) I can see new entries, but the promtail log says nothing about journal logs, only about the static nginx logs.
We are having the same issue. What is the highest value we can set for "ingestion_rate"?
ingestion_rate is no longer correct. Use ingestion_rate_mb:
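A minimal sketch of the renamed setting, assuming a recent Loki limits_config; the values are placeholders, not recommendations:

```yaml
limits_config:
  ingestion_rate_mb: 16         # per-tenant ingestion rate in MB/s (placeholder)
  ingestion_burst_size_mb: 32   # burst allowance in MB (placeholder)
```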
We were also hitting Loki's ingestion rate limit when using promtail to ship logs. I was wondering whether promtail actually tries to resend the batch once it gets a 429 HTTP status code from Loki (saying that the ingestion rate limit was exceeded). If I interpret this line correctly, then promtail only retries on 5xx HTTP errors and connection errors. Is there a particular reason for this?
I just hit this with fluentd using the loki plugin also.
This issue has been automatically marked as stale because it has not had any activity in the past 30 days. It will be closed in 7 days if no further activity occurs. Thank you for your contributions.
You are correct, rate-limited logs are dropped. The reasoning is that if you are hitting the rate limit, attempting to retry those logs would never be successful and would just dig the hole deeper. (New logs still coming in while you retry failed logs would only add to the problem.)
Also, sorry for the slow reply; the stale bot nudged this and I realized I never responded. I'm going to close this issue, however, as I believe the initial issue was addressed/responded to.
Chiming in to say that while this is a possible scenario, it is also a worst-case scenario. When the other end tells you to back off (429), there are at least two rather different ways to handle that situation: 1) retry later or 2) drop data immediately. Deciding between the two is easy when you know a lot about the context, specifically about the (expected) time evolution of the data push rate. When you know that pressure is going to decrease at some point in the future, then (1) is wise. When you know that the push rate will remain as-is or even increase further, then (2) is the right choice.

In other words, if the current log push rate peaks for only a "short time window" (short relative to local buffer size) and drops again thereafter, then one can get away with not dropping data by backing off when seeing 429 responses, sending it again when there is some breathing room.

This is part of a larger optimization problem, and of course one needs to know exactly what to optimize for. In many cases dropping data comes with a large penalty in that optimization problem, and therefore the strategy that does not imply dropping data is very much preferable. There is no one-size-fits-all default strategy; one must decide between (1) and (2) based on knowledge about the wider context.
This behavior was changed as of the 1.4.0 release; all the clients maintained in the Loki repo will now retry on 429 responses.
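As a rough illustration of how those retries can be tuned, a sketch of the promtail client backoff settings; the field names are assumed from promtail's client configuration, and defaults may differ by version:

```yaml
clients:
  - url: http://loki:3100/loki/api/v1/push  # assumed push endpoint
    backoff_config:
      min_period: 500ms   # initial wait before the first retry (assumed)
      max_period: 5m      # cap on the exponential backoff interval (assumed)
      max_retries: 10     # after this many failed attempts the batch is dropped (assumed)
```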
Describe the bug
After updating promtail to the 0.4.0 binary (CentOS 7 VM),
the promtail log is full of:
To Reproduce
Steps to reproduce the behavior:
loki_config:
promtail_config:
After downgrading promtail to 0.3.0 with the same configuration, everything looks OK:
Expected behavior
promtail 0.4.0 should work without errors.
Environment: