Improve Ingester out-of-order error for faster troubleshooting #963
Comments
If possible, it would be nice to log how many entries succeeded, since it sounds like we will be extracting that info anyway to return to the client.
From my perspective, it's desirable that a push is not atomic, as long as Loki gives the client enough information to identify which log entries have been rejected. To fix this, we need to enrich the push response in case of error. On the client side ( An option is to switch to a structured error format (JSON) with the following structure:
Since we don't expect an ingestion error to be the normal case, we return the most expressive error message we can (including the full failed entries), to give the client enough data to easily spot the issue. Such a change breaks backward compatibility. We can support both the old and new response formats in What's your take?
@pracucci I'm not sure that returning that JSON struct would break backwards compatibility. The HTTP status is still the same (400), and the error body is not significant for the promtail client: loki/pkg/promtail/client/client.go Line 253 in abe96fc
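To illustrate the point about the error body, here is a minimal Go sketch of a client that decides what to do from the HTTP status alone and treats the body as opaque text for logging. It is not promtail's actual code: `handlePushResponse` is a hypothetical helper, and the retry policy (retry on 429 and 5xx, give up on other errors such as 400) is an assumption made for the example.

```go
package main

import "fmt"

// handlePushResponse is a hypothetical helper, not promtail's real code.
// It looks only at the status code to decide whether to retry; the body
// is copied into the error message without being parsed.
// Assumed policy: retry on 429 and 5xx, do not retry on other errors.
func handlePushResponse(status int, body []byte) (retry bool, err error) {
	switch {
	case status >= 200 && status < 300:
		// Success: nothing to report.
		return false, nil
	case status == 429 || status >= 500:
		// Rate limiting or server-side failure: worth retrying.
		return true, fmt.Errorf("server returned HTTP status %d: %s", status, body)
	default:
		// Client-side error such as 400 (e.g. out of order): retrying will not help.
		return false, fmt.Errorf("server returned HTTP status %d: %s", status, body)
	}
}
```

A client written this way behaves the same whether the 400 body is plain text or JSON, since the body only ever ends up in a log line.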
@wardbekker In the option described above, the new version of
Is your feature request related to a problem? Please describe.
An out-of-order HTTP 400 is a very common (user) error when starting out with Loki. It can be caused by mislabeling, by sending old logs, or by a logging client that really is sending out-of-order events (this happened to me when creating loki_logger for Elixir). The current HTTP 400 message only says, e.g., "entry out of order for stream: {filename="/var/log/sntpc.log", job="varlogs"}", which makes it hard to tell which entry or entries caused it, and ultimately makes debugging and troubleshooting harder.
Additionally, an HTTP POST with stream entries is not atomic, so you don't know how many entries failed, i.e. whether it was just a blip or something more fundamental is going wrong.
Describe the solution you'd like
Include additional details to the "out of order" error:
Also, "final error sending batch" is a somewhat cryptic message. Perhaps change it to something that describes Loki's ingester behaviour, e.g. "Batch post of entries not successful. Some or all entries might not be committed to Loki".
I feel this will help the user to get up and running with Loki faster.
Describe alternatives you've considered
Additional context