Filebeat discarding data when new index cannot be created (error 400 returned by ES) due to max shards allocated #70349
Comments
Pinging @elastic/es-distributed (Team:Distributed)
There are a couple of considerations here:
We discussed this as a team and our conclusion and comments are:
We decided we want to involve the @elastic/es-core-features team too to get their input, so adding that label.
Thanks for pinging us, Henning; we discussed this as a team as well.
We (the @elastic/es-distributed team) discussed this again today. The only remaining question is whether to use 429 or 503; we didn't have strong opinions either way, so @dakrone's previously-mentioned slight preference for a 503 wins.
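To illustrate why the choice of status code matters to the indexing client, here is a minimal, hypothetical sketch (not Filebeat's actual implementation; the names and function are illustrative assumptions) of how an indexer typically classifies per-item bulk response codes:

```python
# Hypothetical sketch (not Filebeat's actual code) of how an indexing client
# might classify per-item bulk response codes: 4xx is treated as permanent
# and the event is dropped, while 429/503 are treated as transient and kept
# for a later retry.
RETRYABLE_STATUSES = {429, 503}   # transient: back off and resend

def handle_bulk_item(status: int, event: dict, retry_queue: list) -> None:
    """Route one bulk response item to success, retry, or drop."""
    if 200 <= status < 300:
        return                       # indexed successfully
    if status in RETRYABLE_STATUSES:
        retry_queue.append(event)    # event survives the outage
    else:
        # With the current behaviour the shard-limit failure arrives as a 400,
        # so it falls into this branch and the event is lost.
        print(f"dropping event, permanent error {status}")
```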
Elasticsearch version (`bin/elasticsearch --version`): 7.11.x

Elasticsearch returns a 400 error response when indexing requests arrive and there is a temporary problem creating a new index. This can happen, for example, when new daily indices are created at 00:00:00 (confirmed) and could potentially happen with ILM rollover (not checked).
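As a point of reference, an operator hitting this situation can inspect the open shard count and temporarily raise the shard limit while the cluster is repaired. A minimal sketch, assuming a cluster reachable at http://localhost:9200 (the host and the value 1500 are placeholders):

```python
# Operator-side workaround sketch: check how many shards are open and
# temporarily raise cluster.max_shards_per_node (default 1000) so that the
# new daily index can be created while the missing data node is brought back.
import requests

ES = "http://localhost:9200"  # placeholder endpoint

# Current number of open shards in the cluster.
health = requests.get(f"{ES}/_cluster/health").json()
print("active shards:", health["active_shards"])

# Temporarily raise the per-node shard limit.
resp = requests.put(
    f"{ES}/_cluster/settings",
    json={"transient": {"cluster.max_shards_per_node": 1500}},
)
resp.raise_for_status()
print(resp.json())
```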
This makes the indexer (Filebeat in this case) discard all the data for the duration of the problem, since the general integration agreement with the Beats team is (I might be wrong about this):
Considering that hitting the shard limit should be treated as a temporary error to be resolved by the administrator (in this case our user had temporarily lost one data node and its replicas had been migrated to the remaining nodes, reaching the limit), Elasticsearch should probably return an error code that prevents data loss on the client side (Filebeat in this case).
A similar approach was followed when disk watermark levels are reached and indices are moved to read_only status (a response code is returned so that the client keeps retrying until the issue is gone).
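For comparison, this is how that disk-watermark precedent plays out operationally: once disk space is freed, the write block is cleared per index and the retrying client recovers on its own. A sketch assuming a local cluster and an illustrative index name:

```python
# Clear the flood-stage write block on an index after disk space is freed.
# The endpoint and index name below are placeholders for illustration.
import requests

ES = "http://localhost:9200"
INDEX = "filebeat-7.11.0-2021.03.11"   # placeholder index name

resp = requests.put(
    f"{ES}/{INDEX}/_settings",
    json={"index.blocks.read_only_allow_delete": None},  # null clears the block
)
resp.raise_for_status()
```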