Exceeding the maximum doc count of a shard fails the shard #51136
Pinging @elastic/es-distributed (:Distributed/Engine)
While what the implementation is doing here is very harsh, it's currently the only way to guarantee that sequence numbers are not leaked. This is a particularly bad kind of event, and since the primary and replicas can have different doc counts (due to deletes), it can hit either copy at an arbitrary point. On the primary, we could start to reject requests at an earlier point (based on some kind of pre-flight check, before generating the sequence number). On the replica, we have no other choice than to fail the copy, otherwise it would be out of sync with the primary.
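A minimal sketch of that pre-flight idea, assuming a hypothetical helper class and threshold (the names `DocCountPreFlightCheck`, `maxShardDocs`, and `ensureCapacity` are illustrative, not Elasticsearch's actual engine code): the check runs before a sequence number is generated, so rejecting the request leaks nothing.

```java
// Hypothetical pre-flight check on the primary, run before a sequence number
// is allocated for the incoming operation. All names here are illustrative.
final class DocCountPreFlightCheck {

    // Lucene's hard per-index limit: Integer.MAX_VALUE - 128.
    static final int LUCENE_MAX_DOCS = 2147483519;

    private final long maxShardDocs;

    DocCountPreFlightCheck(long maxShardDocs) {
        // Never allow a threshold above what Lucene itself can hold.
        this.maxShardDocs = Math.min(maxShardDocs, LUCENE_MAX_DOCS);
    }

    /**
     * Rejects the operation before any sequence number is assigned, so the
     * shard never sees the doomed write and no seqno is leaked.
     */
    void ensureCapacity(long currentDocCount, int incomingDocs) {
        if (currentDocCount + incomingDocs > maxShardDocs) {
            throw new IllegalArgumentException(
                "Number of documents in the shard cannot exceed [" + maxShardDocs + "]");
        }
    }
}
```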
I have seen this issue on a cloud cluster. In this case, the cluster kept flipping to RED as more and more data was sent to the index. The fact that the index that hit the issue was named functionbeat-7.5.0-2020.01.30-000001 suggests we may need better prevention of this issue within ILM policies.
I've opened #63273. |
Today indexing to a shard with 2147483519 documents will fail that shard. We should check the number of documents and reject the write requests instead. Closes #51136
Today (7.5.0) if we try and index a document into a shard that already contains 2147483519 documents then it is rejected by Lucene and a no-op is written to the translog to record this. However, since #30226 we also try and record no-ops themselves as documents in Lucene; the index is already full so we fail to add the tombstone too. The failure to add this tombstone is fatal to the shard:
However, we immediately restart the shard:
I think this is ok - the operation fails before it makes it to the translog so there's nothing to replay - but it would be good to confirm that there's no risk we do something bad (e.g. leak a seqno) here.
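For reference, 2147483519 is Lucene's hard per-index document limit, `IndexWriter.MAX_DOCS`, defined as `Integer.MAX_VALUE - 128`. A small Java sketch showing where the number comes from (assuming Lucene is on the classpath; this is not the engine's tombstone code):

```java
import org.apache.lucene.index.IndexWriter;

// Shows the origin of the 2147483519 figure: Lucene caps the number of
// documents per index at Integer.MAX_VALUE - 128. Once a shard's Lucene index
// is at this limit, adding any further document -- including the no-op
// tombstone -- is rejected by IndexWriter.
public class MaxDocsDemo {
    public static void main(String[] args) {
        System.out.println(IndexWriter.MAX_DOCS);     // 2147483519
        System.out.println(Integer.MAX_VALUE - 128);  // 2147483519
    }
}
```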