From f26addc03c63c0c32ceba5c8538630d4d3c26252 Mon Sep 17 00:00:00 2001 From: Yannick Welsch Date: Thu, 25 Apr 2019 13:09:44 +0200 Subject: [PATCH] Update resiliency status page for 7.0 (#41522) Marks 2 items as done: - Documents indexed during a network partition cannot be uniquely identified - Replicas can fall out of sync when a primary shard fails --- docs/resiliency/index.asciidoc | 71 ++++++++++++++++++++-------------- 1 file changed, 41 insertions(+), 30 deletions(-) diff --git a/docs/resiliency/index.asciidoc b/docs/resiliency/index.asciidoc index 71a87ef57b424..70533b4e8fd86 100644 --- a/docs/resiliency/index.asciidoc +++ b/docs/resiliency/index.asciidoc @@ -21,7 +21,7 @@ been made and current in-progress work. We’ve also listed some historical improvements throughout this page to provide the full context. If you’re interested in more on how we approach ensuring resiliency in -Elasticsearch, you may be interested in Igor Motov’s recent talk +Elasticsearch, you may be interested in Igor Motov’s talk http://www.elastic.co/videos/improving-elasticsearch-resiliency[Improving Elasticsearch Resiliency]. You may also be interested in our blog post @@ -102,20 +102,6 @@ space. The following issues have been identified: Other safeguards are tracked in the meta-issue {GIT}11511[#11511]. -[float] -=== The _version field may not uniquely identify document content during a network partition (STATUS: ONGOING) - -When a primary has been partitioned away from the cluster there is a short period of time until it detects this. During that time it will continue -indexing writes locally, thereby updating document versions. When it tries to replicate the operation, however, it will discover that it is -partitioned away. It won't acknowledge the write and will wait until the partition is resolved to negotiate with the master on how to proceed. -The master will decide to either fail any replicas which failed to index the operations on the primary or tell the primary that it has to -step down because a new primary has been chosen in the meantime. Since the old primary has already written documents, clients may already have read from -the old primary before it shuts itself down. The version numbers of these reads may not be unique if the new primary has already accepted -writes for the same document (see {GIT}19269[#19269]). - -We are currently implementing Sequence numbers {GIT}10708[#10708] which better track primary changes. Sequence numbers thus provide a basis -for uniquely identifying writes even in the presence of network partitions and will replace `_version` in operations that require this. - [float] === Relocating shards omitted by reporting infrastructure (STATUS: ONGOING) @@ -136,24 +122,49 @@ We have ported the known scenarios in the Jepsen blogs that check loss of acknow The new tests are run continuously in our testing farm and are passing. We are also working on running Jepsen independently to verify that no failures are found. -[float] -=== Replicas can fall out of sync when a primary shard fails (STATUS: ONGOING) +== Completed -When a primary shard fails, a replica shard will be promoted to be the -primary shard. If there is more than one replica shard, it is possible -for the remaining replicas to be out of sync with the new primary -shard. This is caused by operations that were in-flight when the primary -shard failed and may not have been processed on all replica -shards. Currently, the discrepancies are not repaired on primary -promotion but instead would be repaired if replica shards are relocated -(e.g., from hot to cold nodes); this does mean that the length of time -which replicas can be out of sync with the primary shard is -unbounded. Sequence numbers {GIT}10708[#10708] will provide a mechanism -for syncing the remaining replicas with the newly-promoted primary +[float] +=== Documents indexed during a network partition cannot be uniquely identified (STATUS: DONE, v7.0.0) + +When a primary has been partitioned away from the cluster there is a short +period of time until it detects this. During that time it will continue +indexing writes locally, thereby updating document versions. When it tries +to replicate the operation, however, it will discover that it is partitioned +away. It won't acknowledge the write and will wait until the partition is +resolved to negotiate with the master on how to proceed. The master will +decide to either fail any replicas which failed to index the operations on +the primary or tell the primary that it has to step down because a new primary +has been chosen in the meantime. Since the old primary has already written +documents, clients may already have read from the old primary before it shuts +itself down. The `_version` field of these reads may not uniquely identify the +document's version if the new primary has already accepted writes for the same +document (see {GIT}19269[#19269]). + +The Sequence numbers infrastructure {GIT}10708[#10708] has introduced more +precise ways for tracking primary changes. This new infrastructure therefore +provides a way for uniquely identifying documents using their primary term +and sequence number fields, even in the presence of network partitions, and +has been used to replace the `_version` field in operations that require +uniquely identifying the document, such as optimistic concurrency control. + +[float] +=== Replicas can fall out of sync when a primary shard fails (STATUS: DONE, v7.0.0) + +When a primary shard fails, a replica shard will be promoted to be the primary +shard. If there is more than one replica shard, it is possible for the +remaining replicas to be out of sync with the new primary shard. This is caused +by operations that were in-flight when the primary shard failed and may not +have been processed on all replica shards. These discrepancies are not +repaired on primary promotion but instead delayed until replica shards are +relocated (e.g., from hot to cold nodes); this means that the length of time +in which replicas can be out of sync with the primary shard is unbounded. + +Sequence numbers {GIT}10708[#10708] provide a mechanism for identifying +the discrepancies between shard copies at the document level, which allows +to efficiently sync up the remaining replicas with the newly-promoted primary shard. -== Completed - [float] === Repeated network partitions can cause cluster state updates to be lost (STATUS: DONE, v7.0.0)