Update resiliency status page for 7.0 #41522

Merged 3 commits on Apr 25, 2019
71 changes: 41 additions & 30 deletions docs/resiliency/index.asciidoc
@@ -21,7 +21,7 @@ been made and current in-progress work. We’ve also listed some historical
improvements throughout this page to provide the full context.

If you’re interested in more on how we approach ensuring resiliency in
Elasticsearch, you may be interested in Igor Motov’s recent talk
Elasticsearch, you may be interested in Igor Motov’s talk
http://www.elastic.co/videos/improving-elasticsearch-resiliency[Improving Elasticsearch Resiliency].

You may also be interested in our blog post
@@ -102,20 +102,6 @@ space. The following issues have been identified:

Other safeguards are tracked in the meta-issue {GIT}11511[#11511].

[float]
=== The _version field may not uniquely identify document content during a network partition (STATUS: ONGOING)

When a primary has been partitioned away from the cluster there is a short period of time until it detects this. During that time it will continue
indexing writes locally, thereby updating document versions. When it tries to replicate the operation, however, it will discover that it is
partitioned away. It won't acknowledge the write and will wait until the partition is resolved to negotiate with the master on how to proceed.
The master will decide to either fail any replicas which failed to index the operations on the primary or tell the primary that it has to
step down because a new primary has been chosen in the meantime. Since the old primary has already written documents, clients may already have read from
the old primary before it shuts itself down. The version numbers of these reads may not be unique if the new primary has already accepted
writes for the same document (see {GIT}19269[#19269]).

We are currently implementing Sequence numbers {GIT}10708[#10708] which better track primary changes. Sequence numbers thus provide a basis
for uniquely identifying writes even in the presence of network partitions and will replace `_version` in operations that require this.

[float]
=== Relocating shards omitted by reporting infrastructure (STATUS: ONGOING)

@@ -136,24 +122,49 @@ We have ported the known scenarios in the Jepsen blogs that check loss of acknow
The new tests are run continuously in our testing farm and are passing. We are also working on running Jepsen independently to verify
that no failures are found.

[float]
=== Replicas can fall out of sync when a primary shard fails (STATUS: ONGOING)
== Completed

When a primary shard fails, a replica shard will be promoted to be the
primary shard. If there is more than one replica shard, it is possible
for the remaining replicas to be out of sync with the new primary
shard. This is caused by operations that were in-flight when the primary
shard failed and may not have been processed on all replica
shards. Currently, the discrepancies are not repaired on primary
promotion but instead would be repaired if replica shards are relocated
(e.g., from hot to cold nodes); this does mean that the length of time
which replicas can be out of sync with the primary shard is
unbounded. Sequence numbers {GIT}10708[#10708] will provide a mechanism
for syncing the remaining replicas with the newly-promoted primary
[float]
=== Documents indexed during a network partition cannot be uniquely identified (STATUS: DONE, v7.0.0)

When a primary has been partitioned away from the cluster there is a short
period of time until it detects this. During that time it will continue
indexing writes locally, thereby updating document versions. When it tries
to replicate the operation, however, it will discover that it is partitioned
away. It won't acknowledge the write and will wait until the partition is
resolved to negotiate with the master on how to proceed. The master will
decide to either fail any replicas which failed to index the operations on
the primary or tell the primary that it has to step down because a new primary
has been chosen in the meantime. Since the old primary has already written
documents, clients may already have read from the old primary before it shuts
itself down. The `_version` field of these reads may not uniquely identify the
document's version if the new primary has already accepted writes for the same
document (see {GIT}19269[#19269]).

The sequence numbers infrastructure {GIT}10708[#10708] has introduced more
precise ways of tracking primary changes. This new infrastructure therefore
provides a way to uniquely identify documents using their primary term
and sequence number fields, even in the presence of network partitions, and
has been used to replace the `_version` field in operations that require
uniquely identifying the document, such as optimistic concurrency control.
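
The scenario above can be sketched with a toy model. This is only an
illustration under stated assumptions, not Elasticsearch code: `ToyPrimary`,
`Doc`, `VersionConflict` and `partition_scenario` are hypothetical names, and
the shard is reduced to an in-memory dictionary. The `if_seq_no` and
`if_primary_term` argument names are chosen to mirror the conditional-write
request parameters of the same name. The sketch shows two primaries assigning
the same `_version` to different content, while the (sequence number, primary
term) pair still tells the two writes apart.

```python
from dataclasses import dataclass


@dataclass
class Doc:
    source: dict
    version: int
    seq_no: int
    primary_term: int


class VersionConflict(Exception):
    """Raised when a conditional write does not match the stored document."""


class ToyPrimary:
    """Hypothetical single-shard primary; an assumption for illustration only."""

    def __init__(self, primary_term, next_seq_no=0, docs=None):
        self.primary_term = primary_term
        self.next_seq_no = next_seq_no
        self.docs = dict(docs or {})  # shallow copy so primaries diverge independently

    def index(self, doc_id, source, if_seq_no=None, if_primary_term=None):
        current = self.docs.get(doc_id)
        if if_seq_no is not None or if_primary_term is not None:
            # Conditional write: both values must match the stored document.
            expected = (if_seq_no, if_primary_term)
            if current is None or (current.seq_no, current.primary_term) != expected:
                raise VersionConflict(doc_id)
        version = 1 if current is None else current.version + 1
        doc = Doc(source, version, self.next_seq_no, self.primary_term)
        self.next_seq_no += 1
        self.docs[doc_id] = doc
        return doc


def partition_scenario():
    """Old primary (term 1) is partitioned away; a new primary (term 2) takes over."""
    old = ToyPrimary(primary_term=1)
    old.index("a", {"v": "initial"})
    # The new primary starts from the state both copies agreed on.
    new = ToyPrimary(primary_term=2, next_seq_no=old.next_seq_no, docs=old.docs)
    stale = old.index("a", {"v": "written during partition"})
    fresh = new.index("a", {"v": "written on new primary"})
    # A client that read `stale` now tries a conditional update on the new primary.
    try:
        new.index("a", {"v": "update"}, if_seq_no=stale.seq_no,
                  if_primary_term=stale.primary_term)
        conflict = False
    except VersionConflict:
        conflict = True
    return stale, fresh, conflict
```

In this sketch both writes end up with the same version number, so a
version-based check could not distinguish them, whereas the conditional write
keyed on sequence number and primary term is rejected.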

[float]
=== Replicas can fall out of sync when a primary shard fails (STATUS: DONE, v7.0.0)

When a primary shard fails, a replica shard will be promoted to be the primary
shard. If there is more than one replica shard, it is possible for the
remaining replicas to be out of sync with the new primary shard. This is caused
by operations that were in-flight when the primary shard failed and may not
have been processed on all replica shards. These discrepancies are not
repaired on primary promotion but instead delayed until replica shards are
relocated (e.g., from hot to cold nodes); this means that the length of time
in which replicas can be out of sync with the primary shard is unbounded.

Sequence numbers {GIT}10708[#10708] provide a mechanism for identifying
the discrepancies between shard copies at the document level, which allows
the remaining replicas to be efficiently synced with the newly-promoted
primary shard.
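
As a rough illustration of why per-operation sequence numbers make such a
sync cheap, the following toy model compares shard copies by sequence number
and transfers only the operations that differ. It is a simplified sketch, not
the actual Elasticsearch resync algorithm: `ToyShardCopy` and `resync` are
hypothetical names, the checkpoint is computed here as the minimum over the
copies' gap-free prefixes, and discarding operations that the new primary
never received is an assumption of this model.

```python
from dataclasses import dataclass, field


@dataclass
class ToyShardCopy:
    """Hypothetical shard copy holding operations keyed by sequence number."""
    name: str
    ops: dict = field(default_factory=dict)  # seq_no -> (primary_term, payload)

    def local_checkpoint(self):
        """Highest seq_no up to which this copy has no gaps (-1 if empty)."""
        checkpoint = -1
        while checkpoint + 1 in self.ops:
            checkpoint += 1
        return checkpoint


def resync(new_primary, replicas):
    """Align replicas with the new primary above the shared checkpoint.

    Returns a report mapping each replica name to the seq_nos copied to it.
    """
    copies = [new_primary] + list(replicas)
    # Assumption of this sketch: everything at or below this point is
    # already identical on all copies and need not be compared.
    shared_checkpoint = min(c.local_checkpoint() for c in copies)
    report = {}
    for replica in replicas:
        # Drop in-flight operations the new primary never received or
        # holds with different content.
        divergent = [s for s in replica.ops
                     if s > shared_checkpoint
                     and new_primary.ops.get(s) != replica.ops[s]]
        for seq_no in divergent:
            del replica.ops[seq_no]
        # Copy only the operations the replica is missing.
        copied = []
        for seq_no in sorted(new_primary.ops):
            if seq_no > shared_checkpoint and seq_no not in replica.ops:
                replica.ops[seq_no] = new_primary.ops[seq_no]
                copied.append(seq_no)
        report[replica.name] = copied
    return report


def failover_example():
    """The old primary failed while operations 3 and 4 were in flight."""
    base = {n: (1, f"op{n}") for n in range(3)}
    promoted = ToyShardCopy("A", {**base, 3: (1, "op3")})  # received op 3 only
    lagging = ToyShardCopy("B", {**base, 4: (1, "op4")})   # received op 4 only
    report = resync(promoted, [lagging])
    return promoted, lagging, report
```

In this example only a single operation has to be transferred, rather than
the whole shard, because the copies can be compared operation by operation.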

== Completed

[float]
=== Repeated network partitions can cause cluster state updates to be lost (STATUS: DONE, v7.0.0)
