Use index for peer recovery instead of translog #45136

DaveCTurner · 2019-08-02T10:28:57Z

Today we recover a replica by copying operations from the primary's translog.
However we also retain some historical operations in the index itself, as long
as soft-deletes are enabled. This commit adjusts peer recovery to use the
operations in the index for recovery rather than those in the translog, and
ensures that the replication group retains enough history for use in peer
recovery by means of retention leases.

Reverts #38904 and #42211
Relates #41536

This creates a peer-recovery retention lease for every shard during recovery, ensuring that the replication group retains history for future peer recoveries. It also ensures that leases for active shard copies do not expire, and that leases for inactive shard copies expire immediately if the shard is fully allocated.

Relates elastic#41536
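The expiry rule described above can be sketched roughly as follows. This is a minimal illustration, not the Elasticsearch implementation: the class, the method name, and all parameters are hypothetical, and the time-based branch for inactive copies of a shard that is not fully allocated is an assumption about the ordinary lease expiry behaviour.

```java
import java.util.Set;

// Hypothetical sketch of the expiry rule; names are illustrative only.
final class LeaseExpirySketch {

    /**
     * Returns true if the lease held for the given shard copy should be treated as expired.
     */
    static boolean hasExpired(String leaseCopyId,
                              Set<String> activeCopyIds,
                              boolean shardFullyAllocated,
                              long leaseTimestampMillis,
                              long nowMillis,
                              long expiryMillis) {
        // Leases for active shard copies never expire.
        if (activeCopyIds.contains(leaseCopyId)) {
            return false;
        }
        // Leases for inactive copies expire immediately once the shard is fully allocated.
        if (shardFullyAllocated) {
            return true;
        }
        // Assumption: otherwise the lease expires after the usual expiry period.
        return nowMillis - leaseTimestampMillis > expiryMillis;
    }
}
```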

This commit adjusts the behaviour of the retention lease sync to first renew any peer-recovery retention leases where either:

- the corresponding shard's global checkpoint has advanced, or
- the lease is older than half of its expiry time

Relates elastic#41536
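The two renewal conditions can be expressed as a small predicate. This is a sketch under stated assumptions rather than the actual code: the class, method, and parameter names are hypothetical, and "the global checkpoint has advanced" is modelled by comparing against the lease's retaining sequence number, taken to be the global checkpoint plus one as described elsewhere in this thread.

```java
// Hypothetical sketch of the renewal condition; names are illustrative only.
final class LeaseRenewalSketch {

    static boolean shouldRenew(long leaseRetainingSeqNo,
                               long globalCheckpoint,
                               long leaseTimestampMillis,
                               long nowMillis,
                               long expiryMillis) {
        // Assumption: a lease retains history from (global checkpoint + 1) onwards,
        // so the checkpoint has advanced if that value now exceeds the lease's.
        boolean checkpointAdvanced = globalCheckpoint + 1 > leaseRetainingSeqNo;
        // The lease is older than half of its expiry time.
        boolean olderThanHalfExpiry = nowMillis - leaseTimestampMillis > expiryMillis / 2;
        return checkpointAdvanced || olderThanHalfExpiry;
    }
}
```

Renewing at half of the expiry time leaves a margin so that a lease for a quiet shard, whose global checkpoint is not moving, is still refreshed well before it could expire.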

If the primary performs a file-based recovery to a node that has (or recently had) a copy of the shard then it is possible that the persisted global checkpoint of the new copy is behind that of the old copy since file-based recoveries are somewhat destructive operations.

Today we leave that node's PRRL in place during the recovery with the expectation that it can be used by the new copy. However this isn't the case if the new copy needs more history to be retained, because retention leases may only advance and never retreat.

This commit addresses this by removing any existing PRRL during a file-based recovery: since we are performing a file-based recovery we have already determined that there isn't enough history for an ops-based recovery, so there is little point in keeping the old lease in place.

Caught by [a failure of `RecoveryWhileUnderLoadIT.testRecoverWhileRelocating`](https://scans.gradle.com/s/wxccfrtfgjj3g/console-log?task=:server:integTest#L14)

Relates elastic#41536
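To illustrate why the old lease has to be removed rather than renewed, here is a minimal in-memory model of leases that may only advance. It is a sketch, not the Elasticsearch API: the class and its methods are hypothetical, and creating a fresh lease after the removal is an assumption about how the recovery proceeds.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical in-memory model of peer-recovery retention leases, keyed by node.
final class PeerRecoveryLeaseSketch {
    private final Map<String, Long> retainingSeqNoByNode = new HashMap<>();

    void addLease(String nodeId, long retainingSeqNo) {
        if (retainingSeqNoByNode.putIfAbsent(nodeId, retainingSeqNo) != null) {
            throw new IllegalStateException("lease already exists for " + nodeId);
        }
    }

    // Renewals may only advance the retaining sequence number, never move it backwards.
    void renewLease(String nodeId, long retainingSeqNo) {
        Long current = retainingSeqNoByNode.get(nodeId);
        if (current == null) {
            throw new IllegalStateException("no lease for " + nodeId);
        }
        if (retainingSeqNo < current) {
            throw new IllegalArgumentException("leases may only advance");
        }
        retainingSeqNoByNode.put(nodeId, retainingSeqNo);
    }

    // During a file-based recovery the stale lease is dropped so that a new one
    // can later be created at a lower sequence number if the new copy needs it.
    void removeLeaseForFileBasedRecovery(String nodeId) {
        retainingSeqNoByNode.remove(nodeId);
    }

    long retainingSeqNo(String nodeId) {
        Long current = retainingSeqNoByNode.get(nodeId);
        if (current == null) {
            throw new IllegalStateException("no lease for " + nodeId);
        }
        return current;
    }
}
```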

This commit updates the version in which PRRLs are expected to exist to 7.4.0.

Today we perform `TransportReplicationAction` derivatives during recovery, and these actions call their response handlers on the transport thread. This change moves the continued execution of the recovery back onto the generic threadpool.
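The general pattern of handing a continuation off to another pool can be sketched with plain JDK classes. This is an analogue only: Elasticsearch uses its own listener and thread pool abstractions, and the class and method names here are hypothetical.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.Executor;

// Hypothetical JDK-only analogue of forking a response handler onto another pool.
final class ForkToGenericSketch {

    /**
     * Runs the continuation on the supplied executor, whichever thread completes
     * the response future. Returns the name of the thread that ran the continuation.
     */
    static CompletableFuture<String> continueOn(CompletableFuture<Void> response, Executor generic) {
        return response.thenApplyAsync(ignored -> Thread.currentThread().getName(), generic);
    }
}
```

With `thenApplyAsync` and an explicit executor, the continuation never runs on the thread that completed `response`, which is the property the change above wants for the transport thread.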

Today when renewing PRRLs we assert that any invalid "backwards" renewals must be because we are recovering the shard. In fact it's also possible to have `checkpointState.globalCheckpoint == SequenceNumbers.UNASSIGNED_SEQ_NO` on a tracked shard copy if the primary was just promoted and hasn't yet received checkpoints from all of its peers. This commit weakens the assertion to match.

Caught by a [failure of the full cluster restart tests](https://scans.gradle.com/s/5lllzgqtuegty/console-log#L8605)

Relates elastic#41536

In elastic#44000 we introduced some calls to `assertNotTransportThread` that are executed whether assertions are enabled or not. Although they have no effect if assertions are disabled, we should have done it like this instead.

Today peer recovery retention leases (PRRLs) are created when starting a replication group from scratch and during peer recovery. However, if the replication group was migrated from nodes running a version which does not create PRRLs (e.g. 7.3 and earlier) then it's possible that the primary was relocated or promoted without first establishing all the expected leases. It's not possible to establish these leases before or during primary activation, so we must create them as soon as possible afterwards. This gives weaker guarantees about history retention, since there's a possibility that history will be discarded before it can be used. In practice such situations are expected to occur only rarely. This commit adds the machinery to create missing leases after primary activation, and strengthens the assertions about the existence of such leases in order to ensure that once all the leases do exist we never again enter a state where there's a missing lease. Relates elastic#41536

The cluster in the full-cluster restart test only has 2 nodes, so we cannot fully allocate an index with 2 replicas.

Today PRRLs are not supported on closed indices or indices where soft deletes are disabled, but (confusingly) nor are they actively forbidden. This commit avoids creating them unnecessarily in unsupported situations.

Now that elastic#45136 means we perform recoveries from the index rather than the translog (if soft deletes are enabled) there is no need to retain extra translog for performing peer recoveries. This commit reduces the default translog retention to zero so that it can be discarded more quickly.

Since #45136, we use soft-deletes instead of the translog in peer recovery, so there's no need to retain extra translog to increase the chance of operation-based recoveries. This commit ignores the translog retention policy if soft-deletes are enabled so we can discard translog more quickly. Co-authored-by: David Turner <[email protected]> Relates #45136

Since #45136, we use soft-deletes instead of the translog in peer recovery, so there's no need to retain extra translog to increase the chance of operation-based recoveries. This commit ignores the translog retention policy if soft-deletes are enabled so we can discard translog more quickly. Backport of #45473 Relates #45136

Today we do not use retention leases in peer recovery for closed indices because we can't sync retention leases on closed indices. This change makes that sync possible and adjusts peer recovery to use retention leases for all indices with soft-deletes enabled. Relates #45136 Co-authored-by: David Turner <[email protected]>

Since 7.4, we use Lucene rather than the translog as the source of history for peer recoveries. However, this reduces the likelihood of operation-based recoveries when performing a full cluster restart from pre-7.4 because existing copies do not have PRRLs. To remedy this issue, we fall back to using the translog in peer recoveries if the recovering replica does not have a peer recovery retention lease and the replication group hasn't fully migrated to PRRLs. Relates #45136
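The fallback rule amounts to a two-input decision. The sketch below is illustrative only: the enum, class, and parameter names are hypothetical, and it models just the condition stated above, not the surrounding recovery logic.

```java
// Hypothetical sketch of choosing the history source for a peer recovery.
final class HistorySourceSketch {

    enum HistorySource { LUCENE_INDEX, TRANSLOG }

    static HistorySource choose(boolean replicaHasRetentionLease, boolean groupFullyMigratedToLeases) {
        // Fall back to the translog only when the recovering replica has no lease
        // and the replication group has not yet fully migrated to leases.
        if (!replicaHasRetentionLease && !groupFullyMigratedToLeases) {
            return HistorySource.TRANSLOG;
        }
        return HistorySource.LUCENE_INDEX;
    }
}
```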

…50351) Today, the replica allocator uses peer recovery retention leases to select the best-matched copies when allocating replicas of indices with soft-deletes. We can employ this mechanism for indices without soft-deletes because the retaining sequence number of a PRRL is the persisted global checkpoint (plus one) of that copy. If the primary and replica have the same retaining sequence number, then we should be able to perform a noop recovery. The reason is that we must be retaining translog up to the local checkpoint of the safe commit, which is at most the global checkpoint of either copy. The only limitation is that we might not cancel ongoing file-based recoveries with PRRLs for noop recoveries, because we can't make the translog retention policy comply with PRRLs. We also have this problem with soft-deletes if a PRRL is about to expire. Relates #45136 Relates #46959
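The noop-recovery check described here reduces to comparing two retaining sequence numbers. This is a minimal sketch, assuming only what the message states; the class and method names are hypothetical and do not correspond to the replica allocator's real code.

```java
// Hypothetical sketch of the noop-recovery check based on retaining sequence numbers.
final class NoopRecoverySketch {

    // The retaining sequence number of a PRRL is the persisted global checkpoint plus one.
    static long retainingSeqNo(long persistedGlobalCheckpoint) {
        return persistedGlobalCheckpoint + 1;
    }

    // If both copies retain history from the same point, no operations need replaying.
    static boolean canPerformNoopRecovery(long primaryRetainingSeqNo, long replicaRetainingSeqNo) {
        return primaryRetainingSeqNo == replicaRetainingSeqNo;
    }
}
```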

DaveCTurner and others added 30 commits June 19, 2019 17:39

Merge branch 'master' into peer-recovery-retention-leases

dfa22bc

Treat UNASSIGNED_SEQUENCE_NUMBER as NO_OPS_PERFORMED

f68fac4

Merge branch 'master' into peer-recovery-retention-leases

cb39840

Add missing GCP update (elastic#43632)

f5fdb75

Merge branch 'master' into peer-recovery-retention-leases

00145cd

Merge branch 'master' into peer-recovery-retention-leases

cb6b0a9

Remove file committed in error

b328478

Less sync

7f7f84b

Relax condition, we may have renewed some other leases too

6bac16a

Better test fix

9941eb6

Checkstyle

f3fbb33

Merge branch 'master' into peer-recovery-retention-leases

b1be151

Merge branch 'master' into peer-recovery-retention-leases

291ff8d

Update BWC version for PRRLs (elastic#43958)

ac2da33

This commit updates the version in which PRRLs are expected to exist to 7.4.0.

Merge branch 'master' into peer-recovery-retention-leases

76ff6e8

Merge branch 'master' into peer-recovery-retention-leases

9523445

Reduce number of replicas in cluster restart test

11e9880

The cluster in the full-cluster restart test only has 2 nodes, so we cannot fully allocate an index with 2 replicas.

Only create missing PRRLs when appropriate

d7f7ebc

Today PRRLs are not supported on closed indices or indices where soft deletes are disabled, but (confusingly) nor are they actively forbidden. This commit avoids creating them unnecessarily in unsupported situations.

Fix comment

ba7c4be

Merge branch 'master' into peer-recovery-retention-leases

e12bde6

Merge branch 'master' into peer-recovery-retention-leases

b8bcc0b

Merge branch 'master' into peer-recovery-retention-leases

40ea029

Merge branch 'master' into peer-recovery-retention-leases

69c94f4

dnhatn mentioned this pull request Aug 12, 2019

Ignore translog retention policy if soft-deletes enabled #45473

Merged

dnhatn mentioned this pull request Aug 22, 2019

Ignore translog retention policy if soft-deletes enabled #45868

Merged

dnhatn mentioned this pull request Sep 3, 2019

TrimUnreferencedReaders: move sync translog operation outside writeLock to improve performance #46203

Merged

dnhatn mentioned this pull request Oct 23, 2019

Use retention lease in peer recovery of closed indices #48430

Merged

dnhatn mentioned this pull request Nov 21, 2019

Migrate peer recovery from translog to retention lease #49448

Merged

dnhatn mentioned this pull request Dec 16, 2019

Migrate peer recovery from translog to retention lease #50211

Merged

dnhatn mentioned this pull request Dec 19, 2019

Use peer recovery retention leases for indices without soft-deletes #50351

Merged

This was referenced Feb 3, 2020

[meta] 7.6 release elastic/elasticsearch-net#4340

Closed

[meta] 7.6 release elastic/elasticsearch-net#4341

Closed

mfussenegger mentioned this pull request Mar 24, 2020

ES Backports crate/crate#9796

Closed

37 tasks

jakelandis added v8.0.0-alpha1 and removed v8.0.0 labels Jul 26, 2021
