-
Notifications
You must be signed in to change notification settings - Fork 25.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Allow engine to recover from translog upto a seqno #33032
Conversation
This change allows an engine to recover from its local translog up to the given seqno. The extended API can be used in these use cases: 1. When a replica starts following a new primary, it resets its index to the safe commit, then replays its local translog up to the current global checkpoint (see elastic#32867). 2. When a replica starts a peer-recovery, it can initialize the start_sequence_number to the persisted global checkpoint instead of the local checkpoint of the safe commit. A replica will then replay its local translog up to that global checkpoint before accepting remote translog from the primary. This change will increase the chance of operation-based recovery. I will make this in a follow-up. Relates elastic#32867
Pinging @elastic/es-distributed |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've left a few comments
@@ -1623,10 +1623,10 @@ public long getLastWriteNanos() { | |||
public abstract int fillSeqNoGaps(long primaryTerm) throws IOException; | |||
|
|||
/** | |||
* Performs recovery from the transaction log. | |||
* Performs recovery from the transaction log up to {@code recoverUpToSeqNo}. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you add that it's an inclusive bound? (i.e. up to XYZ included)
} | ||
} | ||
|
||
public Snapshot newSnapshotFromGen(long minGeneration) throws IOException { | ||
public Snapshot newSnapshotFromGen(TranslogGeneration fromGeneration, long upToSeqNo) throws IOException { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why did you change this to take a TranslogGeneration
with the uuid instead of just the long minGeneration
? It's not using that uuid anywhere here AFAICS.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This method might be interpreted as a range of translog generations or a range of sequence numbers if the parameter is a tuple of Longs. I changed to TranslogGeneration to avoid this issue. I will revert this change if you don't like it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it's good though. less likely to misuse.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok, makes sense.
return new Snapshot() { | ||
int skippedOps = 0; | ||
@Override | ||
public int totalOperations() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
also override overriddenOperations
and delegate to snapshot.overriddenOperations
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
good catch!
if (upToSeqNo == Long.MAX_VALUE) { | ||
return snapshot; | ||
} | ||
return new Snapshot() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we should have a proper (top-level) class for this, supporting both min and max. min would be useful for newSnapshotFromMinSeqNo
(see e.g. PrimaryReplicaResyncer, which still has to filter based on min = startingSeqNo, all of which could be accomplished through the Snapshot), and max would be useful for this one here (where it might also have a min
).
We might even make the interface of this newSnapshot
method purely sequence-number-based, where you can specify the range of operations to recover instead of the translog generation. That last part is not something I would change right away, but maybe something to look into later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I added filter(Predicate<Operation>)
method to the Snapshot for this purpose. However, I feel it's too broad; then I go with the filter class that you suggested.
@ywelsch I've addressed your comments. Would you please have another look? Thank you! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
This change allows an engine to recover from its local translog up to the given seqno. The extended API can be used in these use cases: When a replica starts following a new primary, it resets its index to the safe commit, then replays its local translog up to the current global checkpoint (see #32867). When a replica starts a peer-recovery, it can initialize the start_sequence_number to the persisted global checkpoint instead of the local checkpoint of the safe commit. A replica will then replay its local translog up to that global checkpoint before accepting remote translog from the primary. This change will increase the chance of operation-based recovery. I will make this in a follow-up. Relates #32867
* 6.x: Allow engine to recover from translog upto a seqno (#33032) TEST: Skip assertSeqNos for closed shards (#33130) TEST: resync operation on replica should acquire shard permit (#33103) Add proxy support to RemoteClusterConnection (#33062) Build: Line up IDE detection logic Security index expands to a single replica (#33131) Suppress more tests HLRC: request/response homogeneity and JavaDoc improvements (#33133) [Rollup] Move toAggCap() methods out of rollup config objects (#32583) Muted all these tests due to #33128 Fix race condition in scheduler engine test
This change allows an engine to recover from its local translog up to
the given seqno. The extended API can be used in these use cases:
When a replica starts following a new primary, it resets its index to
the safe commit, then replays its local translog up to the current
global checkpoint (see Reset replica engine before primary-replica resync #32867).
When a replica starts a peer-recovery, it can initialize the
start_sequence_number to the persisted global checkpoint instead of the
local checkpoint of the safe commit. A replica will then replay its
local translog up to that global checkpoint before accepting remote
translog from the primary. This change will increase the chance of
operation-based recovery. I will make this in a follow-up.
Relates #32867
/cc @bleskes