Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Allow engine to recover from translog upto a seqno #33032

Merged
merged 2 commits into from
Aug 22, 2018

Conversation

dnhatn
Copy link
Member

@dnhatn dnhatn commented Aug 21, 2018

This change allows an engine to recover from its local translog up to
the given seqno. The extended API can be used in these use cases:

  1. When a replica starts following a new primary, it resets its index to
    the safe commit, then replays its local translog up to the current
    global checkpoint (see Reset replica engine before primary-replica resync #32867).

  2. When a replica starts a peer-recovery, it can initialize the
    start_sequence_number to the persisted global checkpoint instead of the
    local checkpoint of the safe commit. A replica will then replay its
    local translog up to that global checkpoint before accepting remote
    translog from the primary. This change will increase the chance of
    operation-based recovery. I will make this in a follow-up.

Relates #32867

/cc @bleskes

This change allows an engine to recover from its local translog up to
the given seqno. The extended API can be used in these use cases:

1. When a replica starts following a new primary, it resets its index to
the safe commit, then replays its local  translog up to the current
global checkpoint (see elastic#32867).

2. When a replica starts a peer-recovery, it can initialize the
start_sequence_number to the persisted global checkpoint instead of the
local checkpoint of the safe commit. A replica  will then replay its
local translog up to that global checkpoint before accepting remote
translog from the primary. This change will increase the chance of
operation-based recovery. I will make this in a follow-up.

Relates elastic#32867
@dnhatn dnhatn added >enhancement v7.0.0 :Distributed Indexing/Engine Anything around managing Lucene and the Translog in an open shard. v6.5.0 labels Aug 21, 2018
@dnhatn dnhatn requested review from s1monw and ywelsch August 21, 2018 17:37
@elasticmachine
Copy link
Collaborator

Pinging @elastic/es-distributed

Copy link
Contributor

@ywelsch ywelsch left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've left a few comments

@@ -1623,10 +1623,10 @@ public long getLastWriteNanos() {
public abstract int fillSeqNoGaps(long primaryTerm) throws IOException;

/**
* Performs recovery from the transaction log.
* Performs recovery from the transaction log up to {@code recoverUpToSeqNo}.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you add that it's an inclusive bound? (i.e. up to XYZ included)

}
}

public Snapshot newSnapshotFromGen(long minGeneration) throws IOException {
public Snapshot newSnapshotFromGen(TranslogGeneration fromGeneration, long upToSeqNo) throws IOException {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why did you change this to take a TranslogGeneration with the uuid instead of just the long minGeneration? It's not using that uuid anywhere here AFAICS.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This method might be interpreted as a range of translog generations or a range of sequence numbers if the parameter is a tuple of Longs. I changed to TranslogGeneration to avoid this issue. I will revert this change if you don't like it.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's good though. less likely to misuse.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok, makes sense.

return new Snapshot() {
int skippedOps = 0;
@Override
public int totalOperations() {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

also override overriddenOperations and delegate to snapshot.overriddenOperations?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

good catch!

if (upToSeqNo == Long.MAX_VALUE) {
return snapshot;
}
return new Snapshot() {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should have a proper (top-level) class for this, supporting both min and max. min would be useful for newSnapshotFromMinSeqNo (see e.g. PrimaryReplicaResyncer, which still has to filter based on min = startingSeqNo, all of which could be accomplished through the Snapshot), and max would be useful for this one here (where it might also have a min).

We might even make the interface of this newSnapshot method purely sequence-number-based, where you can specify the range of operations to recover instead of the translog generation. That last part is not something I would change right away, but maybe something to look into later.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added filter(Predicate<Operation>) method to the Snapshot for this purpose. However, I feel it's too broad; then I go with the filter class that you suggested.

@dnhatn dnhatn requested a review from ywelsch August 22, 2018 00:31
@dnhatn
Copy link
Member Author

dnhatn commented Aug 22, 2018

@ywelsch I've addressed your comments. Would you please have another look? Thank you!

Copy link
Contributor

@s1monw s1monw left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Copy link
Contributor

@ywelsch ywelsch left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@dnhatn
Copy link
Member Author

dnhatn commented Aug 22, 2018

Thanks @ywelsch and @s1monw.

@dnhatn dnhatn merged commit 262d3c0 into elastic:master Aug 22, 2018
@dnhatn dnhatn deleted the recover-upto-seqno branch August 22, 2018 11:57
dnhatn added a commit that referenced this pull request Aug 26, 2018
This change allows an engine to recover from its local translog up to
the given seqno. The extended API can be used in these use cases:

When a replica starts following a new primary, it resets its index to
the safe commit, then replays its local translog up to the current
global checkpoint (see #32867).

When a replica starts a peer-recovery, it can initialize the
start_sequence_number to the persisted global checkpoint instead of the
local checkpoint of the safe commit. A replica will then replay its
local translog up to that global checkpoint before accepting remote
translog from the primary. This change will increase the chance of
operation-based recovery. I will make this in a follow-up.

Relates #32867
dnhatn added a commit that referenced this pull request Aug 26, 2018
* 6.x:
  Allow engine to recover from translog upto a seqno (#33032)
  TEST: Skip assertSeqNos for closed shards (#33130)
  TEST: resync operation on replica should acquire shard permit (#33103)
  Add proxy support to RemoteClusterConnection (#33062)
  Build: Line up IDE detection logic
  Security index expands to a single replica (#33131)
  Suppress more tests
  HLRC: request/response homogeneity and JavaDoc improvements (#33133)
  [Rollup] Move toAggCap() methods out of rollup config objects (#32583)
  Muted all these tests due to #33128
  Fix race condition in scheduler engine test
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
:Distributed Indexing/Engine Anything around managing Lucene and the Translog in an open shard. >enhancement v6.5.0 v7.0.0-beta1
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants