Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Primary send safe commit in file-based recovery #28038

Merged
merged 8 commits into from
Jan 11, 2018

Conversation

dnhatn
Copy link
Member

@dnhatn dnhatn commented Dec 31, 2017

Today a primary shard transfers the most recent commit point to a replica shard in a file-based recovery. However, the most recent commit may not be a "safe" commit; this causes a replica shard not having a safe commit point until it can retain a safe commit by itself.

This commits collapses the snapshot deletion policy into the combined deletion policy and modifies the peer recovery source to send a safe commit.

Relates #10708

Today a primary shard transfers the most recent commit to a replica
shard in a file-based recovery. This causes replica shards not having a
safe commit point until it can retain a safe commit by itself. This
commits collapses the snapshot deletion policy into the combined
deletion policy and modifies primary shards to send a safe commit.
@dnhatn dnhatn added :Distributed Indexing/Recovery Anything around constructing a new shard, either from a local or a remote source. >enhancement review v6.2.0 v7.0.0 labels Dec 31, 2017
@dnhatn dnhatn changed the title Primary sends a safe commit in file-based recovery Primary send safe commit in file-based recovery Dec 31, 2017
# Conflicts:
#	test/framework/src/main/java/org/elasticsearch/index/shard/IndexShardTestCase.java
Copy link
Contributor

@bleskes bleskes left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like it. I made a quick pass with initial comments.

assert lastCommit != null : "Last commit is not initialized yet";
final IndexCommit snapshotting = acquiringSafeCommit ? safeCommit : lastCommit;
snapshottedCommits.addTo(snapshotting, 1); // increase refCount
return new Engine.IndexCommitRef(snapshotting, () -> releaseCommit(snapshotting));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we leave this to the engine and make the releaseCommit method public?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we can also return a commit wrapper that disable the delete method?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💯

assert refCount >= 0 : "Number of snapshots can not be negative [" + refCount + "]";
if (refCount == 0) {
snapshottedCommits.remove(releasingCommit);
updateTranslogDeletionPolicy();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why does the translog care about the snapshotting situation? it only cares about the safe commit, no?

Copy link
Member Author

@dnhatn dnhatn Jan 9, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Assume that we have two commits c1 and c2 with c1 is the safe commit and c2 is the last commit. Clients acquire c1, then we have to keep translog of c1 until they release the commit. During that time, we have a new commit (or global checkpoint advanced), we can release c1 and its translog but have to keep them as they are being snapshotted. When clients release the commit c1, we should also release its translog rather than wait until the next onCommit.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should not release translog here. We should either keep or release both the commit and translog at the same time. I will update this.

* @param flushFirst indicates whether the engine should flush before returning the snapshot
*/
public abstract IndexCommitRef acquireIndexCommit(boolean flushFirst) throws EngineException;
public abstract IndexCommitRef acquireIndexCommit(boolean safeCommit, boolean flushFirst) throws EngineException;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we should look in future to remove one of these booleans - either you want to "get everything" or you want a safe commit. I don't think there's a point in "flush but give me a safe commit"

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree. Are you ok if we replace this by having two methods: acquireLastIndexCommit(flushFirst) and acquireSafeIndexCommit(no option)?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

agree. Are you ok if we replace this by having two methods: acquireLastIndexCommit(flushFirst) and acquireSafeIndexCommit(no option)?

+1

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

double checking - the new methods will be a follow up?

@dnhatn
Copy link
Member Author

dnhatn commented Jan 10, 2018

@bleskes, I've updated the deletion policy to return a non-deletable commit and wrap it into a commit-ref in the engine. I am not happy with the casting line in the releaseCommit method [final IndexCommit releasingCommit = ((SnapshotIndexCommit) snapshotCommit).delegate;]. I will reach out to discuss this with you.

Copy link
Contributor

@bleskes bleskes left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looking good. I left some more minor feedback

* Releases an index commit that acquired by {@link #acquireIndexCommit(boolean)}.
*/
synchronized void releaseCommit(final IndexCommit releasingCommit) {
assert snapshottedCommits.containsKey(releasingCommit) : "Release non-snapshotted commit;" +
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

how does that work? releasingCommit is a SnapshotIndexCommit ?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The snapshotting commits are stored as keys in a HashMap. Both SnapshotIndexCommit and regular index commit inherit equals and hashCode from the root IndexCommit, thus they are interchangeable. This can be problematic if a regular index commit overrides equals or hashCode.

I see some options to avoid this.

  1. Exposes SnapshotIndexCommit to package level; makes acquireCommit and releaseCommit with SnapshotIndexCommit type.
  2. Delegate hashCode and equals of SnapshotIndexCommit to the original index commit.

WDYT?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see. I think it's risky as the underlying IndexCommit may have a different implementation (it's not final). I see a 3rd option - we could use an identity map and sure people only release index commits they got from us. Until we need the ability to work all kind of crazyness like wrapped IndexCommits, I prefer to keep things strict.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Lucene's SnapshotDeletionPolicy identifies the IndexCommit's based on their generation (long field).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We discussed and agreed to keep the current implementation.

logger.trace("pulling snapshot");
return new IndexCommitRef(snapshotDeletionPolicy);
} catch (IOException e) {
throw new SnapshotFailedEngineException(shardId, e);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This means a potential change to the exception type. Can you double check it's OK?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should be ok. The method snapshot of Lucene's SnapshotDeletionPolicy throws IOException but we don't. Acquiring a commit is just increasing refCount of a commit.

final IndexCommit ref2 = indexPolicy.acquireIndexCommit(false);
assertThat(ref2, equalTo(c2));
expectThrows(UnsupportedOperationException.class, ref2::delete);
assertThat(translogPolicy.getMinTranslogGenerationForRecovery(), lessThanOrEqualTo(100L));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

shouldn't this be exactly translogGen1?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lol. I fixed

assertThat(ref3, equalTo(c2));
assertThat(translogPolicy.getMinTranslogGenerationForRecovery(), equalTo(translogGen1));
indexPolicy.releaseCommit(ref1); // release acquired commit releases translog and commit
indexPolicy.onCommit(Arrays.asList(c1, c2)); // Flush new commit deletes c1
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we also release c2 and see that it is not deleted? I would also appreciate some randomness here - i.e., a series of commits, choose snapshot at random times. Release at random order and check that all is OK.

@bleskes
Copy link
Contributor

bleskes commented Jan 10, 2018 via email

Copy link
Contributor

@ywelsch ywelsch left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've left two suggestions

@@ -42,12 +46,16 @@
private final TranslogDeletionPolicy translogDeletionPolicy;
private final EngineConfig.OpenMode openMode;
private final LongSupplier globalCheckpointSupplier;
private final ObjectIntHashMap<IndexCommit> snapshottedCommits; // Number of snapshots held against each commit point.
private IndexCommit safeCommit; // the most recent safe commit point - its max_seqno at most the persisted global checkpoint.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The safe commit point is a constantly moving target (as the global checkpoint keeps going up and new commits are being added). I wonder if it's nicer to calculate the safe commit point when it's accessed in acquireIndexCommit, based on the then current globalcheckpoint and the current list of commits (This will require storing the last seen indexCommits, but that would be equivalent to what's being done in Lucene's SnapshotDeletionPolicy).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ywelsch Nhat has a follow up to trim unneeded commits as soon as the global checkpoint advances enough. This will have a side effect you mention (it will update things) with some added value.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, but that involves some extra machinery to update safeCommit at the right points in time (e.g. when gcp advances)? My suggestion here makes that unnecessary?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, I think the added value of the approach in the follow-up is that the clean-up logic is also possibly more eagerly invoked (when gcp advances).

* Releases an index commit that acquired by {@link #acquireIndexCommit(boolean)}.
*/
synchronized void releaseCommit(final IndexCommit releasingCommit) {
assert snapshottedCommits.containsKey(releasingCommit) : "Release non-snapshotted commit;" +
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Lucene's SnapshotDeletionPolicy identifies the IndexCommit's based on their generation (long field).

@dnhatn
Copy link
Member Author

dnhatn commented Jan 10, 2018

@bleskes

  1. I've added a random test and updated the index policy not to retain translog of the snapshotting commits.
  2. I realized that a delegating equals does not fully conform the equals contract. I decided to cast a releasing commit to the inner class. @ywelsch pointed out that Lucene's snapshot policy uses a commit-generation as a key (we are using generation and directory). I am fine with the suggestion, WDYT?

Copy link
Contributor

@bleskes bleskes left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Thanks @dnhatn

final int keptPosition = indexOfKeptCommits(commits, globalCheckpointSupplier.getAsLong());
lastCommit = commits.get(commits.size() - 1);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we assert that the translog gen is in this commit is lower than all the ones in higher commits?

* @param flushFirst indicates whether the engine should flush before returning the snapshot
*/
public abstract IndexCommitRef acquireIndexCommit(boolean flushFirst) throws EngineException;
public abstract IndexCommitRef acquireIndexCommit(boolean safeCommit, boolean flushFirst) throws EngineException;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

double checking - the new methods will be a follow up?

@dnhatn
Copy link
Member Author

dnhatn commented Jan 11, 2018

Thanks @bleskes and @ywelsch for reviewing.

@dnhatn dnhatn merged commit 626c3d1 into elastic:master Jan 11, 2018
@dnhatn dnhatn deleted the recovery/primary-send-safe-commit branch January 11, 2018 15:39
dnhatn added a commit to dnhatn/elasticsearch that referenced this pull request Jan 11, 2018
Currently we keep a 5.x index commit as a safe commit until we have a
6.x safe commit. During that time, if peer-recovery happens, a primary
will send a 5.x commit in file-based sync and the recovery will even
fail as the snapshotted commit does not have sequence number tags.

This commit updates the combined deletion policy to delete legacy
commits if there are 6.x commits.

Relates elastic#27606
Relates elastic#28038
jasontedor added a commit that referenced this pull request Jan 11, 2018
* master: (43 commits)
  Rename core module to server (#28180)
  upgraded jna from 4.4.0-1 to 4.5.1 (#28183)
  [TEST] Do not call RandomizedTest.scaledRandomIntBetween from multiple threads
  Primary send safe commit in file-based recovery (#28038)
  [Docs] Correct response json in rank-eval.asciidoc
  Add scroll parameter to _reindex API (#28041)
  Include all sentences smaller than fragment_size in the unified highlighter (#28132)
  Modifies the JavaAPI docs related to AggregationBuilder
  [Docs] Improvements in script-fields.asciidoc (#28174)
  [Docs] Remove Kerberos/SPNEGO Shield plugin (#28019)
  Ignore null value for range field (#27845) (#28116)
  Fix environment variable substitutions in list setting (#28106)
  docs: Replaces indexed script java api docs with stored script api docs
  test: ensure we endup with a single segment
  Make sure that we don't detect files as maven coordinate when installing a plugin (#28163)
  [Tests] temporary disable meta plugin rest tests #28163
  meta-plugin should install bin and config at the top level (#28162)
  Painless: Add public member read/write access test. (#28156)
  Docs: Clarify password protection support with keystore (#28157)
  [Docs] fix plugin properties inclusion for plugins authors
  ...
dnhatn added a commit that referenced this pull request Jan 11, 2018
Currently we keep a 5.x index commit as a safe commit until we have a
6.x safe commit. During that time, if peer-recovery happens, a primary
will send a 5.x commit in file-based sync and the recovery will even
fail as the snapshotted commit does not have sequence number tags.

This commit updates the combined deletion policy to delete legacy
commits if there are 6.x commits.

Relates #27606
Relates #28038
dnhatn added a commit that referenced this pull request Jan 12, 2018
Currently we keep a 5.x index commit as a safe commit until we have a
6.x safe commit. During that time, if peer-recovery happens, a primary
will send a 5.x commit in file-based sync and the recovery will even
fail as the snapshotted commit does not have sequence number tags.

This commit updates the combined deletion policy to delete legacy
commits if there are 6.x commits.

Relates #27606
Relates #28038
dnhatn added a commit that referenced this pull request Jan 12, 2018
If a 6.x node with a 5.x index is promoted to be a primary, it will flush a new
index commit to make sure translog operations without seqno will never be
replayed (see IndexShard#updateShardState). However the global checkpoint is
still UNASSIGNED and the max_seqno of both commits are NO_OPS_PERFORMED. If the
combined deletion policy considers the first commit as a safe commit, we will
send the first commit without replaying translog between these commits to the
replica in a peer-recovery. This causes the replica missing those operations.
To prevent this, we should not keep more than one commit whose max_seqno is
NO_OPS_PERFORMED. Once we can retain a safe commit, a NO_OPS_PERFORMED commit
will be deleted just as other commits.

Relates #28038
dnhatn added a commit that referenced this pull request Jan 12, 2018
Today a primary shard transfers the most recent commit point to a
replica shard in a file-based recovery. However, the most recent commit
may not be a "safe" commit; this causes a replica shard not having a
safe commit point until it can retain a safe commit by itself.

This commits collapses the snapshot deletion policy into the combined
deletion policy and modifies the peer recovery source to send a safe
commit.

Relates #10708
dnhatn added a commit that referenced this pull request Jan 13, 2018
The global checkpoint should be assigned to unassigned rather than 0. If
a single document is indexed and the global checkpoint is initialized
with 0, the first commit is safe which the test does not suppose.

Relates #28038
dnhatn added a commit that referenced this pull request Jan 13, 2018
The global checkpoint should be assigned to unassigned rather than 0. If
a single document is indexed and the global checkpoint is initialized
with 0, the first commit is safe which the test does not suppose.

Relates #28038
jasontedor added a commit to jasontedor/elasticsearch that referenced this pull request Jan 13, 2018
* master: (30 commits)
  Fix lock accounting in releasable lock
  Add ability to associate an ID with tasks  (elastic#27764)
  [DOCS] Removed differencies between text and code (elastic#27993)
  text fixes (elastic#28136)
  Update getting-started.asciidoc (elastic#28145)
  [Docs] Spelling fix in painless-getting-started.asciidoc (elastic#28187)
  Fixed the cat.health REST test to accept 4ms, not just 4.0ms (elastic#28186)
  Do not keep 5.x commits once having 6.x commits (elastic#28188)
  Rename core module to server (elastic#28180)
  upgraded jna from 4.4.0-1 to 4.5.1 (elastic#28183)
  [TEST] Do not call RandomizedTest.scaledRandomIntBetween from multiple threads
  Primary send safe commit in file-based recovery (elastic#28038)
  [Docs] Correct response json in rank-eval.asciidoc
  Add scroll parameter to _reindex API (elastic#28041)
  Include all sentences smaller than fragment_size in the unified highlighter (elastic#28132)
  Modifies the JavaAPI docs related to AggregationBuilder
  [Docs] Improvements in script-fields.asciidoc (elastic#28174)
  [Docs] Remove Kerberos/SPNEGO Shield plugin (elastic#28019)
  Ignore null value for range field (elastic#27845) (elastic#28116)
  Fix environment variable substitutions in list setting (elastic#28106)
  ...
matarrese added a commit to matarrese/elasticsearch that referenced this pull request Jan 13, 2018
* master: (59 commits)
  Correct backport replica rollback to 6.2 (elastic#28181)
  Backport replica rollback to 6.2 (elastic#28181)
  Rename deleteLocalTranslog to createNewTranslog
  AwaitsFix #testRecoveryAfterPrimaryPromotion
  TEST: init unassigned gcp in testAcquireIndexCommit
  Replica start peer recovery with safe commit (elastic#28181)
  Truncate tlog cli should assign global checkpoint (elastic#28192)
  Fix lock accounting in releasable lock
  Add ability to associate an ID with tasks  (elastic#27764)
  [DOCS] Removed differencies between text and code (elastic#27993)
  text fixes (elastic#28136)
  Update getting-started.asciidoc (elastic#28145)
  [Docs] Spelling fix in painless-getting-started.asciidoc (elastic#28187)
  Fixed the cat.health REST test to accept 4ms, not just 4.0ms (elastic#28186)
  Do not keep 5.x commits once having 6.x commits (elastic#28188)
  Rename core module to server (elastic#28180)
  upgraded jna from 4.4.0-1 to 4.5.1 (elastic#28183)
  [TEST] Do not call RandomizedTest.scaledRandomIntBetween from multiple threads
  Primary send safe commit in file-based recovery (elastic#28038)
  [Docs] Correct response json in rank-eval.asciidoc
  ...
jasontedor added a commit to jasontedor/elasticsearch that referenced this pull request Jan 16, 2018
* compile-with-jdk-9: (56 commits)
  TEST: init unassigned gcp in testAcquireIndexCommit
  Replica start peer recovery with safe commit (elastic#28181)
  Truncate tlog cli should assign global checkpoint (elastic#28192)
  Fix lock accounting in releasable lock
  Add ability to associate an ID with tasks  (elastic#27764)
  [DOCS] Removed differencies between text and code (elastic#27993)
  text fixes (elastic#28136)
  Update getting-started.asciidoc (elastic#28145)
  [Docs] Spelling fix in painless-getting-started.asciidoc (elastic#28187)
  Fixed the cat.health REST test to accept 4ms, not just 4.0ms (elastic#28186)
  Do not keep 5.x commits once having 6.x commits (elastic#28188)
  Rename core module to server (elastic#28180)
  upgraded jna from 4.4.0-1 to 4.5.1 (elastic#28183)
  [TEST] Do not call RandomizedTest.scaledRandomIntBetween from multiple threads
  Primary send safe commit in file-based recovery (elastic#28038)
  [Docs] Correct response json in rank-eval.asciidoc
  Add scroll parameter to _reindex API (elastic#28041)
  Include all sentences smaller than fragment_size in the unified highlighter (elastic#28132)
  Modifies the JavaAPI docs related to AggregationBuilder
  [Docs] Improvements in script-fields.asciidoc (elastic#28174)
  ...
dnhatn added a commit to dnhatn/elasticsearch that referenced this pull request Jan 17, 2018
Previously we introduced a new parameter to `acquireIndexCommit` to
allow acquire either a safe commit or a last commit. However with the
new parameter callers can provide a nonsense combination - flush first
but acquire the safe commit. This commit separates acquireIndexCommit
method into two different methods to avoid that problem. Moreover, this
change should also improve the readability.

Follow-up elastic#28038
dnhatn added a commit that referenced this pull request Feb 17, 2018
Previously we introduced a new parameter to `acquireIndexCommit` to
allow acquire either a safe commit or a last commit. However with the
new parameters, callers can provide a nonsense combination - flush first
but acquire the safe commit. This commit separates acquireIndexCommit
method into two different methods to avoid that problem. Moreover, this
change should also improve the readability.

Relates #28038
dnhatn added a commit that referenced this pull request Feb 17, 2018
Previously we introduced a new parameter to `acquireIndexCommit` to
allow acquire either a safe commit or a last commit. However with the
new parameters, callers can provide a nonsense combination - flush first
but acquire the safe commit. This commit separates acquireIndexCommit
method into two different methods to avoid that problem. Moreover, this
change should also improve the readability.

Relates #28038
@jimczi jimczi added v7.0.0-beta1 and removed v7.0.0 labels Feb 7, 2019
DaveCTurner added a commit to DaveCTurner/elasticsearch that referenced this pull request Mar 1, 2021
- `NodeShouldNotConnectException` has not been instantiated since 5.0
- `GatewayException` has not been instantiated since 5.0
- `SnapshotFailedEngineException` has not been instantated since 6.2.0
  (elastic#28038) and was never thrown across clusters

This commit removes these obsolete exceptions.
DaveCTurner added a commit that referenced this pull request Mar 1, 2021
- `NodeShouldNotConnectException` has not been instantiated since 5.0
- `GatewayException` has not been instantiated since 5.0
- `SnapshotFailedEngineException` has not been instantated since 6.2.0
  (#28038) and was never thrown across clusters

This commit removes these obsolete exceptions.
DaveCTurner added a commit that referenced this pull request Mar 1, 2021
- `NodeShouldNotConnectException` has not been instantiated since 5.0
- `GatewayException` has not been instantiated since 5.0
- `SnapshotFailedEngineException` has not been instantated since 6.2.0
  (#28038) and was never thrown across clusters

This commit removes these obsolete exceptions.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
:Distributed Indexing/Recovery Anything around constructing a new shard, either from a local or a remote source. >enhancement v6.2.0 v7.0.0-beta1
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants