-
Notifications
You must be signed in to change notification settings - Fork 24.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Primary send safe commit in file-based recovery #28038
Primary send safe commit in file-based recovery #28038
Conversation
Today a primary shard transfers the most recent commit to a replica shard in a file-based recovery. This causes replica shards not having a safe commit point until it can retain a safe commit by itself. This commits collapses the snapshot deletion policy into the combined deletion policy and modifies primary shards to send a safe commit.
# Conflicts: # test/framework/src/main/java/org/elasticsearch/index/shard/IndexShardTestCase.java
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like it. I made a quick pass with initial comments.
assert lastCommit != null : "Last commit is not initialized yet"; | ||
final IndexCommit snapshotting = acquiringSafeCommit ? safeCommit : lastCommit; | ||
snapshottedCommits.addTo(snapshotting, 1); // increase refCount | ||
return new Engine.IndexCommitRef(snapshotting, () -> releaseCommit(snapshotting)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can we leave this to the engine and make the releaseCommit method public?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we can also return a commit wrapper that disable the delete method?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
💯
assert refCount >= 0 : "Number of snapshots can not be negative [" + refCount + "]"; | ||
if (refCount == 0) { | ||
snapshottedCommits.remove(releasingCommit); | ||
updateTranslogDeletionPolicy(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why does the translog care about the snapshotting situation? it only cares about the safe commit, no?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Assume that we have two commits c1
and c2
with c1
is the safe commit and c2
is the last commit. Clients acquire c1
, then we have to keep translog of c1
until they release the commit. During that time, we have a new commit (or global checkpoint advanced), we can release c1
and its translog but have to keep them as they are being snapshotted. When clients release the commit c1
, we should also release its translog rather than wait until the next onCommit
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we should not release translog here. We should either keep or release both the commit and translog at the same time. I will update this.
* @param flushFirst indicates whether the engine should flush before returning the snapshot | ||
*/ | ||
public abstract IndexCommitRef acquireIndexCommit(boolean flushFirst) throws EngineException; | ||
public abstract IndexCommitRef acquireIndexCommit(boolean safeCommit, boolean flushFirst) throws EngineException; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we should look in future to remove one of these booleans - either you want to "get everything" or you want a safe commit. I don't think there's a point in "flush but give me a safe commit"
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree. Are you ok if we replace this by having two methods: acquireLastIndexCommit(flushFirst)
and acquireSafeIndexCommit(no option)
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
agree. Are you ok if we replace this by having two methods: acquireLastIndexCommit(flushFirst) and acquireSafeIndexCommit(no option)?
+1
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
double checking - the new methods will be a follow up?
@bleskes, I've updated the deletion policy to return a non-deletable commit and wrap it into a |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
looking good. I left some more minor feedback
* Releases an index commit that acquired by {@link #acquireIndexCommit(boolean)}. | ||
*/ | ||
synchronized void releaseCommit(final IndexCommit releasingCommit) { | ||
assert snapshottedCommits.containsKey(releasingCommit) : "Release non-snapshotted commit;" + |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
how does that work? releasingCommit is a SnapshotIndexCommit ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The snapshotting commits are stored as keys in a HashMap. Both SnapshotIndexCommit
and regular index commit inherit equals
and hashCode
from the root IndexCommit
, thus they are interchangeable. This can be problematic if a regular index commit overrides equals
or hashCode
.
I see some options to avoid this.
- Exposes
SnapshotIndexCommit
to package level; makes acquireCommit and releaseCommit withSnapshotIndexCommit
type. - Delegate
hashCode
andequals
ofSnapshotIndexCommit
to the original index commit.
WDYT?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see. I think it's risky as the underlying IndexCommit may have a different implementation (it's not final). I see a 3rd option - we could use an identity map and sure people only release index commits they got from us. Until we need the ability to work all kind of crazyness like wrapped IndexCommits, I prefer to keep things strict.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Lucene's SnapshotDeletionPolicy identifies the IndexCommit's based on their generation (long field).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We discussed and agreed to keep the current implementation.
logger.trace("pulling snapshot"); | ||
return new IndexCommitRef(snapshotDeletionPolicy); | ||
} catch (IOException e) { | ||
throw new SnapshotFailedEngineException(shardId, e); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This means a potential change to the exception type. Can you double check it's OK?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should be ok. The method snapshot
of Lucene's SnapshotDeletionPolicy
throws IOException but we don't. Acquiring a commit is just increasing refCount of a commit.
final IndexCommit ref2 = indexPolicy.acquireIndexCommit(false); | ||
assertThat(ref2, equalTo(c2)); | ||
expectThrows(UnsupportedOperationException.class, ref2::delete); | ||
assertThat(translogPolicy.getMinTranslogGenerationForRecovery(), lessThanOrEqualTo(100L)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
shouldn't this be exactly translogGen1?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lol. I fixed
assertThat(ref3, equalTo(c2)); | ||
assertThat(translogPolicy.getMinTranslogGenerationForRecovery(), equalTo(translogGen1)); | ||
indexPolicy.releaseCommit(ref1); // release acquired commit releases translog and commit | ||
indexPolicy.onCommit(Arrays.asList(c1, c2)); // Flush new commit deletes c1 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can we also release c2 and see that it is not deleted? I would also appreciate some randomness here - i.e., a series of commits, choose snapshot at random times. Release at random order and check that all is OK.
Thx.
…On Wed, Jan 10, 2018 at 4:09 PM, Nhat Nguyen ***@***.***> wrote:
***@***.**** commented on this pull request.
------------------------------
In core/src/main/java/org/elasticsearch/index/engine/InternalEngine.java
<#28038 (comment)>
:
> // we have to flush outside of the readlock otherwise we might have a problem upgrading
// the to a write lock when we fail the engine in this operation
if (flushFirst) {
logger.trace("start flush for snapshot");
flush(false, true);
logger.trace("finish flush for snapshot");
}
- try (ReleasableLock lock = readLock.acquire()) {
- logger.trace("pulling snapshot");
- return new IndexCommitRef(snapshotDeletionPolicy);
- } catch (IOException e) {
- throw new SnapshotFailedEngineException(shardId, e);
This should be ok. The method snapshot of Lucene's SnapshotDeletionPolicy
throws IOException but we don't. Acquiring a commit is just increasing
refCount of a commit.
—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
<#28038 (comment)>,
or mute the thread
<https://github.com/notifications/unsubscribe-auth/AA9bJ8XVLnNrL1EiZiS2HJctvgoiFn2uks5tJNKmgaJpZM4RP4GN>
.
|
This reverts commit 0285001.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've left two suggestions
@@ -42,12 +46,16 @@ | |||
private final TranslogDeletionPolicy translogDeletionPolicy; | |||
private final EngineConfig.OpenMode openMode; | |||
private final LongSupplier globalCheckpointSupplier; | |||
private final ObjectIntHashMap<IndexCommit> snapshottedCommits; // Number of snapshots held against each commit point. | |||
private IndexCommit safeCommit; // the most recent safe commit point - its max_seqno at most the persisted global checkpoint. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The safe commit point is a constantly moving target (as the global checkpoint keeps going up and new commits are being added). I wonder if it's nicer to calculate the safe commit point when it's accessed in acquireIndexCommit
, based on the then current globalcheckpoint and the current list of commits (This will require storing the last seen indexCommits, but that would be equivalent to what's being done in Lucene's SnapshotDeletionPolicy).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@ywelsch Nhat has a follow up to trim unneeded commits as soon as the global checkpoint advances enough. This will have a side effect you mention (it will update things) with some added value.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes, but that involves some extra machinery to update safeCommit at the right points in time (e.g. when gcp advances)? My suggestion here makes that unnecessary?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, I think the added value of the approach in the follow-up is that the clean-up logic is also possibly more eagerly invoked (when gcp advances).
* Releases an index commit that acquired by {@link #acquireIndexCommit(boolean)}. | ||
*/ | ||
synchronized void releaseCommit(final IndexCommit releasingCommit) { | ||
assert snapshottedCommits.containsKey(releasingCommit) : "Release non-snapshotted commit;" + |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Lucene's SnapshotDeletionPolicy identifies the IndexCommit's based on their generation (long field).
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. Thanks @dnhatn
final int keptPosition = indexOfKeptCommits(commits, globalCheckpointSupplier.getAsLong()); | ||
lastCommit = commits.get(commits.size() - 1); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can we assert that the translog gen is in this commit is lower than all the ones in higher commits?
* @param flushFirst indicates whether the engine should flush before returning the snapshot | ||
*/ | ||
public abstract IndexCommitRef acquireIndexCommit(boolean flushFirst) throws EngineException; | ||
public abstract IndexCommitRef acquireIndexCommit(boolean safeCommit, boolean flushFirst) throws EngineException; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
double checking - the new methods will be a follow up?
Currently we keep a 5.x index commit as a safe commit until we have a 6.x safe commit. During that time, if peer-recovery happens, a primary will send a 5.x commit in file-based sync and the recovery will even fail as the snapshotted commit does not have sequence number tags. This commit updates the combined deletion policy to delete legacy commits if there are 6.x commits. Relates elastic#27606 Relates elastic#28038
* master: (43 commits) Rename core module to server (#28180) upgraded jna from 4.4.0-1 to 4.5.1 (#28183) [TEST] Do not call RandomizedTest.scaledRandomIntBetween from multiple threads Primary send safe commit in file-based recovery (#28038) [Docs] Correct response json in rank-eval.asciidoc Add scroll parameter to _reindex API (#28041) Include all sentences smaller than fragment_size in the unified highlighter (#28132) Modifies the JavaAPI docs related to AggregationBuilder [Docs] Improvements in script-fields.asciidoc (#28174) [Docs] Remove Kerberos/SPNEGO Shield plugin (#28019) Ignore null value for range field (#27845) (#28116) Fix environment variable substitutions in list setting (#28106) docs: Replaces indexed script java api docs with stored script api docs test: ensure we endup with a single segment Make sure that we don't detect files as maven coordinate when installing a plugin (#28163) [Tests] temporary disable meta plugin rest tests #28163 meta-plugin should install bin and config at the top level (#28162) Painless: Add public member read/write access test. (#28156) Docs: Clarify password protection support with keystore (#28157) [Docs] fix plugin properties inclusion for plugins authors ...
Currently we keep a 5.x index commit as a safe commit until we have a 6.x safe commit. During that time, if peer-recovery happens, a primary will send a 5.x commit in file-based sync and the recovery will even fail as the snapshotted commit does not have sequence number tags. This commit updates the combined deletion policy to delete legacy commits if there are 6.x commits. Relates #27606 Relates #28038
Currently we keep a 5.x index commit as a safe commit until we have a 6.x safe commit. During that time, if peer-recovery happens, a primary will send a 5.x commit in file-based sync and the recovery will even fail as the snapshotted commit does not have sequence number tags. This commit updates the combined deletion policy to delete legacy commits if there are 6.x commits. Relates #27606 Relates #28038
If a 6.x node with a 5.x index is promoted to be a primary, it will flush a new index commit to make sure translog operations without seqno will never be replayed (see IndexShard#updateShardState). However the global checkpoint is still UNASSIGNED and the max_seqno of both commits are NO_OPS_PERFORMED. If the combined deletion policy considers the first commit as a safe commit, we will send the first commit without replaying translog between these commits to the replica in a peer-recovery. This causes the replica missing those operations. To prevent this, we should not keep more than one commit whose max_seqno is NO_OPS_PERFORMED. Once we can retain a safe commit, a NO_OPS_PERFORMED commit will be deleted just as other commits. Relates #28038
Today a primary shard transfers the most recent commit point to a replica shard in a file-based recovery. However, the most recent commit may not be a "safe" commit; this causes a replica shard not having a safe commit point until it can retain a safe commit by itself. This commits collapses the snapshot deletion policy into the combined deletion policy and modifies the peer recovery source to send a safe commit. Relates #10708
The global checkpoint should be assigned to unassigned rather than 0. If a single document is indexed and the global checkpoint is initialized with 0, the first commit is safe which the test does not suppose. Relates #28038
The global checkpoint should be assigned to unassigned rather than 0. If a single document is indexed and the global checkpoint is initialized with 0, the first commit is safe which the test does not suppose. Relates #28038
* master: (30 commits) Fix lock accounting in releasable lock Add ability to associate an ID with tasks (elastic#27764) [DOCS] Removed differencies between text and code (elastic#27993) text fixes (elastic#28136) Update getting-started.asciidoc (elastic#28145) [Docs] Spelling fix in painless-getting-started.asciidoc (elastic#28187) Fixed the cat.health REST test to accept 4ms, not just 4.0ms (elastic#28186) Do not keep 5.x commits once having 6.x commits (elastic#28188) Rename core module to server (elastic#28180) upgraded jna from 4.4.0-1 to 4.5.1 (elastic#28183) [TEST] Do not call RandomizedTest.scaledRandomIntBetween from multiple threads Primary send safe commit in file-based recovery (elastic#28038) [Docs] Correct response json in rank-eval.asciidoc Add scroll parameter to _reindex API (elastic#28041) Include all sentences smaller than fragment_size in the unified highlighter (elastic#28132) Modifies the JavaAPI docs related to AggregationBuilder [Docs] Improvements in script-fields.asciidoc (elastic#28174) [Docs] Remove Kerberos/SPNEGO Shield plugin (elastic#28019) Ignore null value for range field (elastic#27845) (elastic#28116) Fix environment variable substitutions in list setting (elastic#28106) ...
* master: (59 commits) Correct backport replica rollback to 6.2 (elastic#28181) Backport replica rollback to 6.2 (elastic#28181) Rename deleteLocalTranslog to createNewTranslog AwaitsFix #testRecoveryAfterPrimaryPromotion TEST: init unassigned gcp in testAcquireIndexCommit Replica start peer recovery with safe commit (elastic#28181) Truncate tlog cli should assign global checkpoint (elastic#28192) Fix lock accounting in releasable lock Add ability to associate an ID with tasks (elastic#27764) [DOCS] Removed differencies between text and code (elastic#27993) text fixes (elastic#28136) Update getting-started.asciidoc (elastic#28145) [Docs] Spelling fix in painless-getting-started.asciidoc (elastic#28187) Fixed the cat.health REST test to accept 4ms, not just 4.0ms (elastic#28186) Do not keep 5.x commits once having 6.x commits (elastic#28188) Rename core module to server (elastic#28180) upgraded jna from 4.4.0-1 to 4.5.1 (elastic#28183) [TEST] Do not call RandomizedTest.scaledRandomIntBetween from multiple threads Primary send safe commit in file-based recovery (elastic#28038) [Docs] Correct response json in rank-eval.asciidoc ...
* compile-with-jdk-9: (56 commits) TEST: init unassigned gcp in testAcquireIndexCommit Replica start peer recovery with safe commit (elastic#28181) Truncate tlog cli should assign global checkpoint (elastic#28192) Fix lock accounting in releasable lock Add ability to associate an ID with tasks (elastic#27764) [DOCS] Removed differencies between text and code (elastic#27993) text fixes (elastic#28136) Update getting-started.asciidoc (elastic#28145) [Docs] Spelling fix in painless-getting-started.asciidoc (elastic#28187) Fixed the cat.health REST test to accept 4ms, not just 4.0ms (elastic#28186) Do not keep 5.x commits once having 6.x commits (elastic#28188) Rename core module to server (elastic#28180) upgraded jna from 4.4.0-1 to 4.5.1 (elastic#28183) [TEST] Do not call RandomizedTest.scaledRandomIntBetween from multiple threads Primary send safe commit in file-based recovery (elastic#28038) [Docs] Correct response json in rank-eval.asciidoc Add scroll parameter to _reindex API (elastic#28041) Include all sentences smaller than fragment_size in the unified highlighter (elastic#28132) Modifies the JavaAPI docs related to AggregationBuilder [Docs] Improvements in script-fields.asciidoc (elastic#28174) ...
Previously we introduced a new parameter to `acquireIndexCommit` to allow acquire either a safe commit or a last commit. However with the new parameter callers can provide a nonsense combination - flush first but acquire the safe commit. This commit separates acquireIndexCommit method into two different methods to avoid that problem. Moreover, this change should also improve the readability. Follow-up elastic#28038
Previously we introduced a new parameter to `acquireIndexCommit` to allow acquire either a safe commit or a last commit. However with the new parameters, callers can provide a nonsense combination - flush first but acquire the safe commit. This commit separates acquireIndexCommit method into two different methods to avoid that problem. Moreover, this change should also improve the readability. Relates #28038
Previously we introduced a new parameter to `acquireIndexCommit` to allow acquire either a safe commit or a last commit. However with the new parameters, callers can provide a nonsense combination - flush first but acquire the safe commit. This commit separates acquireIndexCommit method into two different methods to avoid that problem. Moreover, this change should also improve the readability. Relates #28038
- `NodeShouldNotConnectException` has not been instantiated since 5.0 - `GatewayException` has not been instantiated since 5.0 - `SnapshotFailedEngineException` has not been instantated since 6.2.0 (elastic#28038) and was never thrown across clusters This commit removes these obsolete exceptions.
- `NodeShouldNotConnectException` has not been instantiated since 5.0 - `GatewayException` has not been instantiated since 5.0 - `SnapshotFailedEngineException` has not been instantated since 6.2.0 (#28038) and was never thrown across clusters This commit removes these obsolete exceptions.
- `NodeShouldNotConnectException` has not been instantiated since 5.0 - `GatewayException` has not been instantiated since 5.0 - `SnapshotFailedEngineException` has not been instantated since 6.2.0 (#28038) and was never thrown across clusters This commit removes these obsolete exceptions.
Today a primary shard transfers the most recent commit point to a replica shard in a file-based recovery. However, the most recent commit may not be a "safe" commit; this causes a replica shard not having a safe commit point until it can retain a safe commit by itself.
This commits collapses the snapshot deletion policy into the combined deletion policy and modifies the peer recovery source to send a safe commit.
Relates #10708