Bug fixes: leaderCheckpoint initialisation and retention lease issue #904

Merged: 1 commit, Jun 2, 2023

ankitkala (Member) commented May 29, 2023

Description

  • Initialise the leaderCheckpoint when ShardReplicationTask restarts on a new node.
  • Correctly handle retention lease renewal (if the lease already exists) during bootstrap.

I've added an integration test for the retention lease issue. For leaderCheckpoint, I tried adding an IT that simulates the scenario using shard reroute, but wasn't able to make it work.
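The PR's original title describes the leaderCheckpoint fix as initializing leaderCheckpoint with the follower shard's localCheckpoint. A minimal sketch of that idea follows, using a hypothetical toy class (`ToyShardReplicationTask`, `UNSET_CHECKPOINT`) rather than the plugin's real ShardReplicationTask; it assumes the task tracks the leader checkpoint in memory, so a task restarted on a new node begins with no known value.

```kotlin
// Hypothetical "not yet known" sentinel for this sketch; not the plugin's real constant.
val UNSET_CHECKPOINT = -2L

// Toy stand-in for per-shard replication task state (hypothetical, illustration only).
class ToyShardReplicationTask(private val followerLocalCheckpoint: () -> Long) {
    // Last known checkpoint of the leader shard, tracked in memory by the task.
    var leaderCheckpoint: Long = UNSET_CHECKPOINT
        private set

    // Called when the task starts, including after it is reassigned to a new node,
    // where any previously tracked in-memory value is gone.
    fun onStart() {
        if (leaderCheckpoint == UNSET_CHECKPOINT) {
            // The follower can only have applied operations the leader already had,
            // so its local checkpoint is a safe lower bound for the leader's checkpoint.
            leaderCheckpoint = followerLocalCheckpoint()
        }
    }

    // Checkpoints later observed from the leader only ever move the value forward.
    fun onLeaderCheckpoint(observed: Long) {
        leaderCheckpoint = maxOf(leaderCheckpoint, observed)
    }
}
```

For example, a task created with `ToyShardReplicationTask { 41L }` reports 41 after `onStart()` instead of the sentinel, and a later stale observation of 45 after seeing 50 leaves it at 50.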

Issues Resolved

#900
#882

Check List

  • New functionality includes testing.
    • All tests pass
  • New functionality has been documented.
    • New functionality has javadoc added
  • Commits are signed per the DCO using --signoff

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

ankitkala changed the title from "Initialize the leaderCheckpoint with follower shard's localCheckpoint" to "Bug fixes: leaderCheckpoint initialisation and retention lease issue" on May 29, 2023
codecov bot commented May 29, 2023

Codecov Report

Merging #904 (5042bd9) into main (f5c94f7) will increase coverage by 26.83%.
The diff coverage is 70.58%.

❗ Current head 5042bd9 differs from pull request most recent head 236c860. Consider uploading reports for the commit 236c860 to get more accurate results.

@@              Coverage Diff              @@
##               main     #904       +/-   ##
=============================================
+ Coverage     46.21%   73.05%   +26.83%     
- Complexity      630     1014      +384     
=============================================
  Files           141      141               
  Lines          4695     4717       +22     
  Branches        527      528        +1     
=============================================
+ Hits           2170     3446     +1276     
+ Misses         2266      942     -1324     
- Partials        259      329       +70     
Impacted Files Coverage Δ
...replication/task/shard/ShardReplicationExecutor.kt 59.37% <57.14%> (-1.34%) ⬇️
...ication/seqno/RemoteClusterRetentionLeaseHelper.kt 66.08% <69.56%> (+32.41%) ⬆️
...on/repository/RemoteClusterRestoreLeaderService.kt 78.37% <100.00%> (ø)
...rch/replication/task/index/IndexReplicationTask.kt 68.77% <100.00%> (+29.11%) ⬆️
...rch/replication/task/shard/ShardReplicationTask.kt 76.08% <100.00%> (+16.96%) ⬆️

... and 70 files with indirect coverage changes

ankitkala marked this pull request as ready for review May 31, 2023 10:24
ankitkala requested a review from gbbafna as a code owner May 31, 2023 10:25
@@ -290,7 +290,7 @@ open class IndexReplicationTask(id: Long, type: String, action: String, descript
     private suspend fun pollShardTaskStatus(): IndexReplicationState {
         val failedShardTasks = findAllReplicationFailedShardTasks(followerIndexName, clusterService.state())
         if (failedShardTasks.isNotEmpty()) {
-            log.info("Failed shard tasks - ", failedShardTasks)
+            log.info("Failed shard tasks - {}", failedShardTasks)
Member

nit: let's directly use ${failedShardTasks}
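The bug fixed in the diff is that the old message had no `{}` placeholder, so the argument was never rendered. The sketch below uses a hypothetical helper, `formatParameterized`, that imitates Log4j-style `{}` substitution for illustration; it is not the Log4j implementation, and it assumes (as with Log4j) that surplus arguments without a placeholder are dropped.

```kotlin
// Minimal stand-in for Log4j-style parameterized messages (hypothetical helper,
// not the Log4j implementation): each "{}" is replaced by the next argument, and
// arguments without a matching placeholder are silently dropped.
fun formatParameterized(pattern: String, vararg args: Any?): String {
    val out = StringBuilder()
    var argIndex = 0
    var i = 0
    while (i < pattern.length) {
        val isPlaceholder = i + 1 < pattern.length && pattern[i] == '{' && pattern[i + 1] == '}'
        if (isPlaceholder && argIndex < args.size) {
            // Substitute the next argument for this "{}".
            out.append(args[argIndex++].toString())
            i += 2
        } else {
            out.append(pattern[i])
            i++
        }
    }
    return out.toString()
}
```

With a list such as `listOf("shard-0", "shard-3")`, the old pattern yields just `"Failed shard tasks - "`, while the fixed pattern and the reviewer's string template both yield `"Failed shard tasks - [shard-0, shard-3]"`. The diff above keeps the parameterized form, which defers building the string until the logger knows INFO is enabled; a Kotlin string template builds it eagerly.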

// was at a higher sequence number than the minRetainedSequenceNumber (i.e. RETAIN_ALL).
// To get around this, we always renew the existing retention lease with the same sequence number to extend its
// expiry time.
val retainedSequenceNumber = leaderIndexShard.retentionLeases.leases().filter { lease ->
saikaranam-amazon (Member) commented May 31, 2023
Should we consider the case of expired leases?
Rather than doing this, can we get the seqNo from the commitId and use that for renewal?
This ensures that we are not deviating from the underlying assumption for the "SYNCING" phase.

Member Author

If I use the seqNo from commitId, that can technically also fail with RetentionLeaseInvalidRetainingSeqNoException, if it is lower than the seqNo already retained by the existing lease.

The ideal way to do this would be something like:

if (commitSeqNo > retainedSeqNo) {
    // use commitSeqNo
} else {
    // use retainedSeqNo
}
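A minimal sketch of why renewing with the commit seqNo alone can fail, and what the selection in the comment above does. It assumes, as the comment implies, that renewal rejects a retaining seqNo lower than the one the lease already holds. `ToyLease`, `renew`, `seqNoForRenewal`, and `InvalidRetainingSeqNoException` are hypothetical names for a toy in-memory model, not the OpenSearch retention lease API.

```kotlin
// Hypothetical exception mirroring RetentionLeaseInvalidRetainingSeqNoException from the thread.
class InvalidRetainingSeqNoException(msg: String) : RuntimeException(msg)

// Toy in-memory model of a single retention lease (illustrative only).
class ToyLease(var retainingSeqNo: Long, var expiresAtMillis: Long)

// Renewal may keep or raise the retaining seqNo and extends the expiry, but never lowers the seqNo.
fun renew(lease: ToyLease, newRetainingSeqNo: Long, nowMillis: Long, ttlMillis: Long) {
    if (newRetainingSeqNo < lease.retainingSeqNo) {
        throw InvalidRetainingSeqNoException(
            "cannot lower retaining seqNo from ${lease.retainingSeqNo} to $newRetainingSeqNo"
        )
    }
    lease.retainingSeqNo = newRetainingSeqNo
    lease.expiresAtMillis = nowMillis + ttlMillis
}

// The selection from the comment above: never go below what the lease already retains.
fun seqNoForRenewal(commitSeqNo: Long, retainedSeqNo: Long): Long =
    if (commitSeqNo > retainedSeqNo) commitSeqNo else retainedSeqNo
```

In this model, renewing a lease that retains from 120 with a commit seqNo of 100 throws, whereas renewing with `seqNoForRenewal(100, 120)` succeeds and only extends the expiry. The trade-off, raised in the next comment, is that keeping the higher value may not cover every operation after the commit.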

Member

Yes, I think we should either let it fail or remove the lease and add it again.
If we don't, we risk losing the operations between the seqNo obtained from the commit ID and the current seqNo (which will be higher) tracked by the current lease.
Thoughts?
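The risk described above can be made concrete with a small worked sketch. It assumes that a lease retains every operation with seqNo greater than or equal to its retaining seqNo, and that a follower restored from a commit needs every operation after the commit's seqNo; `possiblyLostOps` is a hypothetical helper, not part of the plugin.

```kotlin
// Operations the follower needs after restoring from the commit (seqNo > commitSeqNo)
// that the existing lease does not protect (seqNo < retainingSeqNo), so the leader may
// already have discarded them. An empty range means there is no gap.
fun possiblyLostOps(commitSeqNo: Long, retainingSeqNo: Long): LongRange =
    (commitSeqNo + 1) until retainingSeqNo
```

For example, with a commit at seqNo 100 and an existing lease retaining from 150, operations 101 through 149 (49 operations) are not guaranteed to still be on the leader. When the lease retains from 101 or lower, the range is empty and nothing is at risk, which is why removing and re-adding the lease is safer than keeping a higher retained seqNo.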

Member Author

To simplify this, I've removed lease renewal as the fallback. We now simply add a lease; if the lease already exists, we remove the old one and retry.

I've also used RETAIN_ALL to keep parity with the existing code.
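The final approach described above (add the lease; if it already exists, remove it and add it again, retaining everything) can be sketched with a toy registry. `ToyLeaseRegistry`, `LeaseAlreadyExistsException`, `addOrReplaceLease`, and `RETAIN_ALL_SKETCH` are hypothetical names, and modeling RETAIN_ALL as -1 is an assumption of this sketch, not a value taken from the PR.

```kotlin
// Hypothetical exception for "a lease with this ID already exists".
class LeaseAlreadyExistsException(id: String) : RuntimeException("lease $id already exists")

// Toy lease registry keyed by lease ID (illustrative only, not the OpenSearch API).
class ToyLeaseRegistry {
    private val leases = mutableMapOf<String, Long>() // leaseId -> retainingSeqNo

    fun add(id: String, retainingSeqNo: Long) {
        if (id in leases) throw LeaseAlreadyExistsException(id)
        leases[id] = retainingSeqNo
    }

    fun remove(id: String) {
        leases.remove(id)
    }

    fun get(id: String): Long? = leases[id]
}

// Hypothetical stand-in for RETAIN_ALL (retain every operation); the value is an assumption.
val RETAIN_ALL_SKETCH = -1L

// Try to add the lease; if it already exists, drop it and add it again, so the lease always
// starts from the requested seqNo instead of renewing the old one at a higher seqNo.
fun addOrReplaceLease(registry: ToyLeaseRegistry, id: String, retainingSeqNo: Long = RETAIN_ALL_SKETCH) {
    try {
        registry.add(id, retainingSeqNo)
    } catch (e: LeaseAlreadyExistsException) {
        registry.remove(id)
        registry.add(id, retainingSeqNo)
    }
}
```

In this model, a stale lease retaining from 120 is replaced by one that retains everything, so no operations after the restored commit can fall outside the lease.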

@opensearch-trigger-bot

The backport to 1.x failed:

The process '/usr/bin/git' failed with exit code 1

To backport manually, run these commands in your terminal:

# Fetch latest updates from GitHub
git fetch
# Create a new working tree
git worktree add .worktrees/backport-1.x 1.x
# Navigate to the new working tree
cd .worktrees/backport-1.x
# Create a new branch
git switch --create backport/backport-904-to-1.x
# Cherry-pick the merged commit of this pull request and resolve the conflicts
git cherry-pick -x --mainline 1 ba7d1fa303299dfd874349ee9bc117fae5b2157c
# Push it to GitHub
git push --set-upstream origin backport/backport-904-to-1.x
# Go back to the original working tree
cd ../..
# Delete the working tree
git worktree remove .worktrees/backport-1.x

Then, create a pull request where the base branch is 1.x and the compare/head branch is backport/backport-904-to-1.x.

@opensearch-trigger-bot

The backport to 2.7 failed:

The process '/usr/bin/git' failed with exit code 1

To backport manually, run these commands in your terminal:

# Fetch latest updates from GitHub
git fetch
# Create a new working tree
git worktree add .worktrees/backport-2.7 2.7
# Navigate to the new working tree
cd .worktrees/backport-2.7
# Create a new branch
git switch --create backport/backport-904-to-2.7
# Cherry-pick the merged commit of this pull request and resolve the conflicts
git cherry-pick -x --mainline 1 ba7d1fa303299dfd874349ee9bc117fae5b2157c
# Push it to GitHub
git push --set-upstream origin backport/backport-904-to-2.7
# Go back to the original working tree
cd ../..
# Delete the working tree
git worktree remove .worktrees/backport-2.7

Then, create a pull request where the base branch is 2.7 and the compare/head branch is backport/backport-904-to-2.7.

@opensearch-trigger-bot

The backport to 1.1 failed:

The process '/usr/bin/git' failed with exit code 1

To backport manually, run these commands in your terminal:

# Fetch latest updates from GitHub
git fetch
# Create a new working tree
git worktree add .worktrees/backport-1.1 1.1
# Navigate to the new working tree
cd .worktrees/backport-1.1
# Create a new branch
git switch --create backport/backport-904-to-1.1
# Cherry-pick the merged commit of this pull request and resolve the conflicts
git cherry-pick -x --mainline 1 ba7d1fa303299dfd874349ee9bc117fae5b2157c
# Push it to GitHub
git push --set-upstream origin backport/backport-904-to-1.1
# Go back to the original working tree
cd ../..
# Delete the working tree
git worktree remove .worktrees/backport-1.1

Then, create a pull request where the base branch is 1.1 and the compare/head branch is backport/backport-904-to-1.1.

@opensearch-trigger-bot

The backport to 1.2 failed:

The process '/usr/bin/git' failed with exit code 1

To backport manually, run these commands in your terminal:

# Fetch latest updates from GitHub
git fetch
# Create a new working tree
git worktree add .worktrees/backport-1.2 1.2
# Navigate to the new working tree
cd .worktrees/backport-1.2
# Create a new branch
git switch --create backport/backport-904-to-1.2
# Cherry-pick the merged commit of this pull request and resolve the conflicts
git cherry-pick -x --mainline 1 ba7d1fa303299dfd874349ee9bc117fae5b2157c
# Push it to GitHub
git push --set-upstream origin backport/backport-904-to-1.2
# Go back to the original working tree
cd ../..
# Delete the working tree
git worktree remove .worktrees/backport-1.2

Then, create a pull request where the base branch is 1.2 and the compare/head branch is backport/backport-904-to-1.2.

@opensearch-trigger-bot

The backport to 2.3 failed:

The process '/usr/bin/git' failed with exit code 1

To backport manually, run these commands in your terminal:

# Fetch latest updates from GitHub
git fetch
# Create a new working tree
git worktree add .worktrees/backport-2.3 2.3
# Navigate to the new working tree
cd .worktrees/backport-2.3
# Create a new branch
git switch --create backport/backport-904-to-2.3
# Cherry-pick the merged commit of this pull request and resolve the conflicts
git cherry-pick -x --mainline 1 ba7d1fa303299dfd874349ee9bc117fae5b2157c
# Push it to GitHub
git push --set-upstream origin backport/backport-904-to-2.3
# Go back to the original working tree
cd ../..
# Delete the working tree
git worktree remove .worktrees/backport-2.3

Then, create a pull request where the base branch is 2.3 and the compare/head branch is backport/backport-904-to-2.3.

@opensearch-trigger-bot

The backport to 2.5 failed:

The process '/usr/bin/git' failed with exit code 1

To backport manually, run these commands in your terminal:

# Fetch latest updates from GitHub
git fetch
# Create a new working tree
git worktree add .worktrees/backport-2.5 2.5
# Navigate to the new working tree
cd .worktrees/backport-2.5
# Create a new branch
git switch --create backport/backport-904-to-2.5
# Cherry-pick the merged commit of this pull request and resolve the conflicts
git cherry-pick -x --mainline 1 ba7d1fa303299dfd874349ee9bc117fae5b2157c
# Push it to GitHub
git push --set-upstream origin backport/backport-904-to-2.5
# Go back to the original working tree
cd ../..
# Delete the working tree
git worktree remove .worktrees/backport-2.5

Then, create a pull request where the base branch is 2.5 and the compare/head branch is backport/backport-904-to-2.5.

@opensearch-trigger-bot

The backport to 2.x failed:

The process '/usr/bin/git' failed with exit code 1

To backport manually, run these commands in your terminal:

# Fetch latest updates from GitHub
git fetch
# Create a new working tree
git worktree add .worktrees/backport-2.x 2.x
# Navigate to the new working tree
cd .worktrees/backport-2.x
# Create a new branch
git switch --create backport/backport-904-to-2.x
# Cherry-pick the merged commit of this pull request and resolve the conflicts
git cherry-pick -x --mainline 1 ba7d1fa303299dfd874349ee9bc117fae5b2157c
# Push it to GitHub
git push --set-upstream origin backport/backport-904-to-2.x
# Go back to the original working tree
cd ../..
# Delete the working tree
git worktree remove .worktrees/backport-2.x

Then, create a pull request where the base branch is 2.x and the compare/head branch is backport/backport-904-to-2.x.

@opensearch-trigger-bot

The backport to 1.3 failed:

The process '/usr/bin/git' failed with exit code 1

To backport manually, run these commands in your terminal:

# Fetch latest updates from GitHub
git fetch
# Create a new working tree
git worktree add .worktrees/backport-1.3 1.3
# Navigate to the new working tree
cd .worktrees/backport-1.3
# Create a new branch
git switch --create backport/backport-904-to-1.3
# Cherry-pick the merged commit of this pull request and resolve the conflicts
git cherry-pick -x --mainline 1 ba7d1fa303299dfd874349ee9bc117fae5b2157c
# Push it to GitHub
git push --set-upstream origin backport/backport-904-to-1.3
# Go back to the original working tree
cd ../..
# Delete the working tree
git worktree remove .worktrees/backport-1.3

Then, create a pull request where the base branch is 1.3 and the compare/head branch is backport/backport-904-to-1.3.

ankitkala added 9 commits to ankitkala/cross-cluster-replication that referenced this pull request Jun 2, 2023
ankitkala added 8 commits that referenced this pull request Jun 2, 2023
opensearch-trigger-bot bot pushed a commit that referenced this pull request Jun 2, 2023
3 participants