Renewing retention lease with global-ckp + 1, as we want operations … #206

Merged · 1 commit · Oct 26, 2021

Conversation

@gbbafna (Collaborator) commented Oct 20, 2021

…from that seq number

Signed-off-by: Gaurav Bafna [email protected]

Description

A retention lease preserves all operations starting from, and including, the retainingSequenceNumber specified when the lease is taken.

Hence, on the follower side, once the global checkpoint moves to x, we should renew the lease with x + 1 as the retainingSequenceNumber.

The current code renews the lease from x, which is unnecessary. It can also become problematic when the retention lease taken earlier during bootstrap already retains from x + 1, as observed in the issue linked below.
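
As an illustration of why the retaining sequence number is x + 1, here is a minimal Kotlin sketch of a renewal call against the OpenSearch retention-lease API; the function name, client wiring, lease id, and source string are illustrative assumptions, not the plugin's actual code:

```kotlin
// Sketch only: assumes the OpenSearch 1.x retention-lease API; the lease id and
// "ccr" source string are placeholders, not the plugin's real values.
import org.opensearch.client.Client
import org.opensearch.index.seqno.RetentionLeaseActions
import org.opensearch.index.shard.ShardId

fun renewFollowerRetentionLease(
    client: Client,                  // client pointing at the leader cluster
    leaderShardId: ShardId,
    leaseId: String,                 // illustrative lease id (e.g. derived from the follower shard)
    lastSyncedGlobalCheckpoint: Long
) {
    // A lease retains every operation with seqNo >= retainingSequenceNumber.
    // Operations up to and including the global checkpoint are already replicated
    // to the follower, so the first operation still needed is globalCheckpoint + 1.
    val retainingSeqNo = lastSyncedGlobalCheckpoint + 1
    client.execute(
        RetentionLeaseActions.Renew.INSTANCE,
        RetentionLeaseActions.RenewRequest(leaderShardId, leaseId, retainingSeqNo, "ccr")
    ).actionGet()
}
```

In particular, if bootstrap already took a lease retaining from x + 1, renewing with x would ask the leader to move the retained range backwards, which a lease renewal does not allow.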

Issues Resolved

#205

Check List

  • New functionality includes testing.
    • All tests pass
  • New functionality has been documented.
    • New functionality has javadoc added
  • Commits are signed per the DCO using --signoff

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

@@ -268,7 +268,7 @@ class ShardReplicationTask(id: Long, type: String, action: String, description:

     //renew retention lease with global checkpoint so that any shard that picks up shard replication task has data until then.
     try {
-        retentionLeaseHelper.renewRetentionLease(leaderShardId, indexShard.lastSyncedGlobalCheckpoint, followerShardId)
+        retentionLeaseHelper.renewRetentionLease(leaderShardId, indexShard.lastSyncedGlobalCheckpoint + 1, followerShardId)
A collaborator commented on this line:
Can you describe the rationale in the description and also leave a code comment explaining why it is +1?

Also, can you share details of testing? Is there a functional test we can create that would fail with the earlier code and pass now?

@gbbafna (Collaborator, Author) replied Oct 21, 2021:
I have used the existing UTs and did a sanity check with an index holding millions of documents, both during bootstrap and post bootstrap. It is difficult to write a functional test that would fail with the earlier code, as the issue is not reproducible every time. Besides that, I have not been able to repro it locally.

@ankitkala (Member) commented:
If we're going with this, I think we can remove the error suppression based on leaseRenewalMaxFailureDuration

@saikaranam-amazon (Member) left a review comment:
LGTM

@saikaranam-amazon (Member) commented Oct 26, 2021:
Can you please backport to 7.10 and 1.0 branches as well?

@gbbafna gbbafna merged commit 12fe6b5 into opensearch-project:main Oct 26, 2021
gbbafna added a commit to gbbafna/cross-cluster-replication that referenced this pull request Oct 26, 2021
gbbafna added a commit that referenced this pull request Oct 26, 2021
@gbbafna (Collaborator, Author) commented Oct 26, 2021, replying to the backport request:

Have backported to both branches.
