Fix assertion failure in updateGlobalCheckpointOnReplica() with remote translog #6975

Merged
merged 1 commit into from
Apr 4, 2023

Conversation

sachinpkale
Member

@sachinpkale sachinpkale commented Apr 4, 2023

Description

  • When remote translog is enabled for an index, the replication operation is limited to primary term validation and does not update the local checkpoint on the replica, so the replica's local checkpoint can be lower than the global checkpoint.
  • This trips the assertion in the following code:

if (globalCheckpoint > localCheckpoint) {
    /*
     * This can happen during recovery when the shard has started its engine but recovery is not finalized and is receiving global
     * checkpoint updates. However, since this shard is not yet contributing to calculating the global checkpoint, it can be the
     * case that the global checkpoint update from the primary is ahead of the local checkpoint on this shard. In this case, we
     * ignore the global checkpoint update. This can happen if we are in the translog stage of recovery. Prior to this, the engine
     * is not opened and this shard will not receive global checkpoint updates, and after this the shard will be contributing to
     * calculations of the global checkpoint. However, we can not assert that we are in the translog stage of recovery here as
     * while the global checkpoint update may have emanated from the primary when we were in that state, we could subsequently move
     * to recovery finalization, or even finished recovery before the update arrives here.
     */
    assert state() != IndexShardState.POST_RECOVERY && state() != IndexShardState.STARTED
        : "supposedly in-sync shard copy received a global checkpoint ["
            + globalCheckpoint
            + "] "
            + "that is higher than its local checkpoint ["
            + localCheckpoint
            + "]";
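
To make the failure mode concrete, here is a minimal, self-contained sketch of the check described above. It is not the actual IndexShard code and not necessarily the change made in this PR: the class GlobalCheckpointCheckSketch, the method shouldApply, the ShardState enum, and the remoteTranslogEnabled parameter are all hypothetical names, and it assumes the fix amounts to tolerating a global checkpoint ahead of the local checkpoint when remote translog is enabled. An exception stands in for the Java assert so the behavior does not depend on the -ea flag.

```java
/** Hypothetical, simplified model of the replica-side check; not the real IndexShard. */
final class GlobalCheckpointCheckSketch {

    /** Reduced stand-in for IndexShardState, limited to the states relevant here. */
    enum ShardState { RECOVERING, POST_RECOVERY, STARTED }

    /**
     * Returns true if the global checkpoint update should be applied and false if it
     * should be ignored. Throws IllegalStateException where the original code would
     * trip its assertion.
     */
    static boolean shouldApply(long globalCheckpoint, long localCheckpoint,
                               ShardState state, boolean remoteTranslogEnabled) {
        if (globalCheckpoint > localCheckpoint) {
            boolean inSync = state == ShardState.POST_RECOVERY || state == ShardState.STARTED;
            // Assumption: with remote translog the replica does not advance its local
            // checkpoint during replication, so an in-sync copy can legitimately lag.
            if (inSync && !remoteTranslogEnabled) {
                throw new IllegalStateException(
                    "supposedly in-sync shard copy received a global checkpoint ["
                        + globalCheckpoint
                        + "] that is higher than its local checkpoint ["
                        + localCheckpoint
                        + "]");
            }
            // Ignore the update, as the original comment describes for the recovery case.
            return false;
        }
        return true;
    }
}
```

In this sketch, a STARTED replica with remote translog enabled that receives a global checkpoint of 10 while its local checkpoint is 5 ignores the update, whereas the same call with remote translog disabled raises the error. Ignoring the update in the remote translog case is itself an assumption of the sketch.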

Issues Resolved

Check List

  • New functionality includes testing.
    • All tests pass
  • New functionality has been documented.
    • New functionality has javadoc added
  • Commits are signed per the DCO using --signoff
  • Commit changes are listed out in CHANGELOG.md file (See: Changelog)

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

… when remote translog is enabled

Signed-off-by: Sachin Kale <[email protected]>
@github-actions
Contributor

github-actions bot commented Apr 4, 2023

Gradle Check (Jenkins) Run Completed with:

@codecov-commenter

Codecov Report

Merging #6975 (05d9df3) into main (285b450) will increase coverage by 0.05%.
The diff coverage is 0.00%.

@@             Coverage Diff              @@
##               main    #6975      +/-   ##
============================================
+ Coverage     70.67%   70.73%   +0.05%     
- Complexity    59206    59238      +32     
============================================
  Files          4812     4812              
  Lines        283748   283749       +1     
  Branches      40916    40917       +1     
============================================
+ Hits         200527   200697     +170     
+ Misses        66715    66527     -188     
- Partials      16506    16525      +19     
Impacted Files Coverage Δ
...in/java/org/opensearch/index/shard/IndexShard.java 70.23% <0.00%> (+0.01%) ⬆️

... and 521 files with indirect coverage changes

@gbbafna gbbafna merged commit 95c6ed9 into opensearch-project:main Apr 4, 2023
@gbbafna gbbafna added the backport 2.x Backport to 2.x branch label Apr 4, 2023
opensearch-trigger-bot bot pushed a commit that referenced this pull request Apr 4, 2023
… when remote translog is enabled (#6975)

Signed-off-by: Sachin Kale <[email protected]>
(cherry picked from commit 95c6ed9)
Signed-off-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
gbbafna pushed a commit that referenced this pull request Apr 5, 2023
… when remote translog is enabled (#6975) (#6978)

(cherry picked from commit 95c6ed9)

Signed-off-by: Sachin Kale <[email protected]>
mitrofmep pushed a commit to mitrofmep/OpenSearch that referenced this pull request Apr 5, 2023
… when remote translog is enabled (opensearch-project#6975)

Signed-off-by: Sachin Kale <[email protected]>
Signed-off-by: Valentin Mitrofanov <[email protected]>
Labels
backport 2.x Backport to 2.x branch skip-changelog
3 participants