[Segment Replication] Add new background task to fail stale replica shards. #6850
Conversation
Signed-off-by: Rishikesh1159 <[email protected]>
Signed-off-by: Rishikesh1159 <[email protected]>
Codecov Report
@@ Coverage Diff @@
## main #6850 +/- ##
============================================
- Coverage 70.78% 70.71% -0.07%
+ Complexity 59305 59255 -50
============================================
Files 4813 4822 +9
Lines 283781 283926 +145
Branches 40924 40947 +23
============================================
- Hits 200864 200788 -76
- Misses 66420 66637 +217
- Partials 16497 16501 +4
... and 481 files with indirect coverage changes
    entry.getValue().getReplicaStats()
);
for (SegmentReplicationShardStats staleReplica : staleReplicas) {
    if (staleReplica.getCurrentReplicationTimeMillis() > highestCurrentReplicationTimeMillis) {
highestCurrentReplicationTimeMillis is never set to a value other than zero, is it? I think you can use the stream API, something like this:
stats.getShardStats().entrySet().stream()
    .flatMap(entry -> pressureService.getStaleReplicas(entry.getValue().getReplicaStats()).stream()
        .map(r -> Tuple.tuple(entry.getKey(), r.getCurrentReplicationTimeMillis())))
    .max(Comparator.comparingLong(Tuple::v2))
    .map(Tuple::v1)
    .ifPresent(shardId -> {
        ...
    });
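(For context: assuming Tuple here is OpenSearch's org.opensearch.common.collect.Tuple, the stream pairs each shard id with a stale replica's current replication time and takes the max by time, which removes the need for the mutable highestCurrentReplicationTimeMillis accumulator that caused the bug discussed above.)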
Sorry, you are right, I missed reassigning highestCurrentReplicationTimeMillis in the if condition. But actually, as you suggested, using the stream API makes this entire logic much cleaner. Thanks, I have updated the PR.
Signed-off-by: Rishikesh1159 <[email protected]>
Asking again, but why is the limit implemented as double the "max replication time limit" setting?
@@ -154,4 +182,95 @@ public void setMaxAllowedStaleReplicas(double maxAllowedStaleReplicas) {
public void setMaxReplicationTime(TimeValue maxReplicationTime) {
    this.maxReplicationTime = maxReplicationTime;
}

@Override
public void close() throws IOException {
Where does this newly added method get called?
Currently, it is not called/used directly from anywhere. I took the reference from PersistentTasksClusterService, just to make sure that when the service is closed this async task is also closed.
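To illustrate that close-cancels-task pattern, here is a minimal sketch using plain Java scheduling; the class and parameter names are hypothetical stand-ins, not the actual OpenSearch classes:

import java.io.Closeable;
import java.io.IOException;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;

// Hypothetical illustration: a service owns a periodic background task,
// and close() guarantees the task cannot outlive the service.
class ClosableBackgroundService implements Closeable {
    private final ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
    private final ScheduledFuture<?> task;

    ClosableBackgroundService(Runnable checkForStaleReplicas, long intervalSeconds) {
        // Re-run the stale-replica check with a fixed delay between iterations.
        this.task = scheduler.scheduleWithFixedDelay(
            checkForStaleReplicas, intervalSeconds, intervalSeconds, TimeUnit.SECONDS);
    }

    @Override
    public void close() throws IOException {
        // Even if nothing calls close() today, wiring it up means a closed
        // service always cancels its async task.
        task.cancel(false);
        scheduler.shutdown();
    }
}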
Sorry, I missed this question previously. The "max replication time limit" (MAX_REPLICATION_TIME_SETTING) is the maximum time a replica can take to catch up to the primary shard without triggering the backpressure mechanism. Once this MAX_REPLICATION_TIME_SETTING limit is hit along with the MAX_INDEXING_CHECKPOINTS limit, the backpressure mechanism kicks in: we temporarily stop indexing/write requests to the primary shard so that the replica shards can catch up with all the previous checkpoints. Currently the default value of MAX_REPLICATION_TIME_SETTING is 5 min; @mch2 came up with this number after running a few benchmarks, but users have the option to change this setting. Now, to answer your question of why the limit for failing replicas is double the "max replication time limit" setting: once the backpressure mechanism kicks in, we stop writes to primary shards so replicas can catch up. So essentially, if the default of 5 min is used, after backpressure kicks in we give each replica 5 more minutes to finish the replication; if it is not able to finish within that given 5 min, the replica shard might be stuck (taking too long), so we just fail the shard, and a new shard can recover directly from the primary and catch up.
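To make that timing concrete, a minimal sketch of the failure condition; the constant and method names are hypothetical, and only the 2x relationship and the 5 min default come from the discussion above:

import java.util.concurrent.TimeUnit;

// Hypothetical sketch of the condition described above.
class StaleReplicaCondition {
    // Default from the discussion: 5 min (MAX_REPLICATION_TIME_SETTING).
    static final long MAX_REPLICATION_TIME_MILLIS = TimeUnit.MINUTES.toMillis(5);

    // First window: a replica may lag up to the limit before backpressure.
    // Second window: with writes paused, the replica gets the same limit
    // again to catch up; past 2x it is considered stuck and gets failed.
    static boolean shouldFailReplica(long currentReplicationTimeMillis) {
        return currentReplicationTimeMillis > 2 * MAX_REPLICATION_TIME_MILLIS;
    }
}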
@Rishikesh1159 Great explanation, thanks! I suggest documenting this in the code, probably either in the SegmentReplicationPressureService classdoc or on the setting itself.
Sure, I will add this code doc in the next commit. Thanks @andrross for your review on this PR.
Signed-off-by: Rishikesh1159 <[email protected]>
The backport to 2.x failed.
To backport manually, run these commands in your terminal:

# Fetch latest updates from GitHub
git fetch
# Create a new working tree
git worktree add ../.worktrees/backport-2.x 2.x
# Navigate to the new working tree
pushd ../.worktrees/backport-2.x
# Create a new branch
git switch --create backport/backport-6850-to-2.x
# Cherry-pick the merged commit of this pull request and resolve the conflicts
git cherry-pick -x --mainline 1 59e881b705c29b7bf809740e918955349942e397
# Push it to GitHub
git push --set-upstream origin backport/backport-6850-to-2.x
# Go back to the original working tree
popd
# Delete the working tree
git worktree remove ../.worktrees/backport-2.x

Then, create a pull request where the base branch is 2.x and the compare/head branch is backport/backport-6850-to-2.x.
[Segment Replication] Add new background task to fail stale replica shards. (opensearch-project#6850)

* Add new background task to fail stale replica shards.
* Add condition to check if backpressure is enabled.
* Fix failing tests.
* Fix failing tests by adding manual refresh.
* Address comments on PR.
* Addressing comments on PR.
* Update background task logic to fail stale replicas of only one shardId in a single iteration of background task.
* Fix failing import.
* Address comments.
* Add code doc to SEGMENT_REPLICATION_INDEXING_PRESSURE_ENABLED setting.

Signed-off-by: Rishikesh1159 <[email protected]>
Signed-off-by: Valentin Mitrofanov <[email protected]>
@Rishikesh1159: Looks like the backport workflow has some trouble with backporting. Care to raise a manual backport?
Description
-> This PR adds a new async task to send remote shard failures for lagging replica shards.
-> This works only with Segment Replication backpressure enabled.
-> This async task fails any replica shards that are stale (a few checkpoints behind the primary shard) and have a current replication time of twice the max replication time limit (see the sketch below).
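As a rough illustration of one iteration of the task described above, tying the enabled check to the single-shard selection; all type and method names here are hypothetical stand-ins for the real stats and service classes:

import java.util.Comparator;
import java.util.List;

public class FailStaleReplicasSketch {

    // Hypothetical stand-in for a stale replica's stats.
    record StaleReplica(String shardId, long currentReplicationTimeMillis) {}

    static final long MAX_REPLICATION_TIME_MILLIS = 5 * 60 * 1000L; // 5 min default

    // One iteration: do nothing unless backpressure is enabled; otherwise
    // pick the single most-lagging stale replica past twice the limit.
    static void runOnce(boolean backpressureEnabled, List<StaleReplica> staleReplicas) {
        if (!backpressureEnabled) {
            return; // the task is a no-op when backpressure is disabled
        }
        staleReplicas.stream()
            .filter(r -> r.currentReplicationTimeMillis() > 2 * MAX_REPLICATION_TIME_MILLIS)
            .max(Comparator.comparingLong(StaleReplica::currentReplicationTimeMillis))
            .ifPresent(r -> System.out.println("would fail replica of shard " + r.shardId()));
    }
}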
Issues Resolved
Resolves #6606
Check List
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.