-
Notifications
You must be signed in to change notification settings - Fork 1.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Refactoring checkpoint publishing to no longer be a Broadcast Action #2591
Refactoring checkpoint publishing to no longer be a Broadcast Action #2591
Conversation
❌ Gradle Check failure 053ea1498474eec1a475e0e897815cb3b06cd078 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A few nits but otherwise lgtm. Thanks for doing this!
@@ -38,8 +38,8 @@ public void beforeRefresh() throws IOException { | |||
|
|||
@Override | |||
public void afterRefresh(boolean didRefresh) throws IOException { | |||
if (shard.routingEntry().primary()) { | |||
publisher.publish(shard.getLatestReplicationCheckpoint()); | |||
if (shard.routingEntry().primary() && shard.indexSettings().isSegrepEnabled()) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We conditionally wire up the refreshListener in IndexShard if segrep is enabled & the shard is a primary, I'm not sure we need these checks anymore?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure, i can remove this. Follow up question - should we be checking didRefresh
here before publishing the checkpoint ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
good call, yes I think we should. Otherwise we could make multiple calls for the same ReplicationCheckpoint.
server/src/internalClusterTest/java/org/opensearch/index/shard/IndexShardIT.java
Show resolved
Hide resolved
This change adopts the same methodology as RetentionLeaseSyncer. A Transport client action is not required since we can skip the step of routing to the primary shard (the checkpoitn publisher is guaranteed to fire only from the primary). Now, checkpoint publishing directly invokes a TransportReplicationAction to perform the operation on the primary and replicas. PublishCheckpointAction has been reworked to be the TransportReplicationAction implementation rather than an ActionType. We leverage dependency injection to create the checkpoint publisher (and its internal action) at IndicesClusterStateService. This is plumed through to IndexShard which creates the refresh listener instance. All other transport layer classes tied to the original broadcast action are no longer required. Unrelated integration tests use a no-op/empty checkpoint publisher to satisfy their constructor/method argument. Signed-off-by: Kartik Ganesh <[email protected]>
053ea14
to
03bb12d
Compare
|
||
@Override | ||
protected void doExecute(Task task, PublishCheckpointRequest request, ActionListener<ReplicationResponse> listener) { | ||
assert false : "use PublishCheckpointAction#publish"; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What's the purpose of this function if it throws an assertion error everytime?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, the implementation here isn't the clearest. What's going on is:
- To allow a primary shard to publish the checkpoint to its replicas, we're leveraging the ReplicationOperation functionality of executing on a primary followed by its replicas.
- In order to run a ReplicationOperation, we need to register an action name with the transport layer. TransportReplicationAction (the superclass here) does this in its constructor
- However, TransportReplicationAction also assumes that it will be called via execute/doExecute, which includes rerouting behavior. We don't need the reroute phase here since we are guaranteed to be on the primary shard
- Thus, the purpose of this implementation is to guard against accidental execution via the above code path. Instead, the refresh listener invokes the publish API directly
Description
This change adopts the same methodology as
RetentionLeaseSyncer
. A Transport client action is not required since we can skip the step of routing to the primary shard (the checkpoint publisher is guaranteed to fire only from the primary). Now, checkpoint publishing directly invokes aTransportReplicationAction
to perform the operation on the primary and replicas.PublishCheckpointAction
has been reworked to be the TransportReplicationAction implementation rather than an ActionType.We leverage dependency injection to create the checkpoint publisher (and its internal action) at
IndicesClusterStateService
. This is plumbed through toIndexShard
-IndexShard
creates the refresh listener instance. All other transport layer classes tied to the original broadcast action are no longer required.Unrelated integration tests use a no-op/empty checkpoint publisher to satisfy their constructor/method argument.
Signed-off-by: Kartik Ganesh [email protected]
Issues Resolved
closes #2199
Check List
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.