-
Notifications
You must be signed in to change notification settings - Fork 24.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Expose all permits acquisition in IndexShard and TransportReplicationAction #35540
Conversation
Pinging @elastic/es-distributed |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thx @tlrx. I left initial feedback on the production call.
...r/src/main/java/org/elasticsearch/action/support/replication/TransportReplicationAction.java
Outdated
Show resolved
Hide resolved
@@ -697,7 +728,7 @@ public void onFailure(Exception e) { | |||
} | |||
} | |||
|
|||
protected IndexShard getIndexShard(ShardId shardId) { | |||
protected IndexShard getIndexShard(final ShardId shardId, final String targetAllocationID) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why is the targetAllocaitonID passed here? I guess it should be validated, but it isn't?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a leftover that was helpful in TransportReplicationAllPermitsAcquisitionTests
to dinstinguish between the primary and replica shard when executing a transport replication test action.
I reverted this and changed how the action in test retrieves the index shard.
* Acquire all primary operation permits. Once all permits are acquired, the provided ActionListener is called. | ||
* It is the responsibility of the caller to close the {@link Releasable}. | ||
*/ | ||
public void acquirePrimaryAllOperationsPermits(final ActionListener<Releasable> onPermitAcquired, final TimeValue timeout) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: acquireAllPrimaryOperationPermits
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
* an {@link IllegalStateException}. Otherwise the global checkpoint and the max_seq_no_of_updates marker of the replica are updated | ||
* before the invocation of the {@link ActionListener#onResponse(Object)}} method of the provided listener. | ||
*/ | ||
private ActionListener<Releasable> createListener(final ActionListener<Releasable> listener, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not so comfortable with separating this code from the one in updatePrimaryTermIfNeeded
- they are tightly connected. Instead of sharing code this way, how about creating a callback that will run:
indexShardOperationPermits.acquire(listener, executorOnDelay, true, debugInfo);
or
indexShardOperationPermits.asyncBlockOperations(listener, timeout.duration(), timeout.timeUnit());
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I share your concerns and I updated the code, let me know what you think.
Thanks @bleskes, I addressed your first bunch of comments. I also added comments to the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. I left one suggestion for a potential follow up PR.
allPermitsAction.new AsyncPrimaryAction(request(), allocationId(), primaryTerm(), transportChannel(allPermitFuture), null) { | ||
@Override | ||
void runWithPrimaryShardReference(final TransportReplicationAction.PrimaryShardReference reference) { | ||
assertEquals("All permits must be acquired", 0, reference.indexShard.getActiveOperationsCount()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is indicates a funky property of getActiveOperationsCount
. To date all the blocks were internal to the shard and thus 0 was a good return. I don't really have a good solution in mind. At the very least we should document this on getActiveOperationsCount
. Maybe 1 (we know there is only one exclusive op) is a better return value? /cc @ywelsch
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I take the point, thanks.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sorry, I missed the ping here. I think that 1 is ok too, but then we should in general change the terminology in IndexShardOperationPermits
to move from the notion of blocking operations to running exclusive operations (similar to a read/write lock).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
agreed.
if (this.shardId.equals(shardId) == false) { | ||
throw new AssertionError("shard id differs from " + shardId); | ||
} | ||
return (executedOnPrimary.get() == null) ? primary : replica; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🙈
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I know, not really proud of this... I'll take a look at how to improve this test.
…cationAction This commit adds the acquirePrimaryAllOperationsPermits() and the acquireReplicaAllOperationsPermits() methods to the IndexShard class. These methods allow to acquire all operations permits on a primary or a replica shard and can be used in future transport replication actions to acquire all permits instead of a single one. Related to elastic elastic#33888
4865718
to
ef255a2
Compare
…Action (#35540) This pull request exposes two new methods in the IndexShard and TransportReplicationAction classes in order to allow transport replication actions to acquire all index shard operation permits for their execution. It first adds the acquireAllPrimaryOperationPermits() and the acquireAllReplicaOperationsPermits() methods to the IndexShard class which allow to acquire all operations permits on a shard while exposing a Releasable. It also refactors the TransportReplicationAction class to expose two protected methods (acquirePrimaryOperationPermit() and acquireReplicaOperationPermit()) that can be overridden when a transport replication action requires the acquisition of all permits on primary and/or replica shard during execution. Finally, it adds a TransportReplicationAllPermitsAcquisitionTests which illustrates how a transport replication action can grab all permits before adding a cluster block in the cluster state, making subsequent operations that requires a single permit to fail). Related to elastic #33888
…Action (#35540) This pull request exposes two new methods in the IndexShard and TransportReplicationAction classes in order to allow transport replication actions to acquire all index shard operation permits for their execution. It first adds the acquireAllPrimaryOperationPermits() and the acquireAllReplicaOperationsPermits() methods to the IndexShard class which allow to acquire all operations permits on a shard while exposing a Releasable. It also refactors the TransportReplicationAction class to expose two protected methods (acquirePrimaryOperationPermit() and acquireReplicaOperationPermit()) that can be overridden when a transport replication action requires the acquisition of all permits on primary and/or replica shard during execution. Finally, it adds a TransportReplicationAllPermitsAcquisitionTests which illustrates how a transport replication action can grab all permits before adding a cluster block in the cluster state, making subsequent operations that requires a single permit to fail). Related to elastic #33888
This pull request exposes two new methods in the
IndexShard
andTransportReplicationAction
classes in order to allow transport replication actions to acquire all index shard operation permits for their execution.It first adds the
acquirePrimaryAllOperationsPermits()
and theacquireReplicaAllOperationsPermits()
methods to theIndexShard
class which allow to acquire all operations permits on a shard while exposing aReleasable
. It also refactors theTransportReplicationAction
class to expose two protected methods (acquirePrimaryOperationPermit()
andacquireReplicaOperationPermit()
) that can be overridden when a transport replication action requires the acquisition of all permits on primary and/or replica shard during execution.Finally, it adds a
TransportReplicationAllPermitsAcquisitionTests
which illustrates how a transport replication action can grab all permits before adding a cluster block in the cluster state, making subsequent operations that requires a single permit to fail (such test has been discussed in #35332 (comment)).Related to elastic #33888