Make PrimaryReplicaResyncer Fork to Generic Pool #69949

original-brownbear · 2021-03-04T07:13:53Z

Reading ops from the translog snapshot must not run on the transport thread.
When sending more than one batch of ops, the listener (and thus `run`) would be
invoked on the transport thread for all but the first batch.
This change forks to the generic pool, as we already do when sending ops during recovery.
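As a rough illustration of the pattern described above (not the actual `PrimaryReplicaSyncer` code), the sketch below uses plain `java.util.concurrent` executors. The class `ForkingResyncSketch`, its `run` method, and the thread names `transport` and `generic` are all hypothetical stand-ins: the "response" for each batch arrives on the transport executor, and the handler hops to the generic executor before the next batch is read.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch; none of these names are Elasticsearch APIs.
class ForkingResyncSketch {
    // Single named threads stand in for the transport thread and the generic pool.
    private final ExecutorService transport = Executors.newSingleThreadExecutor(r -> new Thread(r, "transport"));
    private final ExecutorService generic = Executors.newSingleThreadExecutor(r -> new Thread(r, "generic"));
    // Records which thread read each batch.
    private final List<String> readerThreads = Collections.synchronizedList(new ArrayList<>());
    private final CountDownLatch done = new CountDownLatch(1);
    private int remainingBatches;

    // Reads one batch (the work that must stay off the transport thread) and "sends" it.
    private void run() {
        readerThreads.add(Thread.currentThread().getName());
        if (--remainingBatches == 0) {
            done.countDown();
            return;
        }
        // The simulated response is handled on the transport thread, so the handler
        // forks to the generic pool instead of calling run() directly.
        transport.execute(() -> generic.execute(this::run));
    }

    // Runs the loop for the given number of batches (at least 1) and returns the reader thread names.
    static List<String> demo(int batches) {
        ForkingResyncSketch sketch = new ForkingResyncSketch();
        sketch.remainingBatches = batches;
        try {
            sketch.generic.execute(sketch::run);
            if (sketch.done.await(5, TimeUnit.SECONDS) == false) {
                throw new IllegalStateException("timed out waiting for batches");
            }
            return new ArrayList<>(sketch.readerThreads);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } finally {
            sketch.transport.shutdown();
            sketch.generic.shutdown();
        }
    }
}
```

In this sketch, `ForkingResyncSketch.demo(3)` returns three entries, all `generic`. Replacing the fork with a direct `run()` call inside the transport task would record `transport` for every batch after the first, which is the situation the description refers to.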

elasticmachine · 2021-03-04T07:13:56Z

Pinging @elastic/es-distributed (Team:Distributed)

DaveCTurner

Thanks Armin, one question about cleanup

DaveCTurner · 2021-03-04T08:39:02Z

...ework/src/main/java/org/elasticsearch/index/replication/ESIndexLevelReplicationTestCase.java

@@ -159,7 +160,8 @@ protected DiscoveryNode getDiscoveryNode(String id) {
        private volatile ReplicationTargets replicationTargets;

        private final PrimaryReplicaSyncer primaryReplicaSyncer = new PrimaryReplicaSyncer(
-            new TaskManager(Settings.EMPTY, threadPool, Collections.emptySet()),
+            new MockTransport().createTransportService(Settings.EMPTY, threadPool,


Do we need to shut this (at least the threadpool) down cleanly?

It's just using the fake transport, so I don't think the service needs any shutting down here. The threadPool is torn down via the parent IndexShardTestCase, so I think we're good?

Oh right we already had a threadpool we're just using the transport service as a wrapper. Nvm.

DaveCTurner

LGTM

DaveCTurner · 2021-03-04T08:50:08Z

Oh right we already had a threadpool we're just using the transport service as a wrapper. Nvm.

original-brownbear · 2021-03-04T08:51:11Z

Thanks David!

We assert that the snapshot isn't closed on a transport thread, but we close it without forking off the transport thread in case of a failure. With this commit we fork on failure too. Relates elastic#69949 Closes elastic#70407
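One way to picture "fork on failure too" is a listener wrapper that dispatches both callbacks to an executor, so cleanup never runs on the calling (transport) thread. This is a minimal sketch under stated assumptions: the `Listener` interface, the `forking` helper, and the thread names are hypothetical and are not the Elasticsearch classes touched by these changes.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.Executor;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical sketch; the listener type and helper are illustrative only.
class ForkOnFailureSketch {
    interface Listener<T> {
        void onResponse(T response);
        void onFailure(Exception e);
    }

    // Wraps a listener so that success AND failure are both handled on the given executor.
    static <T> Listener<T> forking(Executor executor, Listener<T> delegate) {
        return new Listener<T>() {
            @Override
            public void onResponse(T response) {
                executor.execute(() -> delegate.onResponse(response));
            }

            @Override
            public void onFailure(Exception e) {
                executor.execute(() -> delegate.onFailure(e));
            }
        };
    }

    // Fails the listener from the "transport" thread and returns the thread that did the cleanup.
    static String demo() {
        ExecutorService transport = Executors.newSingleThreadExecutor(r -> new Thread(r, "transport"));
        ExecutorService generic = Executors.newSingleThreadExecutor(r -> new Thread(r, "generic"));
        AtomicReference<String> closedOn = new AtomicReference<>();
        CountDownLatch closed = new CountDownLatch(1);
        // Stand-in for closing the snapshot: records the thread it ran on.
        Listener<Void> closeSnapshot = new Listener<Void>() {
            @Override
            public void onResponse(Void response) {
                close();
            }

            @Override
            public void onFailure(Exception e) {
                close();
            }

            private void close() {
                closedOn.set(Thread.currentThread().getName());
                closed.countDown();
            }
        };
        Listener<Void> listener = forking(generic, closeSnapshot);
        try {
            transport.execute(() -> listener.onFailure(new RuntimeException("simulated resync failure")));
            if (closed.await(5, TimeUnit.SECONDS) == false) {
                throw new IllegalStateException("timed out waiting for cleanup");
            }
            return closedOn.get();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } finally {
            transport.shutdown();
            generic.shutdown();
        }
    }
}
```

Here `ForkOnFailureSketch.demo()` returns `generic`: the failure is raised on the transport thread, but the cleanup runs on the other executor.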

There is a race here: the closed check can pass and then, concurrently with a shard close, we also try to fail the shard. Previously this was covered by the catch below the changed code, which simply ignored the already-closed exception. With elastic#69949 this logic is forked to the generic pool, so the exception has to be handled in the callback as well.
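The race can be sketched in isolation as follows. Everything here is hypothetical (`Shard`, `ShardClosedException`, and the single-thread executor standing in for the generic pool), and the shard is closed deliberately between the check and the forked callback to make the interleaving deterministic. The point is only that a `catch` around the code that submits the task no longer sees the exception, so the forked callback has to catch it itself.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical sketch; these types are illustrative, not Elasticsearch classes.
class ClosedShardRaceSketch {
    static class ShardClosedException extends RuntimeException {
        ShardClosedException() {
            super("shard is closed");
        }
    }

    static class Shard {
        private final AtomicBoolean closed = new AtomicBoolean();

        boolean isClosed() {
            return closed.get();
        }

        void close() {
            closed.set(true);
        }

        // Failing an already-closed shard throws.
        void fail(String reason) {
            if (closed.get()) {
                throw new ShardClosedException();
            }
        }
    }

    // Returns "ignored" when the forked callback swallowed the closed-shard exception.
    static String demo() {
        Shard shard = new Shard();
        ExecutorService generic = Executors.newSingleThreadExecutor();
        AtomicReference<String> outcome = new AtomicReference<>("not run");
        try {
            if (shard.isClosed() == false) { // the closed check passes...
                shard.close();               // ...and the shard closes before the callback runs
                generic.execute(() -> {
                    try {
                        shard.fail("resync failed");
                        outcome.set("failed shard");
                    } catch (ShardClosedException e) {
                        // Handled inside the callback: an outer catch would not see it.
                        outcome.set("ignored");
                    }
                });
            }
        } finally {
            generic.shutdown(); // already-submitted tasks still run
        }
        try {
            if (generic.awaitTermination(5, TimeUnit.SECONDS) == false) {
                throw new IllegalStateException("timed out waiting for callback");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return outcome.get();
    }
}
```

In this sketch `ClosedShardRaceSketch.demo()` returns `ignored`.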

original-brownbear added the >bug, :Distributed Indexing/CRUD, v8.0.0, v7.12.0, and v7.13.0 labels Mar 4, 2021

elasticmachine added the Team:Distributed (Obsolete) label Mar 4, 2021

original-brownbear requested a review from DaveCTurner March 4, 2021 08:04

DaveCTurner reviewed Mar 4, 2021

original-brownbear requested a review from DaveCTurner March 4, 2021 08:43

DaveCTurner approved these changes Mar 4, 2021

original-brownbear merged commit e622b2c into elastic:master Mar 4, 2021

original-brownbear deleted the repro-resyc-troubles branch March 4, 2021 08:51

original-brownbear mentioned this pull request Mar 4, 2021

Make PrimaryReplicaResyncer Fork to Generic Pool (#69949) #69952

Merged

original-brownbear mentioned this pull request Mar 4, 2021

Make PrimaryReplicaResyncer Fork to Generic Pool (#69949) #69953

Merged

DaveCTurner mentioned this pull request Mar 17, 2021

Fork listener#onFailure in PrimaryReplicaSyncer #70506

Merged

original-brownbear mentioned this pull request Mar 30, 2021

Fix Tripped Assertion on Resync Failure during Node Shutdown #71062

Merged

original-brownbear mentioned this pull request Mar 31, 2021

Fix Tripped Assertion on Resync during Node Shutdown (#71062) #71100

Merged

original-brownbear mentioned this pull request Mar 31, 2021

Fix Tripped Assertion on Resync during Node Shutdown (#71062) #71101

Merged

jakelandis added v8.0.0-alpha1 and removed v8.0.0 labels Jul 26, 2021

original-brownbear restored the repro-resyc-troubles branch April 18, 2023 21:00
