Child requests proactively cancel children tasks #92588

kingherc · 2022-12-28T13:55:12Z

To make this possible, we modify the CancellableTasksTracker to also track child tasks by request ID. That way, we can send an action to cancel a child based on the parent task and the request ID.
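A minimal sketch of the idea, for readers who have not looked at the diff: index each child task by the pair (parent task, request ID) so that a single child can be cancelled without touching its siblings. The names here (`ChildTaskIndexSketch`, `ChildKey`, `SketchTask`, `cancelChild`) are hypothetical and are not the actual CancellableTasksTracker API.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical illustration only; the real CancellableTasksTracker differs.
final class ChildTaskIndexSketch {

    /** Key: the parent task's ID plus the ID of the request that spawned the child. */
    record ChildKey(String parentTaskId, long requestId) {}

    /** Minimal stand-in for a cancellable task. */
    static final class SketchTask {
        private volatile String cancelReason; // null while the task is still running

        void cancel(String reason) {
            cancelReason = reason;
        }

        boolean isCancelled() {
            return cancelReason != null;
        }
    }

    private final Map<ChildKey, SketchTask> children = new ConcurrentHashMap<>();

    /** Records a child task under the request that created it. */
    void register(String parentTaskId, long requestId, SketchTask task) {
        children.put(new ChildKey(parentTaskId, requestId), task);
    }

    /** Forgets a child task, for example once it has completed. */
    void unregister(String parentTaskId, long requestId) {
        children.remove(new ChildKey(parentTaskId, requestId));
    }

    /** Cancels only the child spawned by the given request; returns whether one was found. */
    boolean cancelChild(String parentTaskId, long requestId, String reason) {
        SketchTask task = children.get(new ChildKey(parentTaskId, requestId));
        if (task == null) {
            return false;
        }
        task.cancel(reason);
        return true;
    }
}
```

Keying on both values means two requests from the same parent map to different entries, so cancelling one leaves the other running.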

This is especially useful when a parent's child requests time out on the parent's side.
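To illustrate the timeout case, here is a rough sketch using plain `CompletableFuture` rather than the Elasticsearch transport layer. The class `CancelChildOnTimeoutSketch`, the method `withTimeout`, and the `cancelChild` callback are all assumptions made for this example; in the PR the cancellation is sent as a transport action instead.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CompletionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.function.LongConsumer;

// Hypothetical illustration only; not the transport-layer code from the PR.
final class CancelChildOnTimeoutSketch {

    /**
     * Applies a parent-side timeout to a child response. If the timeout fires,
     * the callback is invoked with the request ID so the child can be cancelled.
     */
    static <T> CompletableFuture<T> withTimeout(
        CompletableFuture<T> childResponse,
        long requestId,
        long timeoutMillis,
        LongConsumer cancelChild
    ) {
        return childResponse.orTimeout(timeoutMillis, TimeUnit.MILLISECONDS).whenComplete((response, error) -> {
            // Unwrap defensively in case the failure arrives wrapped.
            Throwable cause = (error instanceof CompletionException && error.getCause() != null) ? error.getCause() : error;
            if (cause instanceof TimeoutException) {
                // The parent gave up waiting: proactively cancel the child for this request.
                cancelChild.accept(requestId);
            }
        });
    }
}
```

Without the callback, the parent would stop waiting but the child would keep running until it finished on its own.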

The motivation for this PR is fixing test failure #90353. While discussing the simple solution in PR #92520, @DaveCTurner and I decided that the best way to fix the test failure would be to solve #66992. Unfortunately, that issue may require substantial effort. For the moment, we thought it would be easier to cancel child requests on timeout, since we already have infrastructure for tracking child tasks (the CancellableTasksTracker).

Fixes #90353
Relates #66992

elasticsearchmachine · 2022-12-28T13:55:37Z

Hi @kingherc, I've created a changelog YAML for you.

elasticsearchmachine · 2022-12-29T10:22:45Z

Pinging @elastic/es-distributed (Team:Distributed)

kingherc · 2022-12-29T10:22:51Z

Hi @DaveCTurner, @original-brownbear, this is ready for review now.

...st-kit/src/main/java/org/elasticsearch/repositories/blobstore/testkit/BlobAnalyzeAction.java

kingherc · 2023-01-09T14:39:45Z

Requesting @tlrx's review since he may have more time to review this these days.

kingherc · 2023-01-16T14:18:08Z

Ping for reviews

fcofdez

This is heading in the right direction; I left a few comments.

server/src/main/java/org/elasticsearch/transport/TransportRequest.java

...st-kit/src/main/java/org/elasticsearch/repositories/blobstore/testkit/BlobAnalyzeAction.java

fcofdez · 2023-01-27T09:01:37Z

...alClusterTest/java/org/elasticsearch/action/admin/cluster/node/tasks/CancellableTasksIT.java

+            for (DiscoveryNode node : nodes) {
+                TransportService transportService = internalCluster().getInstance(TransportService.class, node.getName());
+                for (ThreadPoolStats.Stats stat : transportService.getThreadPool().stats()) {
+                    assertEquals(0, stat.getActive());


This is a tricky assertion: something might get enqueued just after it runs. But I think ensureBansAndCancellationsConsistency covers most of what we want to assert here.

...alClusterTest/java/org/elasticsearch/action/admin/cluster/node/tasks/CancellableTasksIT.java

server/src/main/java/org/elasticsearch/tasks/CancellableTasksTracker.java

server/src/main/java/org/elasticsearch/tasks/TaskCancellationService.java

kingherc

Hi, thanks for the comments, @fcofdez! I have addressed them in the latest changes and invite you to re-review. @tlrx, feel free to also take a look if you have time. Thanks!

server/src/main/java/org/elasticsearch/tasks/CancellableTasksTracker.java

...st-kit/src/main/java/org/elasticsearch/repositories/blobstore/testkit/BlobAnalyzeAction.java

kingherc · 2023-03-10T14:53:57Z

This is necessary because otherwise the test infrastructure may get stuck. If I remember correctly, that is probably because the checks at the end of the test sometimes see work still running (nodes received the cancellation but had not yet processed it). That is why I wait for the thread pools to be empty. I do not think anything else can get enqueued afterwards, but if anything does, I would expect it to finish by the end of the test procedure.
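For context, waiting for thread pools to become empty can be sketched as a retry loop over plain `java.util.concurrent` executors, as below. This is an illustration under stated assumptions, not the test code: `AwaitIdlePoolsSketch` and `awaitIdle` are hypothetical names, and the queue check is an addition of this sketch (the quoted assertion only looks at the active count). Polling retries until the pools look idle; it does not close the window in which something is enqueued right after the last check.

```java
import java.util.List;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

// Hypothetical illustration only; the integration test uses Elasticsearch's ThreadPool stats.
final class AwaitIdlePoolsSketch {

    /**
     * Polls until every pool has no running task and an empty queue,
     * or until the timeout elapses. Returns true if all pools became idle.
     */
    static boolean awaitIdle(List<ThreadPoolExecutor> pools, long timeoutMillis) {
        long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMillis);
        while (true) {
            boolean idle = pools.stream().allMatch(p -> p.getActiveCount() == 0 && p.getQueue().isEmpty());
            if (idle) {
                return true;
            }
            if (System.nanoTime() - deadline >= 0) {
                return false;
            }
            try {
                Thread.sleep(10); // back off briefly before checking again
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve the interrupt for the caller
                return false;
            }
        }
    }
}
```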

fcofdez

LGTM, thanks for the clarifications 👍

kingherc requested review from DaveCTurner and original-brownbear December 28, 2022 13:55

kingherc self-assigned this Dec 28, 2022

This was referenced Dec 28, 2022

Repo test kit to use more than 1 snapshot thread #92520

Closed

[CI] SnapshotRepoTestKitClientYamlTestSuiteIT test {p0=/10_analyze/Timeout with large blobs} failing #90353

Closed

kingherc changed the title ~~Failed tasks proactively cancel children tasks~~ Children requests proactively cancel children tasks Dec 28, 2022

kingherc changed the title ~~Children requests proactively cancel children tasks~~ Child requests proactively cancel children tasks Dec 28, 2022

kingherc force-pushed the enhancement/90353-66992-cancel-child-on-timeout branch from 075c129 to 5b04545 Compare December 28, 2022 15:04

kingherc added 6 commits December 28, 2022 17:48

Fix how blob analyze action is cancelled

d513466

Fix style

1b9e50f

Introduce version supported for new Action

b18622f

Add TaskCancellation to mock transport service

517790b

Revert "Add TaskCancellation to mock transport service"

8d48e74

This reverts commit 517790b.

Add TaskCancellation to TransportActionProxyTests

791c759

kingherc marked this pull request as ready for review December 29, 2022 10:22

kingherc commented Dec 30, 2022

...st-kit/src/main/java/org/elasticsearch/repositories/blobstore/testkit/BlobAnalyzeAction.java

kingherc requested a review from tlrx January 9, 2023 14:39

fcofdez self-requested a review January 24, 2023 16:11

fcofdez reviewed Jan 27, 2023

rjernst added v8.8.0 and removed v8.7.0 labels Feb 8, 2023

kingherc added 3 commits March 7, 2023 19:23

Merge remote-tracking branch 'main' into enhancement/90353-66992-canc…

1b63809

…el-child-on-timeout

Fix transport version

45f5f5e

Fix PR comments

c9f685d

kingherc commented Mar 10, 2023

kingherc requested a review from fcofdez March 10, 2023 14:56

fcofdez approved these changes Mar 27, 2023

Merge remote-tracking branch 'main' into enhancement/90353-66992-canc…

522a410

…el-child-on-timeout

kingherc merged commit 400b7ec into elastic:main Apr 3, 2023

kingherc deleted the enhancement/90353-66992-cancel-child-on-timeout branch April 3, 2023 12:54

kingherc mentioned this pull request Apr 3, 2023

Cancel task (and descendants) if its originating transport request times out #66992

Open

romseygeek mentioned this pull request Apr 3, 2023

[CI] CancellableTasksIT testChildrenTasksCancelledOnTimeout failing #94989

Closed

kingherc mentioned this pull request Jun 7, 2024

Fix task cancellation on remote cluster when original request fails #109440

Merged
