-
Notifications
You must be signed in to change notification settings - Fork 24.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Child requests proactively cancel children tasks #92588
Child requests proactively cancel children tasks #92588
Conversation
Hi @kingherc, I've created a changelog YAML for you. |
To make this possible we modify the CancellableTasksTracker to track children tasks by the Request ID as well. That way, we can send an Action to cancel a child based on the parent task and the Request ID. This is especially useful when parents' children requests timeout on the parents' side. Fixes elastic#90353 Relates elastic#66992
075c129
to
5b04545
Compare
This reverts commit 517790b.
Pinging @elastic/es-distributed (Team:Distributed) |
Hi @DaveCTurner , @original-brownbear this is ready for review now. |
...st-kit/src/main/java/org/elasticsearch/repositories/blobstore/testkit/BlobAnalyzeAction.java
Show resolved
Hide resolved
Requesting @tlrx 's review since he may have more time to review this these days. |
Ping for reviews |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks in the good direction, I left a few comments.
server/src/main/java/org/elasticsearch/transport/TransportRequest.java
Outdated
Show resolved
Hide resolved
...st-kit/src/main/java/org/elasticsearch/repositories/blobstore/testkit/BlobAnalyzeAction.java
Show resolved
Hide resolved
for (DiscoveryNode node : nodes) { | ||
TransportService transportService = internalCluster().getInstance(TransportService.class, node.getName()); | ||
for (ThreadPoolStats.Stats stat : transportService.getThreadPool().stats()) { | ||
assertEquals(0, stat.getActive()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a tricky assertion, something might get enqueued just after the assertion. But I think ensureBansAndCancellationsConsistency
covers most of what we want to assert here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is necessary because it seems otherwise the test infrastructure may get stuck. If I remember correctly, it's probably because the test at the end then thinks that sometimes things are still running (servers received cancellation but did not make it to process it). That is why I wait for the thread pools to be empty. I do not think anything else can get enqueued afterwards, but if anything does I would expect it to be done by the end of the test procedure.
...alClusterTest/java/org/elasticsearch/action/admin/cluster/node/tasks/CancellableTasksIT.java
Outdated
Show resolved
Hide resolved
...alClusterTest/java/org/elasticsearch/action/admin/cluster/node/tasks/CancellableTasksIT.java
Outdated
Show resolved
Hide resolved
server/src/main/java/org/elasticsearch/tasks/CancellableTasksTracker.java
Show resolved
Hide resolved
server/src/main/java/org/elasticsearch/tasks/TaskCancellationService.java
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
server/src/main/java/org/elasticsearch/tasks/CancellableTasksTracker.java
Show resolved
Hide resolved
...st-kit/src/main/java/org/elasticsearch/repositories/blobstore/testkit/BlobAnalyzeAction.java
Show resolved
Hide resolved
for (DiscoveryNode node : nodes) { | ||
TransportService transportService = internalCluster().getInstance(TransportService.class, node.getName()); | ||
for (ThreadPoolStats.Stats stat : transportService.getThreadPool().stats()) { | ||
assertEquals(0, stat.getActive()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is necessary because it seems otherwise the test infrastructure may get stuck. If I remember correctly, it's probably because the test at the end then thinks that sometimes things are still running (servers received cancellation but did not make it to process it). That is why I wait for the thread pools to be empty. I do not think anything else can get enqueued afterwards, but if anything does I would expect it to be done by the end of the test procedure.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, thanks for the clarifications 👍
…el-child-on-timeout
To make this possible we modify the CancellableTasksTracker to track children tasks by the Request ID as well. That way, we can send an Action to cancel a child based on the parent task and the Request ID.
This is especially useful when parents' children requests timeout on the parents' side.
The motivation behind this PR lies behind fixing test failure #90353. In discussing the simple solution of PR #92520, we decided with @DaveCTurner that the best approach to solving the test failure would be to solve #66992. Unfortunately that issue may require substantial effort. But for the moment, we thought it would be easier to cancel children requests on timeout, since we already have infrastructure for tracking children tasks (through the
CancellableTasksTracker
).Fixes #90353
Relates #66992