Cancel task (and descendants) if its originating transport request times out #66992

Open
DaveCTurner opened this issue Jan 5, 2021 · 5 comments
Labels
:Distributed Coordination/Task Management Issues for anything around the Tasks API - both persistent and node level. >enhancement Team:Distributed (Obsolete) Meta label for distributed team (obsolete). Replaced by Distributed Indexing/Coordination.

Comments

@DaveCTurner
Contributor

If a sender sets a timeout on a transport request and does not receive a response in time then today we make no attempt to inform the receiver that we no longer care about its response. This is particularly bad for stats requests that may be timing out on one broken node, but still continue to pile up there since that node has no way to know that these requests are now irrelevant and should not be processed.

A couple of possible solutions spring to mind:

  • When the sender times out it could send a task cancellation request to the receiver.

  • The sender could indicate the timeout to the receiver, which could then implement its own local timeout-and-cancel behaviour.
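The two options above can be compared with a small simulation. This is only a sketch, not Elasticsearch code: `Receiver`, `PendingRequest`, `enqueue`, `cancel` and `drain` are hypothetical names, and time is passed in explicitly as `now` to keep the example deterministic. Request 1 carries no timeout information (today's behaviour), request 2 is cancelled explicitly by the sender (first option), and request 3 carries the sender's timeout so the receiver can drop it locally (second option).

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class PendingRequest:
    request_id: int
    deadline: Optional[float] = None  # second option: derived from the sender's timeout
    cancelled: bool = False           # first option: set by an explicit cancel message


@dataclass
class Receiver:
    """Hypothetical receiving node with a backlog of requests (illustration only)."""
    queue: dict = field(default_factory=dict)
    processed: list = field(default_factory=list)

    def enqueue(self, request_id, now, timeout=None):
        # If the sender told us its timeout, remember when the response stops mattering.
        deadline = None if timeout is None else now + timeout
        self.queue[request_id] = PendingRequest(request_id, deadline)

    def cancel(self, request_id):
        # First option: the sender timed out and sent a cancellation request.
        request = self.queue.get(request_id)
        if request is not None:
            request.cancelled = True

    def drain(self, now):
        # Process only the requests somebody may still be waiting for.
        for request in list(self.queue.values()):
            expired = request.deadline is not None and now >= request.deadline
            if not (request.cancelled or expired):
                self.processed.append(request.request_id)
        self.queue.clear()


receiver = Receiver()
receiver.enqueue(1, now=0.0)               # no timeout information at all
receiver.enqueue(2, now=0.0)               # sender will cancel explicitly
receiver.enqueue(3, now=0.0, timeout=5.0)  # sender declared a 5s timeout
receiver.cancel(2)                         # first option in action
receiver.drain(now=10.0)                   # the slow node finally reaches its backlog
print(receiver.processed)                  # [1]
```

Only request 1 is still processed, which is the wasted work this issue describes. The first option costs an extra message per timed-out request, while the second needs no extra message but relies on the receiver tracking deadlines itself.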

Relates #60188, #52616, #51992.

@DaveCTurner DaveCTurner added >enhancement :Distributed Coordination/Task Management Issues for anything around the Tasks API - both persistent and node level. labels Jan 5, 2021
@elasticmachine elasticmachine added the Team:Distributed (Obsolete) Meta label for distributed team (obsolete). Replaced by Distributed Indexing/Coordination. label Jan 5, 2021
@elasticmachine
Collaborator

Pinging @elastic/es-distributed (Team:Distributed)

@Bukhtawar
Contributor

Thanks @DaveCTurner. Is this already a prioritised item? If not, I'd be interested in working on it.

@DaveCTurner
Contributor Author

We're still contemplating how/whether to proceed on this. We may take no action since this only really applies to internally-generated requests which are mostly fairly well-behaved already. @Bukhtawar perhaps you'd be interested in resolving #55550 instead? Supporting cancellation on stats APIs would address the concerns you raised in #52616 for instance.

@Bukhtawar
Contributor

Sure @DaveCTurner, I'll take up support for cancellation on the stats APIs. I'll get back to you on this.

kingherc added a commit to kingherc/elasticsearch that referenced this issue Dec 28, 2022
To make this possible we modify the CancellableTasksTracker
to track children tasks by the Request ID as well. That
way, we can send an Action to cancel a child based on the
parent task and the Request ID.

This is especially useful when a parent's child requests
time out.

Fixes elastic#90353
Relates elastic#66992
kingherc added a commit to kingherc/elasticsearch that referenced this issue Dec 28, 2022
To make this possible we modify the CancellableTasksTracker
to track children tasks by the Request ID as well. That
way, we can send an Action to cancel a child based on the
parent task and the Request ID.

This is especially useful when a parent's child requests
time out on the parent's side.

Fixes elastic#90353
Relates elastic#66992
kingherc added a commit that referenced this issue Apr 3, 2023
To make this possible we modify the CancellableTasksTracker to track children tasks by the Request ID as well. That way, we can send an Action to cancel a child based on the parent task and the Request ID.

This is especially useful when a parent's child requests time out on the parent's side.

Fixes #90353
Relates #66992
@kingherc
Contributor

kingherc commented Apr 3, 2023

#92588 is merged, which allows parents to send a cancel action to remote children when the parent's request fails (most commonly because of a timeout).

I think this ticket is a bit larger than that, since it is written in a way that is not constrained only to tasks with a parent. But maybe the PR about children tasks covers a large portion of this ticket.
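To illustrate the idea of tracking child tasks by request ID as well as by parent, here is a rough sketch. It is not the real `CancellableTasksTracker` API: `ChildTaskTracker`, its methods, the task IDs and the dict-based tasks are all hypothetical, chosen only to show why the extra key lets a parent cancel exactly the child that belongs to one timed-out request.

```python
from collections import defaultdict


class ChildTaskTracker:
    """Hypothetical tracker indexing child tasks by (parent task ID, request ID).

    Illustration only; this does not mirror the real CancellableTasksTracker.
    """

    def __init__(self):
        self._children = defaultdict(dict)  # parent_id -> {request_id: task}

    def register(self, parent_id, request_id, task):
        self._children[parent_id][request_id] = task

    def unregister(self, parent_id, request_id):
        children = self._children.get(parent_id)
        if children is not None:
            children.pop(request_id, None)
            if not children:
                del self._children[parent_id]  # avoid leaking empty entries

    def cancel_child(self, parent_id, request_id):
        """Cancel the single child started by one request; return True if found."""
        task = self._children.get(parent_id, {}).get(request_id)
        if task is None:
            return False
        task["cancelled"] = True
        return True


tracker = ChildTaskTracker()
stats_task = {"name": "stats", "cancelled": False}
other_task = {"name": "other", "cancelled": False}
tracker.register("node-a:1", 17, stats_task)  # made-up parent task ID and request IDs
tracker.register("node-a:1", 18, other_task)

# The parent's request 17 timed out, so only that child is cancelled.
tracker.cancel_child("node-a:1", 17)
print(stats_task["cancelled"], other_task["cancelled"])  # True False
```

Without the request ID in the key, the parent could only ask for all of its children on that node to be cancelled, including ones whose requests are still healthy.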

DaveCTurner added a commit to DaveCTurner/elasticsearch that referenced this issue Oct 21, 2023
Replaces the transport-level timeout with an overall timeout on the
whole repository analysis task to ensure that all child tasks terminate
promptly.

Relates elastic#66992
Closes elastic#101182
elasticsearchmachine pushed a commit that referenced this issue Oct 23, 2023
Replaces the transport-level timeout with an overall timeout on the
whole repository analysis task to ensure that all child tasks terminate
promptly.

Relates #66992
Closes #101182
DaveCTurner added a commit to DaveCTurner/elasticsearch that referenced this issue Oct 23, 2023
Replaces the transport-level timeout with an overall timeout on the
whole repository analysis task to ensure that all child tasks terminate
promptly.

Relates elastic#66992
Closes elastic#101182
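The approach in these commits, one overall timeout on the parent task instead of a timeout on each transport request, can be sketched with `asyncio`. This is an analogy under stated assumptions rather than the repository analysis code: `child` and `run_analysis` are hypothetical names, and cancellation of the awaited `asyncio.gather` stands in for the parent task cancelling its children.

```python
import asyncio


async def child(name, duration, cancelled):
    """Hypothetical child task that records its name if it is cancelled."""
    try:
        await asyncio.sleep(duration)
        return name
    except asyncio.CancelledError:
        cancelled.append(name)  # this child was told to stop
        raise


async def run_analysis(durations, overall_timeout):
    """Run all children under one deadline; return (results, cancelled names)."""
    cancelled = []
    children = [child(f"child-{i}", d, cancelled) for i, d in enumerate(durations)]
    try:
        # A single timeout wraps the whole group. When it fires, the gather is
        # cancelled, which in turn cancels every child that is still running.
        results = await asyncio.wait_for(asyncio.gather(*children), overall_timeout)
    except asyncio.TimeoutError:
        results = None
    return results, cancelled


results, cancelled = asyncio.run(run_analysis([10, 10, 10], overall_timeout=0.05))
print(results, sorted(cancelled))
```

Because the deadline belongs to the parent, no child can outlive it unnoticed: every child that is still running when the deadline passes appears in `cancelled`, and `results` is `None`.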