Support timeout based search request cancellation #817
Comments
I guess we should also have a mechanism to return the partial results we have collected before the timeout, unless |
@Bukhtawar: A cancellation mechanism helps terminate a workload as soon as it is no longer of use to the client, so that it stops consuming resources. If the client cancels the task externally through the task API, the search request also fails in the fetch phase irrespective of To provide more context, the allow_partial_search_results flag is honored only in the query phase, after results (docId) have come back from all or a subset of the shards. If the search request is cancelled before the fetch phase, even sending the fetch request from the coordinator to the shards fails with a task cancellation exception. |
I was looking into a similar task recently, and I support this requirement 100%. What do we need to get started on this? @sohami, are you already writing the code for this? |
@AmiStrn - yes. I was waiting on some initial feedback. Will raise the PR as soon as I am done |
Gladly:) |
@sohami I don't think size 0 aggregations have a fetch phase. Do partial results make sense here? I guess it should work out of the box, since we plan on simply cancelling |
@Bukhtawar - Yes. When there is no fetch phase and some of the shards completed execution successfully before the cancellation, partial results will be returned if |
@dblock how do we get maintainers assigned to review and approve PRs such as #986? (I commented on the PR but have not approved it, since I'm not a maintainer.) Edit: I didn't notice that there is quite some lag with the PRs :) Please let us know what to expect in terms of a timeline on this. |
Can't promise an SLA rn, but we do also have some automation that is nagging maintainers that PRs are open for longer than we would like without action. For now, if you feel that no traction has been had on an issue, feel free to tag me and I'll go find someone to review. |
While you're not a maintainer you did a solid review of that PR, so nobody (including myself) felt the need to jump in. Thank you. I will click buttons after the next iteration to get more of the tests to run, and take a closer look at the code as well if needed. |
@sohami @AmiStrn we have the same need: OpenSearch should return partial results after the timeout on the coordinator node. Ideally, when the timeout fires, the coordinator node returns partial results (if allow_partial_search_results is true) and sends a cancel request. If possible, I would like to try to implement it. |
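A rough sketch of the coordinator-side behaviour described in the comment above: wait for shard responses up to a deadline, keep whatever has arrived, and signal the rest to stop. This is a plain Python model, not OpenSearch code. The function name, the `cancel` event, and the response shape are all hypothetical; only the `allow_partial_search_results` idea comes from the thread.

```python
import threading
from concurrent.futures import ThreadPoolExecutor, wait


def search_with_coordinator_timeout(shard_fns, timeout_s, allow_partial=True):
    """Run every shard function, but stop waiting after timeout_s seconds.

    Each shard function receives a threading.Event and is expected to stop
    cooperatively once it is set (a stand-in for a cancel request).
    """
    cancel = threading.Event()
    pool = ThreadPoolExecutor(max_workers=max(1, len(shard_fns)))
    futures = [pool.submit(fn, cancel) for fn in shard_fns]

    # Block until all shards answer or the coordinator-level deadline passes.
    done, not_done = wait(futures, timeout=timeout_s)

    # Tell the shards that are still running to stop, then release the threads.
    cancel.set()
    pool.shutdown(wait=True)

    if not_done and not allow_partial:
        raise TimeoutError("search timed out and partial results are not allowed")

    # Keep only the responses that arrived before the deadline, in shard order.
    hits = [f.result() for f in futures if f in done]
    return {"timed_out": bool(not_done), "hits": hits}
```

With two fast shards and one that never answers, the call returns the two fast responses with `timed_out` set; with `allow_partial=False` the same call raises instead.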
@sohami, if you are not working on it, may I try to implement it? |
@Bukhtawar @AmiStrn @sohami, please have a look when you are free. #16681 |
Is your feature request related to a problem? Please describe.
Currently, the optional "timeout" parameter in the SearchRequest applies to the individual child shard-level search requests, not to the parent search request. Shard requests to a node are sent in multiple batches based on the "maxConcurrentRequestsPerNode" parameter. So if a search request results in N such batches, the effective timeout of the parent request is essentially N times the configured timeout. Also, the timeout is honored only in the query phase, not in the fetch phase. If a client no longer wants to wait for the result of a long-running search, it has to use the task API to cancel the request. When the user doesn't initiate the cancellation, the search keeps consuming cluster resources until it completes.
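One way to read the batching arithmetic above, as a simplified sketch: if a node's shard requests go out in sequential batches and each request can use its full timeout, the parent request can take up to the number of batches times that timeout. The function and its parameter names are hypothetical, and the strictly sequential batch model is an assumption made for illustration.

```python
import math


def worst_case_wall_time(shards_on_node, max_concurrent_per_node, shard_timeout_s):
    """Upper bound on how long one node's shard requests can take.

    Assumes batches run strictly one after another and that every batch
    uses its full per-shard timeout (a deliberately pessimistic model).
    """
    if max_concurrent_per_node < 1:
        raise ValueError("max_concurrent_per_node must be at least 1")
    batches = math.ceil(shards_on_node / max_concurrent_per_node)
    return batches * shard_timeout_s
```

For example, 20 shards on a node with 5 concurrent requests and a 2 second shard timeout gives 4 batches, so the parent request could run for up to 8 seconds even though the client asked for a 2 second timeout.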
Describe the solution you'd like
The proposal is to add a separate search request parameter, such as "cancel_after_timeinterval", which the user can set at both the request level and the cluster level. Once the interval set by this parameter expires, the search request is cancelled automatically through the cancellation framework. This will help to: 1) reduce wasted resource usage; 2) provide an automatic cancellation mechanism for search requests, so the client doesn't have to cancel them explicitly through the task API.
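A minimal sketch of how the proposed parameter could behave, assuming the request-level value overrides the cluster-level default and that a negative value disables automatic cancellation. Both assumptions, along with the `DISABLED` sentinel, the `SearchTask` class, and every function name here, are hypothetical and not taken from the OpenSearch code base.

```python
from dataclasses import dataclass
from typing import Optional

DISABLED = -1.0  # hypothetical sentinel: never cancel automatically


def effective_cancel_after(request_value: Optional[float],
                           cluster_default: float = DISABLED) -> float:
    """Pick the request-level interval if given, else the cluster default."""
    return cluster_default if request_value is None else request_value


@dataclass
class SearchTask:
    started_at: float      # seconds, from any monotonic clock
    cancel_after_s: float  # interval from effective_cancel_after()
    cancelled: bool = False

    def maybe_cancel(self, now: float) -> bool:
        """Mark the task cancelled once the interval has elapsed."""
        expired = now - self.started_at >= self.cancel_after_s
        if self.cancel_after_s >= 0 and expired:
            self.cancelled = True
        return self.cancelled
```

A periodic check (or a scheduled timer) calling `maybe_cancel` would stand in for the cancellation framework; passing the clock value in explicitly keeps the sketch easy to test.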