Feature Request: add search request queue flush API #67129
Pinging @elastic/es-search (Team:Search)
@etki Thank you for submitting a feature request. I can see several problems with flushing the search queue on a particular node. It would cause a flurry of errors, since coordinating nodes are waiting for responses from these nodes. Also, even if we moved requests for system indices to a dedicated thread pool, some requests from other system applications (Kibana, monitoring, etc.) may still go through the search thread pool, and we don't want to kill those.
We've discussed this within the team, and would like to know more about your setup.
Sorry for the late reply. We're on 6.7 for now, so the behavior I've seen may be well out of date.
I see; I didn't realize it could cause that much disruption beyond the intended functionality. The following question is just for my own curiosity: am I right that there is no way to cancel a search from the coordinating node? I've heard that a search task, once it is actually executing, is uninterruptible and has to run to completion.
By that I meant the client application. IIRC, in that very case it was a retry storm from the client application generating excessive load, while the application receiving those requests kept passing them down to ES successfully (until the queue filled up and rejections started). I don't remember whether we preventively restarted the application talking to ES at the time, so its HTTP channels may still have been open.
Yes, but in practice developers usually face worst-case scenarios: heavy queries (e.g. with lots of nested documents) or simply abnormal, unexpected load.
Do I get it right that
is functionally the same as what I've proposed? To be clear, I'm not familiar with the cancel API and previously thought it related only to maintenance tasks.
@etki Please find the answers to your questions below.
No, that's not correct. One thing to note, though: a request is first put into the queue of the search thread pool. Only when it is its turn to be dequeued and processed do we create a task for it or check for cancellation.
Yes, the cancel API is designed to cancel tasks, including search tasks.
We have made some changes, including automatic task cancellation when the connection gets closed (from v7.4). As there are ways to cancel search tasks, I am closing this issue.
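To make the ordering described above concrete, here is a toy Python model of a bounded search queue. It is not Elasticsearch code: `ToySearchQueue`, `SearchRequest` and the `cancelled` flag are hypothetical names, and the flag is an assumed stand-in for whatever cancellation signal exists. It shows two things from this thread: a full queue rejects new requests outright, and a request's cancellation is only noticed once it is dequeued, so cancelled work still occupies a queue slot until its turn comes.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class SearchRequest:
    request_id: int
    cancelled: bool = False  # hypothetical stand-in for a cancellation signal


class ToySearchQueue:
    """Simplified model of a bounded search thread-pool queue (NOT Elasticsearch code)."""

    def __init__(self, capacity: int):
        self.capacity = capacity
        self.queue = deque()
        self.rejected = []           # ids rejected because the queue was full
        self.skipped_cancelled = []  # ids dropped when dequeued because they were cancelled
        self.tasks_created = []      # ids that actually got a "task" and ran

    def submit(self, req: SearchRequest) -> bool:
        # A full queue rejects the request immediately.
        if len(self.queue) >= self.capacity:
            self.rejected.append(req.request_id)
            return False
        self.queue.append(req)
        return True

    def process_one(self) -> None:
        # Cancellation is only checked here, at dequeue time, before a task is created.
        req = self.queue.popleft()
        if req.cancelled:
            self.skipped_cancelled.append(req.request_id)
            return
        self.tasks_created.append(req.request_id)

    def drain(self) -> None:
        while self.queue:
            self.process_one()


if __name__ == "__main__":
    q = ToySearchQueue(capacity=3)
    reqs = [SearchRequest(i) for i in range(5)]
    for r in reqs:
        q.submit(r)            # requests 3 and 4 are rejected
    reqs[1].cancelled = True   # cancelled while still waiting in the queue
    q.drain()
    print(q.rejected)           # [3, 4]
    print(q.skipped_cancelled)  # [1]
    print(q.tasks_created)      # [0, 2]
```

In this model, request 1 is cancelled but still waits for its turn in the queue before being dropped, which is the gap the original feature request was concerned about.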
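As a sketch of how a client might call the task management cancel endpoint (`POST _tasks/_cancel` with the `actions` and optional `nodes` parameters), here is a small Python helper using only the standard library. The helper names, the `*search` action pattern and the node IDs are illustrative assumptions; check the task management API documentation for your version. Per the answer above, requests still waiting in the search thread-pool queue have no task yet, so a call like this can only reach searches that have already started executing.

```python
from typing import List, Optional
from urllib.parse import urlencode
from urllib.request import Request, urlopen


def build_cancel_search_request(base_url: str, nodes: Optional[List[str]] = None) -> Request:
    """Build (but do not send) a POST to the task management cancel endpoint.

    Hypothetical helper: targets tasks whose action matches "*search",
    optionally restricted to the given node IDs.
    """
    params = {"actions": "*search"}
    if nodes:
        params["nodes"] = ",".join(nodes)
    # safe="*," keeps the wildcard and the comma-separated node list unescaped.
    query = urlencode(params, safe="*,")
    return Request(f"{base_url.rstrip('/')}/_tasks/_cancel?{query}", method="POST")


def cancel_search_tasks(base_url: str, nodes: Optional[List[str]] = None) -> bytes:
    # Sends the request; needs a reachable cluster (and credentials if security is enabled).
    with urlopen(build_cancel_search_request(base_url, nodes)) as resp:
        return resp.read()
```

For example, `build_cancel_search_request("http://localhost:9200", ["node1", "node2"]).full_url` yields `http://localhost:9200/_tasks/_cancel?actions=*search&nodes=node1,node2`. Building the request separately from sending it keeps the URL logic testable without a running cluster.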
Sometimes we have the following situation in production: something breaks, we receive a retry storm, and ES receives a ton of requests, starts queueing them, then starts rejecting them (I think that's a common situation). Even if the load is completely cut off, ES takes some time to process all those queued queries, which are usually completely irrelevant by then; that can increase incident response time (e.g. if the cause was fixed before the queues were fully drained). My suggestion is to add a new
POST _cluster/???
endpoint, which would tell all nodes to flush everything in their queues.