Async search: clarify when the response is final #55572
Comments
Pinging @elastic/es-search (:Search/Search) |
@jimczi we discussed that storing partial responses should also help removing The conclusion that I came to is that the |
We also got the following case, where the following async search response is returned with
{
"id" : "<id>",
"is_partial" : true,
"is_running" : false,
"start_time_in_millis" : 1652752048673,
"expiration_time_in_millis" : 1653356848673,
"response" : {
"took" : 208,
"timed_out" : false,
"terminated_early" : false,
"num_reduce_phases" : 8,
"_shards" : {
"total" : 1875,
"successful" : 775,
"skipped" : 270,
"failed" : 0
},
"_clusters" : {
"total" : 3,
"successful" : 3,
"skipped" : 0
},
"hits" : {
"total" : {
"value" : 516616,
"relation" : "eq"
},
"max_score" : null,
"hits" : [ ]
},
"aggregations" : {
"2" : {
"buckets" : [
We think this happens because the initial search response was successfully saved in the .async-search index, but after the coordinating node failed, its corresponding async search task also died. Then, when we tried to retrieve the final response, we got this partially completed saved response from the index. This scenario indicates a failure, so I am wondering whether in this case we should return a different status code (>=400) instead of
Another option to consider is the setting for |
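To make the scenario above concrete, here is a minimal client-side sketch that flags a response like the one shown: the search has stopped, is_partial is true, and fewer shards succeeded than the total. The function name is hypothetical and not part of any Elasticsearch client; the field names and numbers are taken from the response above. Whether this combination should be treated as an error is exactly the open question, so treat the check as an illustration only.

```python
def stopped_with_incomplete_shards(async_response: dict) -> bool:
    """Return True when the async search is no longer running but its
    saved response covers only a subset of the shards.

    Hypothetical helper for illustration; not an Elasticsearch API.
    """
    shards = async_response["response"]["_shards"]
    stopped = not async_response["is_running"]
    incomplete = shards["successful"] < shards["total"]
    return stopped and async_response["is_partial"] and incomplete


# Fields copied from the response in the comment above (aggregations omitted).
example = {
    "is_partial": True,
    "is_running": False,
    "response": {
        "timed_out": False,
        "_shards": {"total": 1875, "successful": 775, "skipped": 270, "failed": 0},
    },
}

if stopped_with_incomplete_shards(example):
    print("search stopped before all shards reported; results are incomplete")
```

Note that failed is 0 in this response, so checking the failed shard count alone would not reveal the problem.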
Recently, Kibana filed a bug against Elasticsearch async-search for local-only (non-CCS) searches where I created a bug fix PR to address it (#98839), but in doing so I realized this behavior was intentionally added (example) and is_partial is always set to false if the search finishes successfully. Unfortunately, this is NOT how I recently implemented the logic for cross-cluster search responses. There, if any search on any shard fails or if any search times out, I mark the async-search response as partial, even if the search is "successful" (returns a 2xx HTTP status). So now there is an inconsistency between CCS and local-only searches with respect to this field. In this current GH issue, Luca asked:
With the addition of per-cluster metadata now added to the
Since there are many possible causes for partial data, I think it makes sense to retain the If we keep it, then we have to agree on when it should be set to true, since there is now a discrepancy between cross-cluster and local-only searches. IMO, the current behavior of local-only searches, where is_partial = true only if the search fails, is confusing and not useful (thus the bug filed by Kibana mentioned above). If one shard failed but the rest succeeded, the search will be marked as successful (2xx HTTP status), but the returned data is obviously incomplete, so I propose changing the behavior to work as I have it in PR #98913.
The one issue I'm not fully clear on is that the code clearly discusses that we want to ignore fetch failures for async searches (see AsyncSearchTask.Listener#onFetchFailure), but I believe my proposal still works because in the case of fetch failures we do not count it as a failed shard. The is_partial flag uses that failed shard count as one of its inputs, so this should be OK. |
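As a rough illustration of the rule proposed in this comment, the sketch below derives is_partial from the failed shard count and the timeout state. It is a paraphrase of the comment, not code from PR #98913; the function and parameter names are hypothetical. It assumes, as the comment states, that fetch failures are not counted in the failed shard count, and, as the issue description states, that a running search is always partial.

```python
def proposed_is_partial(is_running: bool, failed_shards: int, timed_out: bool) -> bool:
    """Sketch of the proposed rule: a response is partial while the search
    is still running, or if any shard failed, or if the search timed out.

    Assumption: failed_shards excludes fetch failures, which the comment
    says are not counted as failed shards.
    """
    return is_running or failed_shards > 0 or timed_out


# One shard failed but the search finished with a 2xx status:
# under the proposal the response is still marked partial.
print(proposed_is_partial(is_running=False, failed_shards=1, timed_out=False))
```

Under the current local-only behavior described above, the same finished search would instead report is_partial as false.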
Pinging @elastic/es-search (Team:Search) |
Pinging @elastic/es-search-foundations (Team:Search Foundations) |
This has been open for quite a while, and hasn't had a lot of interest. For now I'm going to close this as something we aren't planning on implementing. We can re-open it later if needed. |
Submit async search and get async search both return a search response that may or may not be the final one. The response includes two flags that indicate the state of the async search: is_running and is_partial.

When is_running is set to true, the query is still running, hence more results are expected to be included in later results. In this scenario is_partial would also be set to true to indicate that the results come from a subset of the shards that the query is expected to hit.

When is_running is false, the query has stopped, which may happen due to multiple reasons:
- is_partial is also set to false
- is_partial is set to true to indicate that any results that may be included in the search response come only from a subset of the shards that the query should have hit.

Having two flags and having to worry about these different scenarios is not user-friendly. It would be nice to be able to summarize this in a single flag. Could we remove is_partial and make users rely on is_running alone? In that case, they would have to inspect the search response to see whether the search has failed or not, and how many shards the query ran against, basically the ordinary things that we recommend users to check in a search response?