Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Async search: clarify when the response is final #55572

Closed
Tracked by #88658
javanna opened this issue Apr 22, 2020 · 7 comments
Closed
Tracked by #88658

Async search: clarify when the response is final #55572

javanna opened this issue Apr 22, 2020 · 7 comments
Labels
>enhancement :Search Foundations/Search Catch all for Search Foundations Team:Search Foundations Meta label for the Search Foundations team in Elasticsearch

Comments

@javanna
Copy link
Member

javanna commented Apr 22, 2020

Submit async search and get async search both return a search response that may or may not be the final one. The response includes two flags that indicate the state of the async search: is_running and is_partial.

{
  "id" : "FmRldE8zREVEUzA2ZVpUeGs2ejJFUFEaMkZ5QTVrSTZSaVN3WlNFVmtlWHJsdzoxMDc=",
  "is_partial" : true, 
  "is_running" : true, 
  "start_time_in_millis" : 1583945890986,
  "expiration_time_in_millis" : 1584377890986, 
  "response" : {
    ...
  }
}

When is_running is set to true, the query is still running hence more results are expected to be included in later results. In this scenario is_partial would also be set to true to indicate that the results come from a subset of the shards that the query is expected to hit.

When is_running is false, the query has stopped, which may happen due to multiple reasons:

  • the search has successfully terminated, in which case is_partial is also set to false
  • the search failed, in which case is_partial is set to true to indicate that any results that may be included in the search response come only from a subset of the shards that the query should have hit.
  • the node that was coordinating the search died (to be improved by Async search: store intermediate results #55464)

Having two flags and having to worry about these different scenarios is not user-friendly. It would be nice to be able to summarize this in a single flag. Could we remove is_partial and make users rely on is_running alone? In that case, they would have to inspect the search response to see whether the search has failed or not, and how many shards the query ran against, basically the ordinary things that we recommend users to check in a search response?

@javanna javanna added the :Search/Search Search-related issues that do not fall into other categories label Apr 22, 2020
@elasticmachine
Copy link
Collaborator

Pinging @elastic/es-search (:Search/Search)

@javanna
Copy link
Member Author

javanna commented Apr 25, 2020

@jimczi we discussed that storing partial responses should also help removing is_partial, but I wonder if the two changes are so tightly related. I believe that the answer boils down to this question: "is there a situation today where the _shards section is not enough to explain the state of a search that is no longer running?". I am referring specifically to the scenario where the query is no longer running, because when the query is running is_partial is always true hence surely it does not add value.

The conclusion that I came to is that the _shards section should be self-explanatory and if it's not we have a bug in how we count the shards in async search. What worries me a little is the case of fatal failures, because we return a _shards section where ordinary search wouldn't and it's hard to express that some shards did return some partial results, though the search has later entirely failed. I believe in that case successful should be 0 (or better yet should be the same as skipped given they are counted as successful), which I proposed as part of #55758, but then we lose information around how many shards the partial results come from. If we want to keep counting successful shards based on how many provided partial results, I am afraid that we need to have some extra flag that says whether there was a fatal failure, though it should differ from what is_partial does today. Let's discuss this further :)

javanna added a commit that referenced this issue Apr 29, 2020
javanna added a commit that referenced this issue Apr 29, 2020
javanna added a commit that referenced this issue Apr 29, 2020
@rjernst rjernst added the Team:Search Meta label for search team label May 4, 2020
@mayya-sharipova
Copy link
Contributor

mayya-sharipova commented May 24, 2022

We also got the following case, where the following async search response is returned with {is_partial: true, is_running:false, completion_status: 200}

{
  "id" : "<id>",
  "is_partial" : true,
  "is_running" : false,
  "start_time_in_millis" : 1652752048673,
  "expiration_time_in_millis" : 1653356848673,
  "response" : {
    "took" : 208,
    "timed_out" : false,
    "terminated_early" : false,
    "num_reduce_phases" : 8,
    "_shards" : {
      "total" : 1875,
      "successful" : 775,
      "skipped" : 270,
      "failed" : 0
    },
    "_clusters" : {
      "total" : 3,
      "successful" : 3,
      "skipped" : 0
    },
    "hits" : {
      "total" : {
        "value" : 516616,
        "relation" : "eq"
      },
      "max_score" : null,
      "hits" : [ ]
    },
    "aggregations" : {
      "2" : {
        "buckets" : [

We think this happens because the initial search response was successfully saved in the .async-search index, but after the coordinating node failed its corresponding async search task also died. Then when we tried to retrieve the final response, it was getting this partially completed saved response from the index.


So this scenario indicates a failure, so I am wondering if in this case we should return a different status code (>=400) instead of 200?

Another option is to consider is the setting for allow_partial_search_results. If some shards are missing, and this setting is true (default value), we return OK - 200; if this settings is false, we return Error status code (>=400).

@quux00
Copy link
Contributor

quux00 commented Aug 27, 2023

Recently, Kibana filed a bug against Elasticsearch async-search for local-only (non-CCS) searches where is_partial=false when a subset of shards fail in the search. Only when a search fails entirely (all shard searches fail) is is_partial set to true.

I created a bug fix PR to address it (#98839), but in doing so realized this behavior was intentionally added (example) and is_partial is always to set false if the search finishes successfully.

Unfortunately, this is NOT how I recently implemented the logic for cross-cluster search responses. There, if any search on any shard fails or if any search times out, I mark the async-search response as partial, even if the search is "successful" (returns 2xx HTTP status). So now there is an inconsistency between CCS and local-only searches wrt to this field.

In this current GH issue, Luca asked:

I believe that the answer boils down to this question: "is there a situation today where the _shards section is not enough to explain the state of a search that is no longer running?"

With the addition of per-cluster metadata now added to the _clusters section of the response for cross-cluster searches, is_partial now acts as a useful summary to end-users that search/aggs data from all shards is potentially incomplete (not all shards fully searched), which could be for one of 3 reasons:

  1. at least one shard was not successfully searched (a PARTIAL search cluster state in CCS)
  2. at least one cluster (marked as skip_unavailable=true) was unavailable (or all searches on all shards of that cluster failed), causing the cluster to be marked as SKIPPED
  3. a search on at least one cluster timed out (timed_out=true, resulting in a PARTIAL cluster search status)

Since there are many possible causes for partial data, I think it makes sense to retain the is_partial flag, so the user does not have to parse several different sections of the response metadata to determine whether the data is complete.

If we keep it, then we have to agree when it should be set to true, since there is now a discrepancy between cross-cluster and local-only searches.

IMO, the current behavior of local-only searches where is_partial = true only if the search fails is not useful and confusing (thus the bug filed by Kibana mentioned above). If one shard failed, but the rest succeeded, the search will be marked as successful (2xx HTTP status), but the returned data is obviously incomplete, so is_partial=true usefully indicates that at a glance to the caller.

I propose changing the behavior to work as I have in PR #98913.

The one issue I'm not fully clear on, is that the code clearly discusses that we want to ignore fetch failures for async-searches (see AsyncSearchTask.Listener#onFetchFailure), but I believe my proposal still works because in the case of fetch failures we do not count it as a failed shard. The is_partial flag uses that failed shard count as one of its inputs, so this should be OK.

@elasticsearchmachine
Copy link
Collaborator

Pinging @elastic/es-search (Team:Search)

@javanna javanna added :Search Foundations/Search Catch all for Search Foundations and removed :Search/Search Search-related issues that do not fall into other categories team-discuss labels Jul 17, 2024
@elasticsearchmachine elasticsearchmachine added the Team:Search Foundations Meta label for the Search Foundations team in Elasticsearch label Jul 17, 2024
@elasticsearchmachine
Copy link
Collaborator

Pinging @elastic/es-search-foundations (Team:Search Foundations)

@elasticsearchmachine elasticsearchmachine removed the Team:Search Meta label for search team label Jul 17, 2024
@javanna
Copy link
Member Author

javanna commented Jan 23, 2025

This has been open for quite a while, and hasn't had a lot of interest. For now I'm going to close this as something we aren't planning on implementing. We can re-open it later if needed.

@javanna javanna closed this as not planned Won't fix, can't repro, duplicate, stale Jan 23, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
>enhancement :Search Foundations/Search Catch all for Search Foundations Team:Search Foundations Meta label for the Search Foundations team in Elasticsearch
Projects
None yet
Development

No branches or pull requests

7 participants