async search returns is_partial:false when local cluster has shard errors #98725

Open
nreese opened this issue Aug 22, 2023 · 2 comments · May be fixed by #98913
Labels: >bug, priority:normal, :Search Foundations/Search, Team:Search Foundations

Comments

@nreese
Contributor

nreese commented Aug 22, 2023

Elasticsearch Version

main

Installed Plugins

none

Java Version

bundled

OS Version

21.6.0 Darwin Kernel Version 21.6.0

Problem Description

async search returns is_partial:false when local cluster has shard errors

Steps to Reproduce

PUT local1
{}

PUT local1/_mapping
{
  "properties": {
    "value": {
      "type": "keyword"
    }
  }
}

PUT local1/_doc/1
{
    "value" : "foo1"
}

PUT local2
{}

PUT local2/_mapping
{
  "properties": {
    "value": {
      "type": "keyword"
    }
  }
}

PUT local2/_doc/1
{
    "value" : "foo2"
}

POST /local*/_async_search
{
  "query": {
    "error_query": {
      "indices": [
        {
          "error_type": "exception",
          "message": "local shard failure message 123",
          "name": "local2"
        }
      ]
    }
  }
}

In the response, notice that is_partial is false. However, the results are partial: the local2 shard failed, so only a single hit is returned. Complete results would include 2 hits.

{
  "is_partial": false,
  "is_running": false,
  "start_time_in_millis": 1692708012844,
  "expiration_time_in_millis": 1693140012844,
  "completion_time_in_millis": 1692708012845,
  "response": {
    "took": 1,
    "timed_out": false,
    "_shards": {
      "total": 2,
      "successful": 1,
      "skipped": 0,
      "failed": 1,
      "failures": [
        {
          "shard": 0,
          "index": "local2",
          "node": "XPQtz28bSLW26Oynf6oUmg",
          "reason": {
            "type": "query_shard_exception",
            "reason": "failed to create query: [local2][0] local shard failure message 123",
            "index_uuid": "-MsFEeVHRU6e20nK5cRwXw",
            "index": "local2",
            "caused_by": {
              "type": "runtime_exception",
              "reason": "[local2][0] local shard failure message 123"
            }
          }
        }
      ]
    },
    "hits": {
      "total": {
        "value": 1,
        "relation": "eq"
      },
      "max_score": 1,
      "hits": [
        {
          "_index": "local1",
          "_id": "1",
          "_score": 1,
          "_source": {
            "value": "foo1"
          }
        }
      ]
    }
  }
}
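Until the flag is fixed, a client can cross-check `is_partial` against the shard counters in the same response. The following is a minimal sketch, not part of the original report: the helper names `shard_failures` and `is_effectively_partial` are hypothetical, and the code assumes only the response fields shown above (`is_partial`, `response._shards.failed`, `response._shards.failures`).

```python
from __future__ import annotations


def shard_failures(async_response: dict) -> list[dict]:
    """Return the shard failure entries of an async search response, if any."""
    shards = async_response.get("response", {}).get("_shards", {})
    if shards.get("failed", 0) > 0:
        return shards.get("failures", [])
    return []


def is_effectively_partial(async_response: dict) -> bool:
    """True if the flag says partial OR the shard counters report failures."""
    flagged = bool(async_response.get("is_partial"))
    return flagged or bool(shard_failures(async_response))


# Trimmed copy of the response from this report.
sample = {
    "is_partial": False,
    "is_running": False,
    "response": {
        "timed_out": False,
        "_shards": {
            "total": 2,
            "successful": 1,
            "skipped": 0,
            "failed": 1,
            "failures": [{"shard": 0, "index": "local2"}],
        },
        "hits": {"total": {"value": 1, "relation": "eq"}},
    },
}

if __name__ == "__main__":
    print(is_effectively_partial(sample))
    print([f["index"] for f in shard_failures(sample)])
```

For the response in this report the check flags the result as partial even though `is_partial` is false, because `_shards.failed` is 1.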

Logs (if relevant)

No response

nreese added the >bug and needs:triage labels Aug 22, 2023
gwbrown removed the needs:triage label Aug 22, 2023
elasticsearchmachine added the needs:triage label Aug 22, 2023
gwbrown added the :Search/Search label and removed the needs:triage label Aug 22, 2023
@elasticsearchmachine
Collaborator

Pinging @elastic/es-search (Team:Search)

elasticsearchmachine added the Team:Search label Aug 22, 2023
quux00 self-assigned this Aug 26, 2023
quux00 added a commit to quux00/elasticsearch that referenced this issue Aug 31, 2023
With the recent addition of per-cluster metadata to the `_clusters` section of the response
for cross-cluster searches (see elastic#97731), the `is_partial` setting in the async-search response
now acts as a useful summary telling end users that the search/aggs data is potentially incomplete
(not all shards fully searched), which could be for one of 3 reasons:

1. at least one shard was not successfully searched (a PARTIAL search cluster state)
2. at least one cluster (marked as `skip_unavailable`=`true`) was unavailable (or all
   searches on all shards of that cluster failed), causing the cluster to be marked as SKIPPED
3. a search on at least one cluster timed out (`timed_out`=`true`, resulting in a PARTIAL cluster search status)

This commit changes local-only (non-CCS) searches to behave consistently with cross-cluster searches,
namely, if any search on any shard fails or if the search times out, the is_partial flag is set to true.

Closes elastic#98725
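The rule described in this commit message can be sketched as a small model. This is an illustration only, not the Elasticsearch implementation: `ClusterResult`, `ClusterStatus`, `cluster_status` and `compute_is_partial` are hypothetical names, and a local-only search is modeled as a single cluster. The "always true while running" behavior follows the async search documentation quoted later in this thread.

```python
from __future__ import annotations

from dataclasses import dataclass
from enum import Enum


class ClusterStatus(Enum):
    SUCCESSFUL = "successful"
    PARTIAL = "partial"
    SKIPPED = "skipped"


@dataclass
class ClusterResult:
    """Hypothetical per-cluster summary; field names are assumptions."""
    failed_shards: int = 0
    timed_out: bool = False
    skipped: bool = False  # skip_unavailable cluster that could not be searched


def cluster_status(result: ClusterResult) -> ClusterStatus:
    if result.skipped:
        return ClusterStatus.SKIPPED       # reason 2
    if result.failed_shards > 0 or result.timed_out:
        return ClusterStatus.PARTIAL       # reasons 1 and 3
    return ClusterStatus.SUCCESSFUL


def compute_is_partial(is_running: bool, clusters: list[ClusterResult]) -> bool:
    # While the search is still running, is_partial is always true.
    if is_running:
        return True
    # Once finished, any non-successful cluster makes the result partial.
    return any(cluster_status(c) is not ClusterStatus.SUCCESSFUL for c in clusters)


if __name__ == "__main__":
    # Local-only search from this issue: one shard of local2 failed.
    print(compute_is_partial(False, [ClusterResult(failed_shards=1)]))
```

Under this model the reproduction in this issue (finished search, one failed local shard) yields `is_partial` true, which is the behavior the commit aims for.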
nreese added a commit to elastic/kibana that referenced this issue Oct 2, 2023
Closes #164893

### Background

"is partial" has 2 meanings:
1) Results are incomplete because search is still running
2) Search is finished. Results are incomplete because there are shard
failures (either in local or remote clusters)

[async
search](https://www.elastic.co/guide/en/elasticsearch/reference/current/async-search.html)
defines 2 flags.
1) `is_running`: Whether the search is still being executed or it has
completed
2) `is_partial`: When the query is no longer running, indicates whether
the search failed or was successfully completed on all shards. While the
query is being executed, is_partial is always set to true
**note**: there is a bug in async search where requests to only local
clusters return `is_partial:false` when there are shard errors on the
local cluster. See
elastic/elasticsearch#98725. This should be
resolved in 8.11

Kibana's existing search implementation does not align with
Elasticsearch's `is_running` and `is_partial` flags. Kibana uses "is
partial" in sense 1) above, while Elasticsearch async search uses it
in sense 2).

This PR aligns Kibana's usage of "is partial" with Elasticsearch's
definition. This required the following changes:
1) `isErrorResponse` renamed to `isAbortedResponse`. Method no longer
returns true when `!response.isRunning && !!response.isPartial`. Kibana
handles results with incomplete data. **Note** removed export of
`isErrorResponse` from data plugin since its use outside of data plugin
does not make sense.
2) Replace `isPartialResponse` with `isRunningResponse`. This aligns
Kibana's definition with Elasticsearch async search flags.
3) Remove `isCompleteResponse`. The word "complete" is ambiguous. Does
it mean the search is finished (no longer running)? Or does it mean the
search has all results and there are no shard failures?
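
The distinction between the two flags can be sketched as follows. This is a language-neutral illustration written in Python, not Kibana's TypeScript code: `SearchState` and `classify` are hypothetical names, and the only inputs assumed are the `is_running` and `is_partial` flags described above.

```python
from enum import Enum


class SearchState(Enum):
    RUNNING = "still running, results so far are incomplete"
    FINISHED_INCOMPLETE = "finished, some shards failed or timed out"
    FINISHED_COMPLETE = "finished, all shards succeeded"


def classify(is_running: bool, is_partial: bool) -> SearchState:
    """Map the two async search flags to one unambiguous state."""
    if is_running:
        # is_partial is always true while the search is running,
        # so it carries no extra information here.
        return SearchState.RUNNING
    if is_partial:
        return SearchState.FINISHED_INCOMPLETE
    return SearchState.FINISHED_COMPLETE


if __name__ == "__main__":
    print(classify(is_running=False, is_partial=True).name)
```

Keeping "running" and "incomplete" as separate states avoids the ambiguity of the word "complete" noted in item 3.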

---------

Co-authored-by: kibanamachine <[email protected]>
Co-authored-by: Jatin Kathuria <[email protected]>
Co-authored-by: Patryk Kopyciński <[email protected]>
benwtrent added the priority:normal label Jul 9, 2024
javanna added the :Search Foundations/Search label and removed the :Search/Search label Jul 17, 2024
@elasticsearchmachine
Collaborator

Pinging @elastic/es-search-foundations (Team:Search Foundations)

elasticsearchmachine added the Team:Search Foundations label and removed the Team:Search label Jul 17, 2024