Cross cluster search causes errors when is_partial: true in the response #166528

Closed
thomasneirynck opened this issue Sep 14, 2023 · 6 comments
Labels
bug Fixes for quality problems that affect the customer experience Feature:Search Querying infrastructure in Kibana impact:critical This issue should be addressed immediately due to a critical level of impact on the product. regression Team:DataDiscovery Discover, search (e.g. data plugin and KQL), data views, saved searches. For ES|QL, use Team:ES|QL. v8.10.0

Comments

@thomasneirynck
Contributor

thomasneirynck commented Sep 14, 2023

Kibana version: 8.10.0

Elasticsearch version: 8.10.0

Steps to reproduce:

  1. Spin up an ES instance (this will be the remote)
  2. Ingest the following documents:
POST tmp-00001/_doc
{
  "foo": "bar",
  "bar": 1
}

POST tmp-00002/_doc
{
  "foo": "baz"
}
  3. Spin up a separate Kibana/ES instance
  4. Under Stack Management → Remote Clusters, add the cluster from step 1
  5. Create a Data View that corresponds with the remote indices (*:tmp-*)
  6. Go to Discover and select the data view (should work fine)
  7. Add a filter that uses a script that causes errors in one of the remote shards (using "Edit as Query DSL"):
{"script": {"script": "doc['bar'].value < 10"}}

You will get a "Received partial response" error message with no results.

Expected behavior:

An error message with the shard failures, and seeing the one document that matches the filter.

Any additional context:

Similar configurations also fail in the Security app

image

Looking into the exact request/response, it seems as though there have been changes in the response between 8.9 and 8.10. To illustrate this, in the above setup, you can run the following query:

POST *:tmp-*/_async_search?ccs_minimize_roundtrips=true
{
  "query": {
    "bool": {
      "must": [],
      "filter": [
        {
          "script": {
            "script": "doc['bar'].value < 10"
          }
        }
      ],
      "should": [],
      "must_not": []
    }
  }
}

In 8.10.0, ES will return "is_partial": true, whereas in 8.9.2, ES will return "is_partial": false.

In Kibana, we are using these flags to determine whether or not the response is an error:

return !response || !response.rawResponse || (!response.isRunning && !!response.isPartial);

This logic is incorrect and relies on a bug in ES that returns "is_partial": false whenever "is_running" is also false. For context, see #164893 and elastic/elasticsearch#98725.
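To make the effect of that check concrete, here is a minimal sketch that wraps the quoted expression in a function and runs it against the two behaviours described above. The function name `isErrorResponse`, the `SearchResponseLike` interface, and both sample objects are assumptions made for illustration; only the boolean expression itself comes from the snippet above.

```typescript
// Hypothetical, simplified shape of the wrapped search response.
// Only the fields read by the quoted check are modeled.
interface SearchResponseLike {
  isRunning?: boolean;
  isPartial?: boolean;
  rawResponse?: unknown;
}

// The expression quoted above, wrapped in a function (name assumed).
function isErrorResponse(response?: SearchResponseLike): boolean {
  return !response || !response.rawResponse || (!response.isRunning && !!response.isPartial);
}

// Hand-written stand-in for the 8.9.2 behaviour: finished, is_partial false.
const from892: SearchResponseLike = { isRunning: false, isPartial: false, rawResponse: {} };

// Hand-written stand-in for the 8.10.0 behaviour: finished, is_partial true.
const from8100: SearchResponseLike = { isRunning: false, isPartial: true, rawResponse: {} };

console.log(isErrorResponse(from892)); // false: results are rendered
console.log(isErrorResponse(from8100)); // true: the whole response is treated as an error
```

Under these assumptions, the same search flips from "not an error" to "error" purely because of the `is_partial` flag, which matches the "Received partial response" message with no results.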

The main problem now is that ES sometimes correctly reports "is_partial": true when some remote shards fail while others succeed, but still incorrectly reports "is_partial": false for non-CCS setups. (See elastic/elasticsearch#97731 and elastic/elasticsearch#98913.)
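One way to get the expected behavior (a warning listing the shard failures, plus the documents that did match) is to classify the response by its shard counts instead of by `is_partial`. The sketch below is only an illustration of that idea and is not the fix that landed in Kibana; `classify`, `Outcome`, and the sample object are hypothetical names and hand-written data. The `_shards` fields (`total`, `successful`, `failed`, `failures`) are the standard ones in an Elasticsearch search response.

```typescript
// Minimal model of the parts of a raw search response used here.
interface ShardFailure {
  index?: string;
  reason?: { type?: string; reason?: string };
}

interface RawSearchResponse {
  _shards?: { total: number; successful: number; failed: number; failures?: ShardFailure[] };
  hits?: { hits: unknown[] };
}

// Hypothetical result type: "warning" means partial results can still be shown.
type Outcome =
  | { kind: 'ok' }
  | { kind: 'warning'; failures: ShardFailure[] }
  | { kind: 'error'; failures: ShardFailure[] };

// Assumed policy: no failed shards is ok, some successful shards is a warning,
// and only a response with no successful shard is a hard error.
function classify(raw?: RawSearchResponse): Outcome {
  if (!raw || !raw._shards) return { kind: 'error', failures: [] };
  const { successful, failed, failures = [] } = raw._shards;
  if (failed === 0) return { kind: 'ok' };
  return successful > 0 ? { kind: 'warning', failures } : { kind: 'error', failures };
}

// Hand-written example loosely modeled on the repro: one shard fails, one matches.
const sample: RawSearchResponse = {
  _shards: {
    total: 2,
    successful: 1,
    failed: 1,
    failures: [{ index: 'tmp-00002', reason: { type: 'script_exception' } }],
  },
  hits: { hits: [{ _index: 'tmp-00001' }] },
};

console.log(classify(sample).kind); // "warning"
```

With a policy like this, the caller would render `hits` as usual and surface `failures` in a warning, regardless of how `is_partial` is reported.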

@thomasneirynck thomasneirynck added the bug Fixes for quality problems that affect the customer experience label Sep 14, 2023
@botelastic botelastic bot added the needs-team Issues missing a team label label Sep 14, 2023
@thomasneirynck thomasneirynck added regression impact:high Addressing this issue will have a high level of impact on the quality/strength of our product. Team:DataDiscovery Discover, search (e.g. data plugin and KQL), data views, saved searches. For ES|QL, use Team:ES|QL. labels Sep 14, 2023
@elasticmachine
Contributor

Pinging @elastic/kibana-data-discovery (Team:DataDiscovery)

@botelastic botelastic bot removed the needs-team Issues missing a team label label Sep 14, 2023
@lukasolson lukasolson added impact:critical This issue should be addressed immediately due to a critical level of impact on the product. and removed impact:high Addressing this issue will have a high level of impact on the quality/strength of our product. labels Sep 14, 2023
@thomasneirynck thomasneirynck changed the title from "Cross cluster search errors fail when isPartial = true" to "Cross cluster search causes errors when is_partial: true in the response" Sep 14, 2023
@davismcphee davismcphee added Feature:Search Querying infrastructure in Kibana v8.10.0 labels Sep 15, 2023
@drewdaemon
Contributor

This is definitely a problem for visualizations as well. I followed the STR and got errors where I should have seen warnings:

Lens editor
Screenshot 2023-09-18 at 11 40 37 AM

Dashboard
Screenshot 2023-09-18 at 11 41 17 AM

Aggs-based editor
Screenshot 2023-09-18 at 11 42 35 AM

@mbudge

mbudge commented Sep 18, 2023

This bug severely impacted both security and IT users. Going forward, we're not going to upgrade to a new minor version of Elastic until at least two maintenance releases are available. The reason is that this bug won't be fixed until the second maintenance release, which makes me think second maintenance releases are more stable. Please review your testing processes, as CCS should have been tested in Kibana before the changes went GA.

@sophiec20
Contributor

Yes, we are reviewing our testing for CCS and also looking to improve how we handle changes to Elasticsearch API responses between versions.

davismcphee pushed a commit that referenced this issue Sep 18, 2023
## Summary

Fixes #166528.

### Checklist

Delete any items that are not applicable to this PR.

- [ ] Any text added follows [EUI's writing
guidelines](https://elastic.github.io/eui/#/guidelines/writing), uses
sentence case text and includes [i18n
support](https://github.com/elastic/kibana/blob/main/packages/kbn-i18n/README.md)
- [ ]
[Documentation](https://www.elastic.co/guide/en/kibana/master/development-documentation.html)
was added for features that require explanation or tutorials
- [ ] [Unit or functional
tests](https://www.elastic.co/guide/en/kibana/master/development-tests.html)
were updated or added to match the most common scenarios
- [ ] Any UI touched in this PR is usable by keyboard only (learn more
about [keyboard accessibility](https://webaim.org/techniques/keyboard/))
- [ ] Any UI touched in this PR does not create any new axe failures
(run axe in browser:
[FF](https://addons.mozilla.org/en-US/firefox/addon/axe-devtools/),
[Chrome](https://chrome.google.com/webstore/detail/axe-web-accessibility-tes/lhdoppojpmngadmnindnejefpokejbdd?hl=en-US))
- [ ] If a plugin configuration key changed, check if it needs to be
allowlisted in the cloud and added to the [docker
list](https://github.com/elastic/kibana/blob/main/src/dev/build/tasks/os_packages/docker_generator/resources/base/bin/kibana-docker)
- [ ] This renders correctly on smaller devices using a responsive
layout. (You can test this [in your
browser](https://www.browserstack.com/guide/responsive-testing-on-local-server))
- [ ] This was checked for [cross-browser
compatibility](https://www.elastic.co/support/matrix#matrix_browsers)


### Risk Matrix

Delete this section if it is not applicable to this PR.

Before closing this PR, invite QA, stakeholders, and other developers to
identify risks that should be tested prior to the change/feature
release.

When forming the risk matrix, consider some of the following examples
and how they may potentially impact the change:

| Risk | Probability | Severity | Mitigation/Notes |
|---------------------------|-------------|----------|-------------------------|
| Multiple Spaces&mdash;unexpected behavior in non-default Kibana Space. | Low | High | Integration tests will verify that all features are still supported in non-default Kibana Space and when user switches between spaces. |
| Multiple nodes&mdash;Elasticsearch polling might have race conditions when multiple Kibana nodes are polling for the same tasks. | High | Low | Tasks are idempotent, so executing them multiple times will not result in logical error, but will degrade performance. To test for this case we add plenty of unit tests around this logic and document manual testing procedure. |
| Code should gracefully handle cases when feature X or plugin Y are disabled. | Medium | High | Unit tests will verify that any feature flag or plugin combination still results in our service operational. |
| [See more potential risk examples](https://github.com/elastic/kibana/blob/main/RISK_MATRIX.mdx) |

### For maintainers

- [ ] This was checked for breaking API changes and was [labeled
appropriately](https://www.elastic.co/guide/en/kibana/master/contributing.html#kibana-release-notes-process)

---------

Co-authored-by: Kibana Machine <[email protected]>
lukasolson added a commit to lukasolson/kibana that referenced this issue Sep 18, 2023
…ic#166544)

## Summary

Fixes elastic#166528.

@kertal
Member

kertal commented Sep 20, 2023

@lukasolson Guess we can close this now, right?

@kertal
Member

kertal commented Sep 21, 2023

@lukasolson let's resolve it with #166667

8 participants