[Data Views] has_es_data request hangs when remote clusters are unresponsive #200280

Closed
davismcphee opened this issue Nov 14, 2024 · 1 comment · Fixed by #200476
Labels: bug, Feature:Data Views, impact:high, Team:DataDiscovery

Comments

davismcphee commented Nov 14, 2024

In #191566 we switched from using the resolve/index API to the resolve/cluster API for checking if any user data exists in Kibana. This was done for performance reasons, since in most cases resolve/cluster should respond significantly faster than resolve/index and return a smaller payload. However, this created an issue when any of the remote clusters are unresponsive, causing the resolve/cluster request to hang until it eventually times out, which can take upward of a minute. In these cases, the Kibana user is left waiting in a loading state in the UI (e.g. in Discover and Dashboard) until the request timeout.

We confirmed resolve/cluster was the cause by executing the underlying request sent by has_es_data directly in dev tools in an affected environment:

GET /_resolve/cluster/*%2C-.*%2C-logs-enterprise_search.api-default%2C-logs-enterprise_search.audit-default%2C*%3A*%2C*%3A-.*%2C*%3A-logs-enterprise_search.api-default%2C*%3A-logs-enterprise_search.audit-default?allow_no_indices=true&ignore_unavailable=true

We then executed a request against just the local indices, confirming it was fast:

GET /_resolve/cluster/%2A%2C-.%2A%2C-logs-enterprise_search.api-default%2C-logs-enterprise_search.audit-default?allow_no_indices=true&ignore_unavailable=true

And another against just the remote indices, confirming it was slow:

GET /_resolve/cluster/%2A%3A%2A%2C%2A%3A-.%2A%2C%2A%3A-logs-enterprise_search.api-default%2C%2A%3A-logs-enterprise_search.audit-default?allow_no_indices=true&ignore_unavailable=true
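The three requests above differ only in which index expression they send. As a rough sketch of how the local and remote expressions could be assembled (the helper names `buildPattern` and `buildResolveClusterPath` are hypothetical and not taken from the Kibana source; only the excluded patterns and query parameters come from the requests above):

```typescript
// Patterns excluded from the "user data" check, copied from the requests above.
const EXCLUDED_PATTERNS = [
  '.*',
  'logs-enterprise_search.api-default',
  'logs-enterprise_search.audit-default',
];

type Scope = 'local' | 'remote';

// Hypothetical helper: builds the comma-separated index expression.
// Remote expressions prefix every entry with "*:" to target all remote clusters.
function buildPattern(scope: Scope): string {
  const prefix = scope === 'remote' ? '*:' : '';
  const include = `${prefix}*`;
  const exclude = EXCLUDED_PATTERNS.map((pattern) => `${prefix}-${pattern}`);
  return [include, ...exclude].join(',');
}

// Hypothetical helper: builds the request path used in the dev tools examples.
// encodeURIComponent leaves "*" as-is, which is equivalent to the "%2A" form.
function buildResolveClusterPath(scope: Scope): string {
  const query = 'allow_no_indices=true&ignore_unavailable=true';
  return `/_resolve/cluster/${encodeURIComponent(buildPattern(scope))}?${query}`;
}
```

Joining the local and remote expressions with a comma reproduces the original combined request, which is why a single slow remote cluster was enough to delay the whole check.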

Notes:

davismcphee added the bug, Feature:Data Views, impact:high, and Team:DataDiscovery labels Nov 14, 2024
davismcphee self-assigned this Nov 14, 2024
@elasticmachine

Pinging @elastic/kibana-data-discovery (Team:DataDiscovery)

kibanamachine pushed a commit to kibanamachine/kibana that referenced this issue Nov 20, 2024
…a to hang (elastic#200476)

## Summary

This PR mitigates an issue where the `has_es_data` check can hang when
some remote clusters are unresponsive, leaving users stuck in a loading
state in some apps (e.g. Discover and Dashboard) until the request times
out. There are two main changes that help mitigate this issue:
- The `resolve/cluster` request in the `has_es_data` endpoint has been
split into two requests -- one for local data first, then another for
remote data second. In cases where remote clusters are unresponsive but
there is data available in the local cluster, the remote check is never
performed and the check completes quickly. This likely resolves the
majority of cases and is also likely faster in general than checking
both local and remote clusters in a single request.
- In cases where there is no local data and the remote `resolve/cluster`
request hangs, a new `data_views.hasEsDataTimeout` config has been added
to `kibana.yml` (defaults to 5 seconds) to abort the request after a
short delay. This scenario is handled in the front end by displaying an
error toast to the user informing them of the issue, and assuming there
is data available to avoid blocking them. When this occurs, a warning is
also logged to the Kibana server logs.
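
The two changes above can be sketched roughly as follows. This is an illustration under stated assumptions, not the code from the PR: `ResolveFn`, `HasEsDataResult`, `checkHasEsData`, and `shouldTreatAsHavingData` are hypothetical names, the result shape is invented for the example, and the sketch applies the timeout to the remote request only.

```typescript
// Hypothetical signature: resolves to true when the given scope has user data,
// and rejects if `signal` is aborted before a response arrives.
type ResolveFn = (scope: 'local' | 'remote', signal: AbortSignal) => Promise<boolean>;

// Hypothetical result shape; the real endpoint's response may differ.
interface HasEsDataResult {
  hasEsData: boolean;
  timedOut: boolean;
}

async function checkHasEsData(resolve: ResolveFn, timeoutMs = 5000): Promise<HasEsDataResult> {
  // Step 1: check local data first. If any exists, the remote check is skipped.
  const local = new AbortController();
  if (await resolve('local', local.signal)) {
    return { hasEsData: true, timedOut: false };
  }

  // Step 2: check remote data, aborting the request after `timeoutMs`.
  const remote = new AbortController();
  const timer = setTimeout(() => remote.abort(), timeoutMs);
  try {
    return { hasEsData: await resolve('remote', remote.signal), timedOut: false };
  } catch (err) {
    if (remote.signal.aborted) {
      // The remote request hung; report the timeout instead of failing.
      return { hasEsData: false, timedOut: true };
    }
    throw err;
  } finally {
    clearTimeout(timer);
  }
}

// Client side: on timeout, notify the user and assume data exists
// so that they are not blocked.
function shouldTreatAsHavingData(
  result: HasEsDataResult,
  notify: (message: string) => void
): boolean {
  if (result.timedOut) {
    notify('Timed out while checking for Elasticsearch data; assuming data exists.');
    return true;
  }
  return result.hasEsData;
}
```

Checking local data first means the timeout only comes into play when the local cluster is empty, which matches the expectation above that the split alone resolves most cases.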

![CleanShot 2024-11-18 at 23 47
34@2x](https://github.com/user-attachments/assets/6ea14869-b6b6-4d89-a90c-8150d6e6b043)

Fixes elastic#200280.

### Notes
- Modifying the existing version of the `has_es_data` endpoint in this
way should be backward compatible since the behaviour should remain
unchanged from before when the client and server versions don't match
(please validate if this seems accurate during review).
- For a long-term fix, the ES team is investigating the issue with
`resolve/cluster` and will aim to have it behave like `resolve/index`,
which fails quickly when remote clusters are unresponsive. They may also
implement other mitigations like a configurable timeout in ES:
elastic/elasticsearch#114020. The purpose of
this PR is to provide an immediate solution in Kibana that mitigates the
issue as much as possible.
- If ES ends up providing another performant method for checking if
indices exist instead of `resolve/cluster`, Kibana should migrate to
that. More details in
elastic/elasticsearch#112307.

### Testing notes

To reproduce the issue locally, follow these steps:
- Follow [these
instructions](https://gist.github.com/lukasolson/d0861aa3e6ee476ac8dd7189ed476756)
to set up a local CCS environment.
- Stop the remote cluster process.
- Use Netcat on the remote cluster port to listen to requests but not
respond (e.g. on macOS: `nc -l 9600`), simulating an unresponsive
cluster. See elastic/elasticsearch#32678 for
more context.
- Navigate to Discover and observe that the `has_es_data` request hangs.
When testing in this PR branch, the request will only wait for 5 seconds
before assuming data exists and displaying a toast.
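
If Netcat is not available, a similar listener can be sketched with Node's built-in `net` module. This is an assumption-laden stand-in for the `nc -l 9600` step, not part of the PR; `createSilentServer` is a hypothetical name. Unlike a single `nc -l` invocation, it keeps accepting connections until the process is stopped.

```typescript
import * as net from 'node:net';

// Hypothetical helper: creates a TCP server that accepts connections and reads
// whatever is sent, but never writes a response, simulating an unresponsive cluster.
function createSilentServer(): net.Server {
  return net.createServer((socket) => {
    socket.on('data', () => {
      // Read and discard incoming bytes without replying.
    });
    socket.on('error', () => {
      // Ignore connection resets from clients that give up.
    });
  });
}

// Usage (port 9600 is the remote cluster port from the step above):
// createSilentServer().listen(9600);
```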

### Checklist

- [x] Any text added follows [EUI's writing
guidelines](https://elastic.github.io/eui/#/guidelines/writing), uses
sentence case text and includes [i18n
support](https://github.com/elastic/kibana/blob/main/packages/kbn-i18n/README.md)
- [ ]
[Documentation](https://www.elastic.co/guide/en/kibana/master/development-documentation.html)
was added for features that require explanation or tutorials
- [x] [Unit or functional
tests](https://www.elastic.co/guide/en/kibana/master/development-tests.html)
were updated or added to match the most common scenarios
- [ ] If a plugin configuration key changed, check if it needs to be
allowlisted in the cloud and added to the [docker
list](https://github.com/elastic/kibana/blob/main/src/dev/build/tasks/os_packages/docker_generator/resources/base/bin/kibana-docker)
- [x] This was checked for breaking HTTP API changes, and any breaking
changes have been approved by the breaking-change committee. The
`release_note:breaking` label should be applied in these situations.
- [ ] [Flaky Test
Runner](https://ci-stats.kibana.dev/trigger_flaky_test_runner/1) was
used on any tests changed
- [x] The PR description includes the appropriate Release Notes section,
and the correct `release_note:*` label is applied per the
[guidelines](https://www.elastic.co/guide/en/kibana/master/contributing.html#kibana-release-notes-process)

---------

Co-authored-by: kibanamachine <[email protected]>
(cherry picked from commit 96fd4b6)
kibanamachine added a commit that referenced this issue Nov 20, 2024
… can cause Kibana to hang (#200476) (#201025)

# Backport

This will backport the following commits from `main` to `8.x`:
- [[Data Views] Mitigate issue where `has_es_data` check can
cause Kibana to hang
(#200476)](#200476)

<!--- Backport version: 9.4.3 -->

### Questions ?
Please refer to the [Backport tool
documentation](https://github.com/sqren/backport)


Co-authored-by: Davis McPhee <[email protected]>
kibanamachine added a commit that referenced this issue Nov 20, 2024
…k can cause Kibana to hang (#200476) (#201024)

# Backport

This will backport the following commits from `main` to `8.16`:
- [[Data Views] Mitigate issue where `has_es_data` check can
cause Kibana to hang
(#200476)](#200476)

<!--- Backport version: 9.4.3 -->

### Questions ?
Please refer to the [Backport tool
documentation](https://github.com/sqren/backport)


Co-authored-by: Davis McPhee <[email protected]>
kibanamachine added a commit that referenced this issue Nov 20, 2024
…k can cause Kibana to hang (#200476) (#201023)

# Backport

This will backport the following commits from `main` to `8.15`:
- [[Data Views] Mitigate issue where `has_es_data` check can
cause Kibana to hang
(#200476)](#200476)

<!--- Backport version: 9.4.3 -->

### Questions ?
Please refer to the [Backport tool
documentation](https://github.com/sqren/backport)


---------

Co-authored-by: Davis McPhee <[email protected]>
TattdCodeMonkey pushed a commit to TattdCodeMonkey/kibana that referenced this issue Nov 21, 2024
…a to hang (elastic#200476)

## Summary

This PR mitigates an issue where the `has_es_data` check can hang when
some remote clusters are unresponsive, leaving users stuck in a loading
state in some apps (e.g. Discover and Dashboard) until the request times
out. There are two main changes that help mitigate this issue:
- The `resolve/cluster` request in the `has_es_data` endpoint has been
split into two requests -- one for local data first, then another for
remote data second. In cases where remote clusters are unresponsive but
there is data available in the local cluster, the remote check is never
performed and the check completes quickly. This likely resolves the
majority of cases and is also likely faster in general than checking
both local and remote clusters in a single request.
- In cases where there is no local data and the remote `resolve/cluster`
request hangs, a new `data_views.hasEsDataTimeout` config has been added
to `kibana.yml` (defaults to 5 seconds) to abort the request after a
short delay. This scenario is handled in the front end by displaying an
error toast to the user informing them of the issue, and assuming there
is data available to avoid blocking them. When this occurs, a warning is
also logged to the Kibana server logs.
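
The two changes above can be pictured with a minimal sketch. This is not the code from the PR: `ResolveFn`, `hasEsData`, `LOCAL_PATTERN`, and `REMOTE_PATTERN` are hypothetical names, the patterns are simplified, and applying the timeout only to the remote check is an assumption made for illustration.

```typescript
// Hypothetical stand-in for one resolve/cluster call: resolves to true when
// the pattern matches at least one index. Not Kibana's actual API.
type ResolveFn = (pattern: string, signal?: AbortSignal) => Promise<boolean>;

interface HasDataResult {
  hasData: boolean;
  timedOut: boolean;
}

// Simplified, illustrative patterns (the real request excludes more indices).
const LOCAL_PATTERN = '*,-.*';
const REMOTE_PATTERN = '*:*,*:-.*';

async function hasEsData(resolve: ResolveFn, timeoutMs = 5000): Promise<HasDataResult> {
  // 1. Check the local cluster first; if it has data, the remote clusters
  //    are never contacted, so an unresponsive remote cannot block the check.
  if (await resolve(LOCAL_PATTERN)) {
    return { hasData: true, timedOut: false };
  }

  // 2. Only then check remote clusters, racing the request against a timer.
  const controller = new AbortController();
  let timer: ReturnType<typeof setTimeout> | undefined;
  const timeout = new Promise<'timeout'>((settle) => {
    timer = setTimeout(() => {
      // Settle the race before aborting so the timeout outcome wins even if
      // the aborted request rejects immediately.
      settle('timeout');
      controller.abort();
    }, timeoutMs);
  });

  try {
    const outcome = await Promise.race([resolve(REMOTE_PATTERN, controller.signal), timeout]);
    if (outcome === 'timeout') {
      // Mirror the described fallback: assume data exists so the user is not
      // blocked, and flag the timeout so the caller can warn about it.
      return { hasData: true, timedOut: true };
    }
    return { hasData: outcome, timedOut: false };
  } finally {
    clearTimeout(timer);
  }
}
```

A caller could show the error toast and log the warning whenever `timedOut` is `true`, and otherwise trust `hasData` as before.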

![CleanShot 2024-11-18 at 23 47
34@2x](https://github.com/user-attachments/assets/6ea14869-b6b6-4d89-a90c-8150d6e6b043)

Fixes elastic#200280.

### Notes
- Modifying the existing version of the `has_es_data` endpoint in this
way should be backward compatible since the behaviour should remain
unchanged from before when the client and server versions don't match
(please validate if this seems accurate during review).
- For a long-term fix, the ES team is investigating the issue with
`resolve/cluster` and will aim to have it behave like `resolve/index`,
which fails quickly when remote clusters are unresponsive. They may also
implement other mitigations like a configurable timeout in ES:
elastic/elasticsearch#114020. The purpose of
this PR is to provide an immediate solution in Kibana that mitigates the
issue as much as possible.
- If ES ends up providing another performant method for checking if
indices exist instead of `resolve/cluster`, Kibana should migrate to
that. More details in
elastic/elasticsearch#112307.

### Testing notes

To reproduce the issue locally, follow these steps:
- Follow [these
instructions](https://gist.github.com/lukasolson/d0861aa3e6ee476ac8dd7189ed476756)
to set up a local CCS environment.
- Stop the remote cluster process.
- Use Netcat on the remote cluster port to listen to requests but not
respond (e.g. on macOS: `nc -l 9600`), simulating an unresponsive
cluster. See elastic/elasticsearch#32678 for
more context.
- Navigate to Discover and observe that the `has_es_data` request hangs.
When testing in this PR branch, the request will only wait for 5 seconds
before assuming data exists and displaying a toast.

### Checklist

- [x] Any text added follows [EUI's writing
guidelines](https://elastic.github.io/eui/#/guidelines/writing), uses
sentence case text and includes [i18n
support](https://github.com/elastic/kibana/blob/main/packages/kbn-i18n/README.md)
- [ ]
[Documentation](https://www.elastic.co/guide/en/kibana/master/development-documentation.html)
was added for features that require explanation or tutorials
- [x] [Unit or functional
tests](https://www.elastic.co/guide/en/kibana/master/development-tests.html)
were updated or added to match the most common scenarios
- [ ] If a plugin configuration key changed, check if it needs to be
allowlisted in the cloud and added to the [docker
list](https://github.com/elastic/kibana/blob/main/src/dev/build/tasks/os_packages/docker_generator/resources/base/bin/kibana-docker)
- [x] This was checked for breaking HTTP API changes, and any breaking
changes have been approved by the breaking-change committee. The
`release_note:breaking` label should be applied in these situations.
- [ ] [Flaky Test
Runner](https://ci-stats.kibana.dev/trigger_flaky_test_runner/1) was
used on any tests changed
- [x] The PR description includes the appropriate Release Notes section,
and the correct `release_note:*` label is applied per the
[guidelines](https://www.elastic.co/guide/en/kibana/master/contributing.html#kibana-release-notes-process)

---------

Co-authored-by: kibanamachine <[email protected]>
paulinashakirova pushed a commit to paulinashakirova/kibana that referenced this issue Nov 26, 2024
…a to hang (elastic#200476)

CAWilson94 pushed a commit to CAWilson94/kibana that referenced this issue Dec 12, 2024
…a to hang (elastic#200476)
