Change skip_unavailable default value to true #105792

quux00 · 2024-02-23T17:11:24Z

In order to improve the experience of cross-cluster search, we propose changing
the default value of the remote cluster skip_unavailable setting from false to true.

When skip_unavailable=false for a remote cluster, any cross-cluster search that
includes it fails entirely if that cluster is either unavailable (the connection
to it fails) or the search on it fails on all shards.

Setting skip_unavailable=true allows partial results from other clusters to be
returned. In that case, the search response cluster metadata will show a skipped
status, so the user can see that no data came in from that cluster. Kibana also
now leverages this metadata in the cross-cluster search responses to allow users
to see how many clusters returned data and drill down into which clusters did not
(including failure messages).
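To make the two behaviors described above concrete, here is a minimal toy model in Java. It is not Elasticsearch code: the class `SkipUnavailableSketch`, the `RemoteOutcome` type, and the status strings are hypothetical names chosen for illustration, and "failed" stands in for "unavailable, or the search failed on all shards".

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Toy model only: none of these types exist in Elasticsearch.
final class SkipUnavailableSketch {

    // Outcome of one remote cluster in a cross-cluster search.
    // failed == true means the cluster was unreachable or the search failed on all of its shards.
    static final class RemoteOutcome {
        final String alias;
        final boolean skipUnavailable;
        final boolean failed;

        RemoteOutcome(String alias, boolean skipUnavailable, boolean failed) {
            this.alias = alias;
            this.skipUnavailable = skipUnavailable;
            this.failed = failed;
        }
    }

    // Returns a per-cluster status map, or throws if a cluster with skip_unavailable=false failed.
    static Map<String, String> resolve(List<RemoteOutcome> outcomes) {
        Map<String, String> statuses = new LinkedHashMap<>();
        for (RemoteOutcome o : outcomes) {
            if (!o.failed) {
                statuses.put(o.alias, "successful");
            } else if (o.skipUnavailable) {
                // Partial results: the search continues and the cluster is reported as skipped.
                statuses.put(o.alias, "skipped");
            } else {
                // A required cluster failed, so the whole search fails.
                throw new IllegalStateException("required remote cluster failed: " + o.alias);
            }
        }
        return statuses;
    }

    public static void main(String[] args) {
        System.out.println(resolve(List.of(
            new RemoteOutcome("remote1", true, true),
            new RemoteOutcome("remote2", true, false))));
    }
}
```

In this model, flipping `remote1` to `skipUnavailable=false` turns the partial result into an exception, which mirrors the "entire search fails" case.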

Currently, the user or admin has to explicitly set the value to true in the configuration, like so:

cluster:
    remote:
        remote1:
            seeds: 10.10.10.10:9300
            skip_unavailable: true

even though that is probably what search admins want in the vast majority of cases.

Setting skip_unavailable=false should be a conscious (and probably rare) choice
by an Elasticsearch admin that a particular cluster's results are so essential to a
search (or visualization in dashboard or Discover panel) that no results at all should
be shown if it cannot return any results.
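For an admin who does want a cluster to be required, the setting is declared dynamic, so it can be changed through the cluster update settings API rather than a config file. The following is a minimal sketch using only the JDK's `java.net.http` types; the class name, the base URL `http://localhost:9200`, and the alias `remote1` are assumptions for illustration, and authentication is omitted. The request is built but not sent.

```java
import java.net.URI;
import java.net.http.HttpRequest;

// Sketch only: builds (but does not send) a cluster update settings request.
final class RequireRemoteClusterSketch {

    // JSON body that sets cluster.remote.<alias>.skip_unavailable persistently.
    static String settingsBody(String alias, boolean skipUnavailable) {
        return "{\"persistent\":{\"cluster.remote." + alias + ".skip_unavailable\":" + skipUnavailable + "}}";
    }

    // PUT <baseUrl>/_cluster/settings with the body above.
    static HttpRequest buildRequest(String baseUrl, String alias, boolean skipUnavailable) {
        return HttpRequest.newBuilder(URI.create(baseUrl + "/_cluster/settings"))
            .header("Content-Type", "application/json")
            .PUT(HttpRequest.BodyPublishers.ofString(settingsBody(alias, skipUnavailable)))
            .build();
    }

    public static void main(String[] args) {
        // Hypothetical host and alias; marks remote1 as required for cross-cluster searches.
        HttpRequest request = buildRequest("http://localhost:9200", "remote1", false);
        System.out.println(request.method() + " " + request.uri());
        System.out.println(settingsBody("remote1", false));
    }
}
```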

elasticsearchmachine · 2024-02-23T17:11:48Z

Hi @quux00, I've created a changelog YAML for you. Note that since this PR is labelled >breaking, you need to update the changelog YAML to fill out the extended information sections.

quux00 · 2024-03-07T16:36:16Z

In the process of doing this work (mostly fixing tests), I found some potential issues, both of which are pre-existing but now perhaps become more acute, so they may need to be addressed.

(1) For _search (and _async_search), when skip_unavailable=false and a cluster has an issue (such as a Security/permissions exception or IndexNotFound), that error is explicitly returned, along with an error status code such as 404 or 403. As expected, when skip_unavailable=true and the same issue occurs, an HTTP status of 200 is returned, but the error reported in the SearchResponse is NoSuchRemoteClusterException, not the IndexNotFoundException. This feels like a bug in the cluster resolution logic (es-distributed side?).

(2) The reindex API is also affected by the skip_unavailable setting when performing a remote reindex operation. When skip_unavailable=false and a reindex targets an index that does not exist, an IndexNotFoundException is returned with an HTTP error status code.

However, when skip_unavailable=true, it does not return an error status code and, worse, it does not report any error in the response object:

{
    "took": 36,
    "timed_out": false,
    "total": 0,
    "updated": 0,
    "created": 0,
    "deleted": 0,
    "batches": 0,
    "version_conflicts": 0,
    "noops": 0,
    "retries": {
        "bulk": 0,
        "search": 0
    },
    "throttled_millis": 0,
    "requests_per_second": -1,
    "throttled_until_millis": 0,
    "failures": []
}

Should the reindex API report an issue in the failures section of this response in this case? (It likely needs to parse the search response against the remote cluster.) Or should it return an error status code? Should the reindex API have a dependency on the skip_unavailable setting?
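Until that question is settled, a caller could at least flag responses shaped like the one above. The sketch below is a caller-side heuristic, not anything the reindex API provides: the class and method names are hypothetical, and it uses regular expressions instead of a JSON parser only to stay dependency-free. A response with `total` of 0 and empty `failures` is also what a legitimately empty source index would produce, so this can only prompt a closer look, not prove a failure.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Heuristic sketch: flags reindex responses that processed nothing and reported nothing.
final class ReindexResponseCheckSketch {

    private static final Pattern TOTAL = Pattern.compile("\"total\"\\s*:\\s*(\\d+)");
    private static final Pattern EMPTY_FAILURES = Pattern.compile("\"failures\"\\s*:\\s*\\[\\s*\\]");

    // True when the response reports zero documents and an empty failures array.
    static boolean looksLikeSilentNoOp(String responseJson) {
        Matcher total = TOTAL.matcher(responseJson);
        boolean zeroDocs = total.find() && Long.parseLong(total.group(1)) == 0L;
        boolean noFailures = EMPTY_FAILURES.matcher(responseJson).find();
        return zeroDocs && noFailures;
    }

    public static void main(String[] args) {
        String response = "{\"took\": 36, \"timed_out\": false, \"total\": 0, \"failures\": []}";
        if (looksLikeSilentNoOp(response)) {
            System.out.println("Reindex processed no documents and reported no failures; check the source index.");
        }
    }
}
```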

elasticsearchmachine · 2024-03-07T16:36:28Z

Pinging @elastic/es-search (Team:Search)

quux00 · 2024-03-07T16:38:05Z

server/src/main/java/org/elasticsearch/transport/RemoteClusterService.java

@@ -87,7 +87,7 @@ public final class RemoteClusterService extends RemoteClusterAware implements Cl
    public static final Setting.AffixSetting<Boolean> REMOTE_CLUSTER_SKIP_UNAVAILABLE = Setting.affixKeySetting(
        "cluster.remote.",
        "skip_unavailable",
-        (ns, key) -> boolSetting(key, false, new RemoteConnectionEnabled<>(ns, key), Setting.Property.Dynamic, Setting.Property.NodeScope)
+        (ns, key) -> boolSetting(key, true, new RemoteConnectionEnabled<>(ns, key), Setting.Property.Dynamic, Setting.Property.NodeScope)


This is the only production code change. Everything else is tests and docs.
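The effect of that one-word change can be pictured with a small stand-alone model of a per-cluster setting with a default. This is not the Elasticsearch `Setting` machinery: the class, constant, and method names below are hypothetical, and a plain `Map` stands in for node settings.

```java
import java.util.Map;

// Toy model of a per-cluster boolean setting with a default value.
final class SkipUnavailableDefaultSketch {

    // New default introduced by this PR (it was false before).
    static final boolean DEFAULT_SKIP_UNAVAILABLE = true;

    // Looks up cluster.remote.<alias>.skip_unavailable and falls back to the default when unset.
    static boolean skipUnavailable(Map<String, String> settings, String alias) {
        String value = settings.get("cluster.remote." + alias + ".skip_unavailable");
        // Boolean.parseBoolean is lenient; this sketch does no validation of the value.
        return value == null ? DEFAULT_SKIP_UNAVAILABLE : Boolean.parseBoolean(value);
    }

    public static void main(String[] args) {
        // No explicit value: the cluster is now optional by default.
        System.out.println(skipUnavailable(Map.of(), "remote1"));
        // Explicit opt-out: the cluster is required.
        System.out.println(skipUnavailable(Map.of("cluster.remote.remote1.skip_unavailable", "false"), "remote1"));
    }
}
```

Only clusters with no explicit value see a behavior change; any cluster already configured with `true` or `false` keeps its configured value.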

quux00 · 2024-04-24T20:24:37Z

@elasticmachine run elasticsearch-ci/part-1

javanna

I left a question about yaml tests and the upgrade path, LGTM otherwise

qa/multi-cluster-search/src/test/resources/rest-api-spec/test/multi_cluster/20_info.yml

javanna · 2024-04-25T08:13:24Z

docs/reference/modules/cluster/remote-clusters-settings.asciidoc

+changed from `false` to `true`. Before Elasticsearch 8.15, if you want a cluster
+to be treated as optional for a {ccs}, then you need to set that configuration.
+From Elasticsearch 8.15 forward, you need to set the configuration in order to
+make a cluster required for the {ccs}.


Maybe obvious, but should we be explicit about the upgrade path: once you upgrade the coordinating node / cluster, where the remotes are registered, you get the new behavior? Say that the coord cluster has multiple nodes that are upgraded one by one (rolling upgrade) what would be the behavior there?

Good point, but that is very specific to a one-time upgrade, so should that go in the "permanent" API docs? Or should it go into some one-time doc like the changelog or deprecation announcement?

Normally we have migration guides for major upgrades and related changes. This is one of those changes, but it goes out in a minor; I don't know if we have a place for this scenario in the docs.

I added a blurb here about that, and we'll also include it in the 8.15 release notes. Thanks for flagging this.

quux00 · 2024-04-25T12:57:38Z

@elasticmachine test this please

quux00 · 2024-04-29T18:09:27Z

@elasticmachine run elasticsearch-ci/part-2

quux00 added the >breaking, :Search/Search, Team:Search, and v8.14.0 labels Feb 23, 2024

quux00 force-pushed the ccs/skip_unavailable-defaults-to-true branch 2 times, most recently from e634836 to 67c4f7c Compare March 6, 2024 16:12

quux00 marked this pull request as ready for review March 7, 2024 16:36

quux00 requested a review from a team as a code owner March 7, 2024 16:36

quux00 added 12 commits March 11, 2024 09:04

Change skip_unavailable default value to true

0a3516e

Update docs/changelog/105792.yaml

05ca1e6

Filled in details and impact section of the yaml changelog

1c353c3

Intmd commit - revert me

9fbbaf4

Corrected breaking change changelog. Modified security/qa tests to pa…

8ebd627

…ss. Multi_cluster rest-api-spec tests still failing

Fixed several more tests. Not all passing yet.

2d7d868

Set my_remote_cluster skip_unavailable to false to let tests pass in …

3fb0896

…qa/multi-cluster-search gradle

Set skip_unavailable=false in qa/multi-cluster-search-security/legacy…

7c5c718

…-with-basic-license/build.gradle to allow tests to pass

Adjusted more tests to pass

623b5be

Updated end user API docs around skip_unavailable changes

57c645e

reverted two small unintentional changes

154c606

Changed API docs to specify that the change occurs in 8.15, not 8.14

520ec7b

quux00 force-pushed the ccs/skip_unavailable-defaults-to-true branch from 95edc6c to 520ec7b Compare March 11, 2024 13:08

quux00 mentioned this pull request Mar 13, 2024

Reindex API reports no errors when it fails against a cluster with skip_unavailable=true #106331

Open

elasticsearchmachine added v8.15.0 and removed v8.14.0 labels Apr 17, 2024

quux00 added 2 commits April 22, 2024 11:29

Merge remote-tracking branch 'elastic/main' into ccs/skip_unavailable…

7d1692b

…-defaults-to-true

Adjusted RemoteClusterSecurityEsqlIT to handle the new skip_unavailab…

9de832f

…le default

quux00 added 4 commits April 24, 2024 09:08

Merge remote-tracking branch 'elastic/main' into ccs/skip_unavailable…

49b1200

…-defaults-to-true

Updated RemoteClusterSecurityEsqlIT to test both values of skip_unava…

4235b14

…ilable

Added skip_unavailable=true variants to various tests

16f54ee

Fixed failing test in RemoteClusterClientTests

5e27ec5

mark-vieira approved these changes Apr 24, 2024

javanna approved these changes Apr 25, 2024

quux00 added 2 commits April 29, 2024 10:43

Merge remote-tracking branch 'elastic/main' into ccs/skip_unavailable…

bc3eea5

…-defaults-to-true

Adding migration info about skip_unavailable to remote-clusters-setti…

89dbd31

…ngs.asciidoc

quux00 merged commit a451511 into elastic:main Apr 29, 2024
14 checks passed

alisonelizabeth mentioned this pull request May 8, 2024

[Remote clusters] Enable "skip if unavailable" by default elastic/kibana#181205

Closed

mhl-b mentioned this pull request Jul 4, 2024

skip_unavailable changes from true to false when remote connection fails #107125

Closed

Philippus mentioned this pull request Aug 17, 2024

Update elasticsearch docker image to 8.15.0 Philippus/elastic4s#3132

Merged
