Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Change skip_unavailable default value to true #105792

Merged
merged 20 commits into from
Apr 29, 2024
Merged
Show file tree
Hide file tree
Changes from 18 commits
Commits
Show all changes
20 commits
Select commit Hold shift + click to select a range
0a3516e
Change skip_unavailable default value to true
quux00 Feb 23, 2024
05ca1e6
Update docs/changelog/105792.yaml
quux00 Feb 23, 2024
1c353c3
Filled in details and impact section of the yaml changelog
quux00 Feb 23, 2024
9fbbaf4
Intmd commit - revert me
quux00 Mar 4, 2024
8ebd627
Corrected breaking change changelog. Modified security/qa tests to pa…
quux00 Mar 5, 2024
2d7d868
Fixed several more tests. Not all passing yet.
quux00 Mar 5, 2024
3fb0896
Set my_remote_cluster skip_unavailable to false to let tests pass in …
quux00 Mar 6, 2024
7c5c718
Set skip_unavailable=false in qa/multi-cluster-search-security/legacy…
quux00 Mar 6, 2024
623b5be
Adjusted more tests to pass
quux00 Mar 6, 2024
57c645e
Updated end user API docs around skip_unavailable changes
quux00 Mar 7, 2024
154c606
reverted two small unintentional changes
quux00 Mar 7, 2024
520ec7b
Changed API docs to specify that the change occurs in 8.15, not 8.14
quux00 Mar 11, 2024
7d1692b
Merge remote-tracking branch 'elastic/main' into ccs/skip_unavailable…
quux00 Apr 22, 2024
9de832f
Adjusted RemoteClusterSecurityEsqlIT to handle the new skip_unavailab…
quux00 Apr 22, 2024
49b1200
Merge remote-tracking branch 'elastic/main' into ccs/skip_unavailable…
quux00 Apr 24, 2024
4235b14
Updated RemoteClusterSecurityEsqlIT to test both values of skip_unava…
quux00 Apr 24, 2024
16f54ee
Added skip_unavailable=true variants to various tests
quux00 Apr 24, 2024
5e27ec5
Fixed failing test in RemoteClusterClientTests
quux00 Apr 24, 2024
bc3eea5
Merge remote-tracking branch 'elastic/main' into ccs/skip_unavailable…
quux00 Apr 29, 2024
89dbd31
Adding migration info about skip_unavailable to remote-clusters-setti…
quux00 Apr 29, 2024
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
18 changes: 18 additions & 0 deletions docs/changelog/105792.yaml
Original file line number Diff line number Diff line change
@@ -0,0 +1,18 @@
pr: 105792
summary: "Change `skip_unavailable` remote cluster setting default value to true"
area: Search
type: breaking
issues: []
breaking:
title: "Change `skip_unavailable` remote cluster setting default value to true"
area: Cluster and node setting
details: The default value of the `skip_unavailable` setting is now set to true.
All existing and future remote clusters that do not define this setting will use the new default.
This setting only affects cross-cluster searches using the _search or _async_search API.
impact: Unavailable remote clusters in a cross-cluster search will no longer cause the search to fail unless
skip_unavailable is configured to be `false` in elasticsearch.yml or via the `_cluster/settings` API.
Unavailable clusters with `skip_unavailable`=`true` (either explicitly or by using the new default) are marked
as SKIPPED in the search response metadata section and do not fail the entire search. If users want to ensure that a
search returns a failure when a particular remote cluster is not available, `skip_unavailable` must be now be
set explicitly.
notable: false
2 changes: 1 addition & 1 deletion docs/reference/ccr/getting-started.asciidoc
Original file line number Diff line number Diff line change
Expand Up @@ -147,7 +147,7 @@ cluster with cluster alias `leader`.
"num_nodes_connected" : 1, <1>
"max_connections_per_cluster" : 3,
"initial_connect_timeout" : "30s",
"skip_unavailable" : false,
"skip_unavailable" : true,
"mode" : "sniff"
}
}
Expand Down
18 changes: 9 additions & 9 deletions docs/reference/modules/cluster/remote-clusters-connect.asciidoc
Original file line number Diff line number Diff line change
Expand Up @@ -37,7 +37,7 @@ clusters on individual nodes in the local cluster, define static settings in
`elasticsearch.yml` for each node.

The following request adds a remote cluster with an alias of `cluster_one`. This
_cluster alias_ is a unique identifier that represents the connection to the
_cluster alias_ is a unique identifier that represents the connection to the
remote cluster and is used to distinguish between local and remote indices.

[source,console,subs=attributes+]
Expand All @@ -60,7 +60,7 @@ PUT /_cluster/settings
// TEST[setup:host]
// TEST[s/127.0.0.1:\{remote-interface-default-port\}/\${transport_host}/]
<1> The cluster alias of this remote cluster is `cluster_one`.
<2> Specifies the hostname and {remote-interface} port of a seed node in the
<2> Specifies the hostname and {remote-interface} port of a seed node in the
remote cluster.

You can use the <<cluster-remote-info,remote cluster info API>> to verify that
Expand All @@ -86,7 +86,7 @@ cluster with the cluster alias `cluster_one`:
"num_nodes_connected" : 1, <1>
"max_connections_per_cluster" : 3,
"initial_connect_timeout" : "30s",
"skip_unavailable" : false, <2>
"skip_unavailable" : true, <2>
ifeval::["{trust-mechanism}"=="api-key"]
"cluster_credentials": "::es_redacted::", <3>
endif::[]
Expand All @@ -103,7 +103,7 @@ connected to.
<2> Indicates whether to skip the remote cluster if searched through {ccs} but
no nodes are available.
ifeval::["{trust-mechanism}"=="api-key"]
<3> If present, indicates the remote cluster has connected using API key
<3> If present, indicates the remote cluster has connected using API key
authentication.
endif::[]

Expand Down Expand Up @@ -187,7 +187,7 @@ PUT _cluster/settings

You can delete a remote cluster from the cluster settings by passing `null`
values for each remote cluster setting. The following request removes
`cluster_two` from the cluster settings, leaving `cluster_one` and
`cluster_two` from the cluster settings, leaving `cluster_one` and
`cluster_three` intact:

[source,console]
Expand All @@ -212,15 +212,15 @@ PUT _cluster/settings

===== Statically configure remote clusters
If you specify settings in `elasticsearch.yml`, only the nodes with
those settings can connect to the remote cluster and serve remote cluster
those settings can connect to the remote cluster and serve remote cluster
requests.

NOTE: Remote cluster settings that are specified using the
NOTE: Remote cluster settings that are specified using the
<<cluster-update-settings,cluster update settings API>> take precedence over
settings that you specify in `elasticsearch.yml` for individual nodes.

In the following example, `cluster_one`, `cluster_two`, and `cluster_three` are
arbitrary cluster aliases representing the connection to each cluster. These
In the following example, `cluster_one`, `cluster_two`, and `cluster_three` are
arbitrary cluster aliases representing the connection to each cluster. These
names are subsequently used to distinguish between local and remote indices.

[source,yaml,subs=attributes+]
Expand Down
11 changes: 8 additions & 3 deletions docs/reference/modules/cluster/remote-clusters-settings.asciidoc
Original file line number Diff line number Diff line change
Expand Up @@ -28,9 +28,14 @@ mode are described separately.

Per cluster boolean setting that allows to skip specific clusters when no
nodes belonging to them are available and they are the target of a remote
cluster request. Default is `false`, meaning that all clusters are mandatory
by default, but they can selectively be made optional by setting this setting
to `true`.
cluster request.

IMPORTANT: In Elasticsearch 8.15, the default value for `skip_unavailable` was
changed from `false` to `true`. Before Elasticsearch 8.15, if you want a cluster
to be treated as optional for a {ccs}, then you need to set that configuration.
From Elasticsearch 8.15 forward, you need to set the configuration in order to
make a cluster required for the {ccs}.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe obvious, but should we be explicit about the upgrade path: once you upgrade the coordinating node / cluster, where the remotes are registered, you get the new behavior? Say that the coord cluster has multiple nodes that are upgraded one by one (rolling upgrade) what would be the behavior there?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point, but that is very specific to a one-time upgrade, so should that go in the "permanent" API docs? Or should it go into some one-time doc like the changelog or deprecation announcement?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Normally we have migration guides for major upgrades and related changes. This is one of those changes but it goes out in a minor, I don't know if we a place for this scenario in the docs.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added a blurb here about that and we'll also include in the 8.15 release notes. Thanks for flagging this.



`cluster.remote.<cluster_alias>.transport.ping_schedule`::

Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -1178,7 +1178,13 @@ gathered from all 3 clusters and the total shard count on each cluster is listed
By default, a {ccs} fails if a remote cluster in the request is unavailable
or returns an error where the search on all shards failed. Use the
`skip_unavailable` cluster setting to mark a specific remote cluster as
optional for {ccs}.
either optional or required for {ccs}.

IMPORTANT: In Elasticsearch 8.15, the default value for `skip_unavailable` was
changed from `false` to `true`. Before Elasticsearch 8.15, if you want a cluster
to be treated as optional for a {ccs}, then you need to set that configuration.
From Elasticsearch 8.15 forward, you need to set the configuration in order to
make a cluster required for the {ccs}.

If `skip_unavailable` is `true`, a {ccs}:

Expand All @@ -1196,25 +1202,33 @@ parameter and the related `search.default_allow_partial_results` cluster setting
when searching the remote cluster. This means searches on the remote cluster may
return partial results.

The following <<cluster-update-settings,cluster update settings>>
API request changes `skip_unavailable` setting to `true` for `cluster_two`.

[source,console]
--------------------------------
PUT _cluster/settings
{
"persistent": {
"cluster.remote.cluster_two.skip_unavailable": true
}
}
--------------------------------
// TEST[continued]

If `cluster_two` is disconnected or unavailable during a {ccs}, {es} won't
include matching documents from that cluster in the final results. If at
least one shard provides results, those results will be used and the
search will return partial data. (If doing {ccs} using async search,
the `is_partial` field will be set to `true` to indicate partial results.)
You can modify the `skip_unavailable` setting by editing the `cluster.remote.<cluster_alias>`
settings in the elasticsearch.yml config file. For example:

```
cluster:
remote:
cluster_one:
seeds: 35.238.149.1:9300
skip_unavailable: false
cluster_two:
seeds: 35.238.149.2:9300
skip_unavailable: true
```

Or you can set the cluster.remote settings via the
<<cluster-update-settings,cluster update settings>> API as shown
<<ccs-remote-cluster-setup, here>>.

When a remote cluster configured with `skip_unavailable: true` (such as
`cluster_two` above) is disconnected or unavailable during a {ccs}, {es} won't
include matching documents from that cluster in the final results and the
search will be considered successful (HTTP status 200 OK).

If at least one shard from a cluster provides search results, those results will
be used and the search will return partial data. This is true regardless of
the `skip_unavailable` setting of the remote cluster. (If doing {ccs} using async
search, the `is_partial` field will be set to `true` to indicate partial results.)

[discrete]
[[ccs-network-delays]]
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -19,6 +19,7 @@

import java.util.Collection;
import java.util.List;
import java.util.Map;

import static org.elasticsearch.test.hamcrest.ElasticsearchAssertions.assertAcked;
import static org.hamcrest.Matchers.containsString;
Expand All @@ -38,6 +39,11 @@ protected Collection<String> remoteClusterAlias() {
return List.of(REMOTE_CLUSTER);
}

@Override
protected Map<String, Boolean> skipUnavailableForRemoteClusters() {
return Map.of(REMOTE_CLUSTER, false);
}

@Override
protected Collection<Class<? extends Plugin>> nodePlugins(String clusterAlias) {
return List.of(ReindexPlugin.class);
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -101,6 +101,7 @@ public class CcsCommonYamlTestSuiteIT extends ESClientYamlSuiteTestCase {
.setting("node.roles", "[data,ingest,master,remote_cluster_client]")
.setting("cluster.remote.remote_cluster.seeds", () -> "\"" + remoteCluster.getTransportEndpoint(0) + "\"")
.setting("cluster.remote.connections_per_cluster", "1")
.setting("cluster.remote.remote_cluster.skip_unavailable", "false")
.apply(commonClusterConfig)
.build();

Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -246,6 +246,7 @@ public void initSearchClient() throws IOException {

private static void configureRemoteCluster() throws IOException {
final Settings.Builder builder = Settings.builder();
builder.put("cluster.remote." + REMOTE_CLUSTER_NAME + ".skip_unavailable", "false");
if (randomBoolean()) {
builder.put("cluster.remote." + REMOTE_CLUSTER_NAME + ".mode", "proxy")
.put("cluster.remote." + REMOTE_CLUSTER_NAME + ".proxy_address", fulfillingCluster.getRemoteClusterServerEndpoint(0));
Expand Down
1 change: 1 addition & 0 deletions qa/multi-cluster-search/build.gradle
Original file line number Diff line number Diff line change
Expand Up @@ -48,6 +48,7 @@ BuildParams.bwcVersions.withWireCompatible(ccsSupportedVersion) { bwcVersion, ba
setting 'cluster.remote.connections_per_cluster', '1'
setting 'cluster.remote.my_remote_cluster.seeds',
{ "\"${remoteCluster.get().getAllTransportPortURI().get(0)}\"" }
setting 'cluster.remote.my_remote_cluster.skip_unavailable', 'false'
}

tasks.register("${baseName}#remote-cluster", RestIntegTestTask) {
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -249,7 +249,7 @@
persistent:
cluster.remote.test_remote_cluster.seeds: $remote_ip

- match: {persistent: {cluster.remote.test_remote_cluster.seeds: $remote_ip}}
- match: {persistent.cluster\.remote\.test_remote_cluster\.seeds: $remote_ip}

- do:
search:
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -113,33 +113,33 @@

- do:
cluster.remote_info: {}
- is_false: remote1.skip_unavailable
- is_true: remote1.skip_unavailable

- do:
cluster.put_settings:
body:
persistent:
cluster.remote.remote1.skip_unavailable: true
cluster.remote.remote1.skip_unavailable: false

- is_true: persistent.cluster.remote.remote1.skip_unavailable
- is_false: persistent.cluster.remote.remote1.skip_unavailable

- do:
cluster.remote_info: {}

- is_true: remote1.skip_unavailable
- is_false: remote1.skip_unavailable

- do:
cluster.put_settings:
body:
persistent:
cluster.remote.remote1.skip_unavailable: false
cluster.remote.remote1.skip_unavailable: true

- is_false: persistent.cluster.remote.remote1.skip_unavailable
- is_true: persistent.cluster.remote.remote1.skip_unavailable

- do:
cluster.remote_info: {}

- is_false: remote1.skip_unavailable
- is_true: remote1.skip_unavailable

- do:
cluster.put_settings:
Expand All @@ -152,7 +152,7 @@
- do:
cluster.remote_info: {}

- is_false: remote1.skip_unavailable
- is_true: remote1.skip_unavailable
quux00 marked this conversation as resolved.
Show resolved Hide resolved

- do:
cluster.put_settings:
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -87,7 +87,7 @@ public final class RemoteClusterService extends RemoteClusterAware implements Cl
public static final Setting.AffixSetting<Boolean> REMOTE_CLUSTER_SKIP_UNAVAILABLE = Setting.affixKeySetting(
"cluster.remote.",
"skip_unavailable",
(ns, key) -> boolSetting(key, false, new RemoteConnectionEnabled<>(ns, key), Setting.Property.Dynamic, Setting.Property.NodeScope)
(ns, key) -> boolSetting(key, true, new RemoteConnectionEnabled<>(ns, key), Setting.Property.Dynamic, Setting.Property.NodeScope)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the only production code change. Everything else is tests and docs.

);

public static final Setting.AffixSetting<TimeValue> REMOTE_CLUSTER_PING_SCHEDULE = Setting.affixKeySetting(
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -469,7 +469,8 @@ private MockTransportService[] startTransport(
int numClusters,
DiscoveryNode[] nodes,
Map<String, OriginalIndices> remoteIndices,
Settings.Builder settingsBuilder
Settings.Builder settingsBuilder,
boolean skipUnavailable
) {
MockTransportService[] mockTransportServices = new MockTransportService[numClusters];
for (int i = 0; i < numClusters; i++) {
Expand All @@ -486,6 +487,7 @@ private MockTransportService[] startTransport(
knownNodes.add(remoteSeedNode);
nodes[i] = remoteSeedNode;
settingsBuilder.put("cluster.remote.remote" + i + ".seeds", remoteSeedNode.getAddress().toString());
settingsBuilder.put("cluster.remote.remote" + i + ".skip_unavailable", Boolean.toString(skipUnavailable));
remoteIndices.put("remote" + i, new OriginalIndices(new String[] { "index" }, IndicesOptions.lenientExpandOpen()));
}
return mockTransportServices;
Expand All @@ -496,7 +498,8 @@ public void testCCSRemoteReduceMergeFails() throws Exception {
DiscoveryNode[] nodes = new DiscoveryNode[numClusters];
Map<String, OriginalIndices> remoteIndicesByCluster = new HashMap<>();
Settings.Builder builder = Settings.builder();
MockTransportService[] mockTransportServices = startTransport(numClusters, nodes, remoteIndicesByCluster, builder);
boolean skipUnavailable = randomBoolean();
MockTransportService[] mockTransportServices = startTransport(numClusters, nodes, remoteIndicesByCluster, builder, skipUnavailable);
Settings settings = builder.build();
boolean local = randomBoolean();
OriginalIndices localIndices = local ? new OriginalIndices(new String[] { "index" }, SearchRequest.DEFAULT_INDICES_OPTIONS) : null;
Expand Down Expand Up @@ -566,7 +569,8 @@ public void testCCSRemoteReduce() throws Exception {
DiscoveryNode[] nodes = new DiscoveryNode[numClusters];
Map<String, OriginalIndices> remoteIndicesByCluster = new HashMap<>();
Settings.Builder builder = Settings.builder();
MockTransportService[] mockTransportServices = startTransport(numClusters, nodes, remoteIndicesByCluster, builder);
boolean skipUnavailable = randomBoolean();
MockTransportService[] mockTransportServices = startTransport(numClusters, nodes, remoteIndicesByCluster, builder, skipUnavailable);
Settings settings = builder.build();
boolean local = randomBoolean();
OriginalIndices localIndices = local ? new OriginalIndices(new String[] { "index" }, SearchRequest.DEFAULT_INDICES_OPTIONS) : null;
Expand Down Expand Up @@ -709,7 +713,8 @@ public void testCCSRemoteReduceWhereRemoteClustersFail() throws Exception {
DiscoveryNode[] nodes = new DiscoveryNode[numClusters];
Map<String, OriginalIndices> remoteIndicesByCluster = new HashMap<>();
Settings.Builder builder = Settings.builder();
MockTransportService[] mockTransportServices = startTransport(numClusters, nodes, remoteIndicesByCluster, builder);
boolean skipUnavailable = randomBoolean();
MockTransportService[] mockTransportServices = startTransport(numClusters, nodes, remoteIndicesByCluster, builder, skipUnavailable);
Settings settings = builder.build();
boolean local = randomBoolean();
OriginalIndices localIndices = local ? new OriginalIndices(new String[] { "index" }, SearchRequest.DEFAULT_INDICES_OPTIONS) : null;
Expand All @@ -734,10 +739,13 @@ public void testCCSRemoteReduceWhereRemoteClustersFail() throws Exception {
final CountDownLatch latch = new CountDownLatch(1);
SetOnce<Tuple<SearchRequest, ActionListener<SearchResponse>>> setOnce = new SetOnce<>();
AtomicReference<Exception> failure = new AtomicReference<>();
LatchedActionListener<SearchResponse> listener = new LatchedActionListener<>(
ActionListener.wrap(r -> fail("no response expected"), failure::set),
latch
);
LatchedActionListener<SearchResponse> listener = new LatchedActionListener<>(ActionListener.wrap(r -> {
if (skipUnavailable) {
assertThat(r.getClusters().getClusterStateCount(SearchResponse.Cluster.Status.SKIPPED), equalTo(numClusters));
} else {
fail("no response expected"); // failure should be returned, not SearchResponse
}
}, failure::set), latch);

TaskId parentTaskId = new TaskId("n", 1);
SearchTask task = new SearchTask(2, "search", "search", () -> "desc", parentTaskId, Collections.emptyMap());
Expand All @@ -763,10 +771,14 @@ public void testCCSRemoteReduceWhereRemoteClustersFail() throws Exception {
resolveWithEmptySearchResponse(tuple);
}
awaitLatch(latch, 5, TimeUnit.SECONDS);
assertNotNull(failure.get());
assertThat(failure.get(), instanceOf(RemoteTransportException.class));
RemoteTransportException remoteTransportException = (RemoteTransportException) failure.get();
assertEquals(RestStatus.NOT_FOUND, remoteTransportException.status());
if (skipUnavailable) {
assertNull(failure.get());
} else {
assertNotNull(failure.get());
assertThat(failure.get(), instanceOf(RemoteTransportException.class));
RemoteTransportException remoteTransportException = (RemoteTransportException) failure.get();
assertEquals(RestStatus.NOT_FOUND, remoteTransportException.status());
}
}

} finally {
Expand All @@ -781,7 +793,7 @@ public void testCCSRemoteReduceWithDisconnectedRemoteClusters() throws Exception
DiscoveryNode[] nodes = new DiscoveryNode[numClusters];
Map<String, OriginalIndices> remoteIndicesByCluster = new HashMap<>();
Settings.Builder builder = Settings.builder();
MockTransportService[] mockTransportServices = startTransport(numClusters, nodes, remoteIndicesByCluster, builder);
MockTransportService[] mockTransportServices = startTransport(numClusters, nodes, remoteIndicesByCluster, builder, false);
Settings settings = builder.build();
boolean local = randomBoolean();
OriginalIndices localIndices = local ? new OriginalIndices(new String[] { "index" }, SearchRequest.DEFAULT_INDICES_OPTIONS) : null;
Expand Down Expand Up @@ -1035,7 +1047,7 @@ public void testCollectSearchShards() throws Exception {
DiscoveryNode[] nodes = new DiscoveryNode[numClusters];
Map<String, OriginalIndices> remoteIndicesByCluster = new HashMap<>();
Settings.Builder builder = Settings.builder();
MockTransportService[] mockTransportServices = startTransport(numClusters, nodes, remoteIndicesByCluster, builder);
MockTransportService[] mockTransportServices = startTransport(numClusters, nodes, remoteIndicesByCluster, builder, false);
Settings settings = builder.build();
try (
MockTransportService service = MockTransportService.createNewService(
Expand Down
Loading