Adapt the Cluster Health and Cat Indices APIs to closed indices #39364

tlrx · 2019-02-25T15:36:51Z

Note: this PR will be merged in the replicated-closed-indices feature branch

This pull request adapts the Cluster Health API to support replicated closed indices. In order to do that, this pull request removes the hard coded indices options from the ClusterHealthRequest and replaces it with a new IndicesOptions.lenientExpand() option. This option will be used by the master node (once it is upgraded to 8.0) to compute the global cluster health using both opened and closed indices information by default. The expand_wildcards REST parameter is also documented and tests where added to ensure that a specific expansion type can be used to monitoring the health of a only opened or only closed indices.

Since the Cat Indices relies on the Cluster Health API, it has been adapted to report information about closed indices too. Note that the health and number of shards/replicas is only printed out for closed indices that have an index routing table. Closed indices without routing table have the same output as before.

Related to #33888

elasticmachine · 2019-02-25T15:36:53Z

Pinging @elastic/es-distributed

tlrx · 2019-02-25T15:38:36Z

rest-api-spec/src/main/resources/rest-api-spec/api/cluster.health.json

@@ -12,6 +12,12 @@
        }
      },
      "params": {
+        "expand_wildcards": {


The REST API supports other indices options like ignore_unavailable but those options are overridden in the TransportClusterHealthAction to report a RED status is an index is missing (or just ignore it).

This is why this sole option is documented.

tlrx · 2019-02-25T15:39:45Z

server/src/main/java/org/elasticsearch/action/admin/cluster/health/ClusterHealthRequest.java

@@ -83,6 +84,11 @@ public ClusterHealthRequest(StreamInput in) throws IOException {
        if (in.getVersion().onOrAfter(Version.V_6_2_0)) {
            waitForNoInitializingShards = in.readBoolean();
        }
+        if (in.getVersion().onOrAfter(Version.V_8_0_0)) {


This will be changed to 7.0 when backporting this change

tlrx · 2019-02-25T19:44:33Z

Unrelated test failure on elasticsearch-ci/1 because of #39362, let's run the job again:
@elasticmachine run elasticsearch-ci/1

ywelsch · 2019-02-26T12:18:31Z

server/src/main/java/org/elasticsearch/rest/action/cat/RestIndicesAction.java

                // empty stats for closed indices, but their names are displayed
-                assert indexStats == null;


do closed indices now give an indexStats response?

Only in the context of the Cat Indices API which works by executing a Cluster State request, then a Cluster Health request and finally an Indices Stats request. All requests now share the same indices options (strictExpand) which expand to opened and closed indices and does not forbid closed indices.

The Indices Stats API uses the strictExpandOpenAndForbidClosed indices option by default and it will forbids closed indices to be included in stats, unless the Java API is used to pass a different indices options that allow them - like the Cat Indices API does.

Note that stats for closed indices are almost always 0, except for docs stats that are initialized in the read only engine ctor.

ywelsch · 2019-02-26T12:21:18Z

server/src/main/java/org/elasticsearch/rest/action/cat/RestIndicesAction.java

+            } else if (indexStats != null) {
+                health = "red*";
+            }
+            table.addCell(health);


what if health is null here? How will that render?

It renders as an empty cell - there are many other cells like this few lines after this change

tlrx · 2019-02-26T13:52:26Z

Thanks @ywelsch

henningandersen

Looking good already. I left a few comments to either clarify or enhance.

You probably have it on the plan already, but adding a test to verify cluster/index health during a mixed cluster scenario would be good I think, both with existing closed indices and indices being closed while in a mixed cluster state and finally verifying that after the cluster is fully upgraded, closed indices start being replicated (I think?).

henningandersen · 2019-02-26T19:01:02Z

server/src/test/java/org/elasticsearch/cluster/ClusterHealthIT.java

@@ -76,6 +82,159 @@ public void testHealth() {
        assertThat(healthResponse.getIndices().size(), equalTo(1));
    }

+    public void testHealthWithClosedIndices() {


There are no tests that verifies that we get a RED status if primary and replicas of a closed (or open) index are all gone. Would it be possible to add this?

henningandersen · 2019-02-26T19:07:01Z

server/src/main/java/org/elasticsearch/rest/action/cat/RestIndicesAction.java

@@ -383,34 +383,37 @@ protected Table getTableWithHeader(final RestRequest request) {
    }

    // package private for testing
-    Table buildTable(RestRequest request, IndexMetaData[] indicesMetaData, ClusterHealthResponse response, IndicesStatsResponse stats) {
+    Table buildTable(final RestRequest request,


I think the changes here should be accompanied by a change to RestIndicesActionTests to also test buildTable with closed indices?

henningandersen · 2019-02-26T20:37:57Z

server/src/main/java/org/elasticsearch/rest/action/cat/RestIndicesAction.java

+                    skip = indexHealth.getStatus() != healthStatusFilter;
+                } else {
+                    // index health is unknown, skip if we don't explicitly request RED health or if the index is closed but not replicated
+                    skip = ClusterHealthStatus.RED != healthStatusFilter || indexState == IndexMetaData.State.CLOSE;


Should we not treat open and closed indices identically for a totally upgraded cluster? I think there are two cases:

The index was deleted between the call to cluster state and the indices health call.

The user does not have access to health information (?)

In both cases, I think that on an upgraded system closed and open indices should be treated identically here? I think we may want this in a mixed cluster only?

henningandersen · 2019-02-27T07:49:48Z

server/src/main/java/org/elasticsearch/rest/action/cat/RestIndicesAction.java

                    continue;
                }
            }

            // the open index is present in the cluster state but is not returned in the indices stats API
-            if (indexStats == null && state != IndexMetaData.State.CLOSE) {
+            if (indexStats == null && indexState != IndexMetaData.State.CLOSE) {


Will this not let unauthorized users able to see close indices that they are unauthorized to see?

I think the stats API will now return closed indices on a fully upgraded system after the changes in #38024?

henningandersen · 2019-02-27T08:14:12Z

rest-api-spec/src/main/resources/rest-api-spec/api/cluster.health.json

@@ -12,6 +12,12 @@
        }
      },
      "params": {
+        "expand_wildcards": {


I think expand_wildcards should be added to _cat/indices too for symmetry and to be able to drill down into cluster health status using same expand_wildcards and index parameters?

henningandersen · 2019-02-27T10:10:11Z

rest-api-spec/src/main/resources/rest-api-spec/test/cluster.health/10_basic.yml

+  - do:
+      cluster.health:
+        wait_for_status: yellow
+        wait_for_no_relocating_shards: true


Also wait_for_no_initializing-shards? Seems we do so in ClusterHealthIT, so thought this should be the same?

Before this change, closed indexes were simply not replicated. It was therefore possible to close an index and then decommission a data node without knowing that this data node contained shards of the closed index, potentially leading to data loss. Shards of closed indices were not completely taken into account when balancing the shards within the cluster, or automatically replicated through shard copies, and they were not easily movable from node A to node B using APIs like Cluster Reroute without being fully reopened and closed again. This commit changes the logic executed when closing an index, so that its shards are not just removed and forgotten but are instead reinitialized and reallocated on data nodes using an engine implementation which does not allow searching or indexing, which has a low memory overhead (compared with searchable/indexable opened shards) and which allows shards to be recovered from peer or promoted as primaries when needed. This new closing logic is built on top of the new Close Index API introduced in 6.7.0 (#37359). Some pre-closing sanity checks are executed on the shards before closing them, and closing an index on a 8.0 cluster will reinitialize the index shards and therefore impact the cluster health. Some APIs have been adapted to make them work with closed indices: - Cluster Health API - Cluster Reroute API - Cluster Allocation Explain API - Recovery API - Cat Indices - Cat Shards - Cat Health - Cat Recovery This commit contains all the following changes (most recent first): * c6c42a1 Adapt NoOpEngineTests after #39006 * 3f9993d Wait for shards to be active after closing indices (#38854) * 5e7a428 Adapt the Cluster Health API to closed indices (#39364) * 3e61939 Adapt CloseFollowerIndexIT for replicated closed indices (#38767) * 71f5c34 Recover closed indices after a full cluster restart (#39249) * 4db7fd9 Adapt the Recovery API for closed indices (#38421) * 4fd1bb2 Adapt more tests suites to closed indices (#39186) * 0519016 Add replica to primary promotion test for closed indices (#39110) * b756f6c Test the Cluster Shard Allocation Explain API with closed indices (#38631) * c484c66 Remove index routing table of closed indices in mixed versions clusters (#38955) * 00f1828 Mute CloseFollowerIndexIT.testCloseAndReopenFollowerIndex() * e845b0a Do not schedule Refresh/Translog/GlobalCheckpoint tasks for closed indices (#38329) * cf9a015 Adapt testIndexCanChangeCustomDataPath for replicated closed indices (#38327) * b9becdd Adapt testPendingTasks() for replicated closed indices (#38326) * 02cc730 Allow shards of closed indices to be replicated as regular shards (#38024) * e53a9be Fix compilation error in IndexShardIT after merge with master * cae4155 Relax NoOpEngine constraints (#37413) * 54d110b [RCI] Adapt NoOpEngine to latest FrozenEngine changes * c63fd69 [RCI] Add NoOpEngine for closed indices (#33903) Relates to #33888

This commit adapts the Cluster Health API to support replicated closed indices. In order to do that, it removes the hard coded indices options from the `ClusterHealthRequest` and replaces it with a new `IndicesOptions.lenientExpand()` option. This option will be used by the master node (once it is upgraded to 8.0) to compute the global cluster health using both opened and closed indices information by default. The `expand_wildcards` REST parameter is also documented and tests where added to ensure that a specific expansion type can be used to monitoring the health of a only opened or only closed indices. Since the Cat Indices relies on the Cluster Health API, it has been adapted to report information about closed indices too. Note that the health and number of shards/replicas is only printed out for closed indices that have an index routing table. Closed indices without routing table have the same output as before. Related to elastic#33888

Backport support for replicating closed indices (#39499) Before this change, closed indexes were simply not replicated. It was therefore possible to close an index and then decommission a data node without knowing that this data node contained shards of the closed index, potentially leading to data loss. Shards of closed indices were not completely taken into account when balancing the shards within the cluster, or automatically replicated through shard copies, and they were not easily movable from node A to node B using APIs like Cluster Reroute without being fully reopened and closed again. This commit changes the logic executed when closing an index, so that its shards are not just removed and forgotten but are instead reinitialized and reallocated on data nodes using an engine implementation which does not allow searching or indexing, which has a low memory overhead (compared with searchable/indexable opened shards) and which allows shards to be recovered from peer or promoted as primaries when needed. This new closing logic is built on top of the new Close Index API introduced in 6.7.0 (#37359). Some pre-closing sanity checks are executed on the shards before closing them, and closing an index on a 8.0 cluster will reinitialize the index shards and therefore impact the cluster health. Some APIs have been adapted to make them work with closed indices: - Cluster Health API - Cluster Reroute API - Cluster Allocation Explain API - Recovery API - Cat Indices - Cat Shards - Cat Health - Cat Recovery This commit contains all the following changes (most recent first): * c6c42a1 Adapt NoOpEngineTests after #39006 * 3f9993d Wait for shards to be active after closing indices (#38854) * 5e7a428 Adapt the Cluster Health API to closed indices (#39364) * 3e61939 Adapt CloseFollowerIndexIT for replicated closed indices (#38767) * 71f5c34 Recover closed indices after a full cluster restart (#39249) * 4db7fd9 Adapt the Recovery API for closed indices (#38421) * 4fd1bb2 Adapt more tests suites to closed indices (#39186) * 0519016 Add replica to primary promotion test for closed indices (#39110) * b756f6c Test the Cluster Shard Allocation Explain API with closed indices (#38631) * c484c66 Remove index routing table of closed indices in mixed versions clusters (#38955) * 00f1828 Mute CloseFollowerIndexIT.testCloseAndReopenFollowerIndex() * e845b0a Do not schedule Refresh/Translog/GlobalCheckpoint tasks for closed indices (#38329) * cf9a015 Adapt testIndexCanChangeCustomDataPath for replicated closed indices (#38327) * b9becdd Adapt testPendingTasks() for replicated closed indices (#38326) * 02cc730 Allow shards of closed indices to be replicated as regular shards (#38024) * e53a9be Fix compilation error in IndexShardIT after merge with master * cae4155 Relax NoOpEngine constraints (#37413) * 54d110b [RCI] Adapt NoOpEngine to latest FrozenEngine changes * c63fd69 [RCI] Add NoOpEngine for closed indices (#33903) Relates to #33888

Adapt the Cluster Health API to closed indices

308cc56

tlrx added >enhancement :Distributed Indexing/Distributed A catch all label for anything in the Distributed Area. Please avoid if you can. labels Feb 25, 2019

tlrx requested a review from ywelsch February 25, 2019 15:36

tlrx commented Feb 25, 2019

View reviewed changes

tlrx mentioned this pull request Feb 25, 2019

Replicate closed indices #33888

Closed

50 tasks

ywelsch requested a review from henningandersen February 25, 2019 18:35

ywelsch approved these changes Feb 26, 2019

View reviewed changes

tlrx merged commit 5e7a428 into elastic:replicated-closed-indices Feb 26, 2019

tlrx deleted the rci-adapt-cluster-health-api branch February 26, 2019 13:52

henningandersen reviewed Feb 27, 2019

View reviewed changes

tlrx mentioned this pull request Feb 28, 2019

Add support for replicating closed indices #39499

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Adapt the Cluster Health and Cat Indices APIs to closed indices #39364

Adapt the Cluster Health and Cat Indices APIs to closed indices #39364

tlrx commented Feb 25, 2019

elasticmachine commented Feb 25, 2019

tlrx Feb 25, 2019

tlrx Feb 25, 2019

tlrx commented Feb 25, 2019

ywelsch Feb 26, 2019

tlrx Feb 26, 2019

ywelsch Feb 26, 2019

tlrx Feb 26, 2019

tlrx commented Feb 26, 2019

henningandersen left a comment

henningandersen Feb 26, 2019

henningandersen Feb 26, 2019

henningandersen Feb 26, 2019

henningandersen Feb 27, 2019

henningandersen Feb 27, 2019

henningandersen Feb 27, 2019

		// empty stats for closed indices, but their names are displayed
		assert indexStats == null;

Adapt the Cluster Health and Cat Indices APIs to closed indices #39364

Adapt the Cluster Health and Cat Indices APIs to closed indices #39364

Conversation

tlrx commented Feb 25, 2019

elasticmachine commented Feb 25, 2019

Choose a reason for hiding this comment

Choose a reason for hiding this comment

tlrx commented Feb 25, 2019

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

tlrx commented Feb 26, 2019

henningandersen left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment