Backport support for replicating closed indices to 7.x #39506

tlrx · 2019-02-28T12:00:45Z

This is the backport of #39499 for 7.x.

The change on master is not merged yet, but I'm creating the backport PR so that it can run a few cycles on CI.

elasticmachine · 2019-02-28T12:00:46Z

Pinging @elastic/es-distributed

tlrx · 2019-02-28T13:42:33Z

All tests passed:

elasticsearch-ci/1 — Details
elasticsearch-ci/2 — Details
elasticsearch-ci/bwc — Details
elasticsearch-ci/default-distro — Details
elasticsearch-ci/docbldesx — Details
elasticsearch-ci/oss-distro-docs — Details
elasticsearch-ci/packaging-sample — Details

@elasticmachine test this please

tlrx · 2019-02-28T15:28:27Z

All tests passed again:

elasticsearch-ci/1 — Details
elasticsearch-ci/2 — Details
elasticsearch-ci/bwc — Details
elasticsearch-ci/default-distro — Details
elasticsearch-ci/docbldesx — Details
elasticsearch-ci/oss-distro-docs — Details
elasticsearch-ci/packaging-sample — Details

@elasticmachine test this please

tlrx · 2019-02-28T17:11:50Z

One job failed with an unrelated issue (#28153):

elasticsearch-ci/1 — Failed because of [CI] ShrinkIndexIT.testShrinkIndexPrimaryTerm fails #28153
elasticsearch-ci/2 — Details
elasticsearch-ci/bwc — Details
elasticsearch-ci/default-distro — Details
elasticsearch-ci/docbldesx — Details
elasticsearch-ci/oss-distro-docs — Details
elasticsearch-ci/packaging-sample — Details

@elasticmachine test this please

tlrx · 2019-02-28T19:17:35Z

All tests passed again:

elasticsearch-ci/1 — Details
elasticsearch-ci/2 — Details
elasticsearch-ci/bwc — Details
elasticsearch-ci/default-distro — Details
elasticsearch-ci/docbldesx — Details
elasticsearch-ci/oss-distro-docs — Details
elasticsearch-ci/packaging-sample — Details

@elasticmachine test this please

Relates to elastic#39506

tlrx · 2019-03-01T06:58:03Z

One job failed:

elasticsearch-ci/1 — Failed
elasticsearch-ci/2 — Details
elasticsearch-ci/bwc — Details
elasticsearch-ci/default-distro — Details
elasticsearch-ci/docbldesx — Details
elasticsearch-ci/oss-distro-docs — Details
elasticsearch-ci/packaging-sample — Details

@elasticmachine test this please

tlrx · 2019-03-01T08:44:20Z

All tests passed again:

elasticsearch-ci/1 — Details
elasticsearch-ci/2 — Details
elasticsearch-ci/bwc — Details
elasticsearch-ci/default-distro — Details
elasticsearch-ci/docbldesx — Details
elasticsearch-ci/oss-distro-docs — Details
elasticsearch-ci/packaging-sample — Details

This commit adds a new NoOpEngine implementation based on the current ReadOnlyEngine. This new implementation uses an empty DirectoryReader with no segment readers and will always return 0 docs. The NoOpEngine is the default Engine created for IndexShards of closed indices. It expects an empty translog when it is instantiated. Relates to elastic#33888

Changes were made in elastic#34357 and elastic#36467

When a NoOpEngine is instantiated, the current implementation verifies that the translog contains no operations and that it contains the same UUID as the last Lucene commit data. We can relax those two constraints because the Close Index API now ensures that all translog operations are flushed before closing a shard. The coherence check between the translog UUID and the Lucene commit data is not specific to NoOpEngine, and is already done by IndexShard.innerOpenEngineAndTranslog(). Related to elastic#33888

…astic#38024) This commit allows shards of indices in CLOSE state to be replicated as normal shards. It changes the MetaDataIndexStateService so that index routing tables of closed indices are kept in cluster state when the index is closed. Index routing tables are modified so that shard routings are reinitialized with the INDEX_CLOSED unassigned information.

The IndicesClusterStateService is modified to remove IndexService instances of closed or reopened indices. In combination with the ShardRouting being in INITIALIZING state, the shards are recreated on the data nodes to reflect the new state. If the index state is closed, the IndexShard instances will be created using the NoOpEngine as the engine implementation.

This commit also mutes two tests that rely on the fact that shard locks are released when an index is closed, which is no longer the case with replicated closed indices (the locks are actually released but reacquired once the shard is reinitialized after being closed). These tests will be adapted in follow-up PRs. Finally, many things will need to be adapted or improved in follow-up PRs (see elastic#33888), but this is the first big step towards replicated closed indices. Relates to elastic#33888

Relates to elastic#33888

…lastic#38327) Relates to elastic#33888 and elastic#38024

…dices (elastic#38329) Replicated closed indices do not need to be refreshed, nor do they need their translogs or global checkpoints to be fsynced. This pull request changes how `BaseAsyncTask` tasks are rescheduled in `IndexService` instances so that the tasks are rescheduled only when the index is open. Relates to elastic#33888
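As a rough illustration of the rescheduling rule described in this commit message, here is a minimal Python model (not Elasticsearch code). The names `IndexServiceModel`, `AsyncTaskModel`, `must_reschedule` and `tick` are hypothetical, and the exact condition (service still alive and index state `OPEN`) is an assumption about how "only when the index is opened" could be expressed.

```python
from dataclasses import dataclass


@dataclass
class IndexServiceModel:
    """Hypothetical stand-in for an IndexService: tracks index state and liveness."""
    state: str = "OPEN"   # "OPEN" or "CLOSE"
    closed: bool = False  # True once the service itself has been shut down


class AsyncTaskModel:
    """Hypothetical periodic task (refresh, translog fsync, global checkpoint sync)."""

    def __init__(self, service: IndexServiceModel) -> None:
        self.service = service
        self.runs = 0

    def must_reschedule(self) -> bool:
        # Assumed rule: keep rescheduling only while the service is alive
        # and the index it belongs to is open.
        return not self.service.closed and self.service.state == "OPEN"

    def tick(self) -> bool:
        """Run the task once if it is still eligible; report whether it ran."""
        if not self.must_reschedule():
            return False
        self.runs += 1
        return True
```

In this model a task attached to a closed index simply stops running, and starts again if the index state goes back to `OPEN`.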

…rs (elastic#38955) This pull request removes the legacy way of closing indices (aka "direct close") in mixed-version clusters, since this backward compatibility logic is not required anymore on master/8.0.0. It also changes the closing logic so that routing tables of closed indices are removed when the cluster contains a node in version < 8.0. Relates to elastic#33888

…astic#38631) This pull request modifies the `ClusterAllocationExplainIT` test suite so that it always runs the tests with opened and closed indices. The only test that was not adapted for closed indices is `testAllocationFilteringOnIndexCreation`, because indices cannot be created directly in the closed state. Relates to elastic#33888

This commit adds a simple test which verifies that a replica can be promoted as a primary when the index is closed. Relates to elastic#33888

* Adapt more test suites to closed indices

Similarly to elastic#38631, this pull request modifies multiple test suites so that they run the tests with opened or closed indices. The suites test:
- shard allocation filtering
- shard allocation awareness
- Reroute API

Relates to elastic#33888

This commit adapts the Recovery API to make it work with shards of replicated closed indices. Relates elastic#33888

Closing an index is a process that can be broken down into several steps:
1. First, the cluster state is updated to add a write block on the index to be closed.
2. Then, a transport replication action is executed on all shards of the index. This action checks that the maximum sequence number and the global checkpoint have identical values, indicating that all in-flight write operations have completed on the shard.
3. Finally, if the previous steps were successful, the cluster state is updated again to change the state of the index from `OPEN` to `CLOSE`.

During the last step, the master node retrieves the minimum node version among all the nodes that compose the cluster:
* If a node is on a pre-8.0 version, the index is closed and the index routing table is removed from the cluster state. This is the "old" way of closing indices, and closed indices with no routing table are not replicated.
* If all nodes are on version 8.0 or higher, the index is closed and its routing table is reinitialized in the cluster state. This is the new way of closing indices, and such closed indices will be replicated in the cluster.

But routing tables are not persisted in the cluster state, so after a full cluster restart there is no way to distinguish between an index closed in 7.x and an index closed and replicated on 8.0. This commit introduces a new private index setting named `index.verified_before_close` that is added at closing time to closed indices that are replicated. This setting serves as a marker to indicate that the index has been closed using the new Close Index API on a cluster that supports replication of closed indices. This way, after a full cluster restart, the Gateway service can automatically recover those closed indices as if they were opened indices.

Closed indices that don't have this setting (because they were closed on a pre-8.0 cluster, or on a mixed-version cluster) won't be recovered and will need to be reopened and closed again on an 8.0 cluster. Note that reopening the index removes the private setting. Relates to elastic#33888
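The closing steps and the role of the marker setting can be sketched with a small Python model. This is not Elasticsearch code: the classes `ShardModel` and `IndexModel`, the three functions, the `(major, minor)` version tuples and the `"true"` setting value are all hypothetical. Only the setting name `index.verified_before_close` comes from the commit message. The version threshold is a parameter because the message speaks of 8.0 while this backport targets 7.1.0. What happens to the write block when verification fails is not described above, so the sketch simply leaves it in place.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

# Setting name taken from the commit message; everything else is a hypothetical model.
VERIFIED_BEFORE_CLOSE = "index.verified_before_close"
Version = Tuple[int, int]  # (major, minor)


@dataclass
class ShardModel:
    max_seq_no: int
    global_checkpoint: int


@dataclass
class IndexModel:
    shards: List[ShardModel]
    state: str = "OPEN"
    write_blocked: bool = False
    has_routing_table: bool = True
    settings: Dict[str, str] = field(default_factory=dict)


def close_index(index: IndexModel, node_versions: List[Version],
                replication_since: Version = (8, 0)) -> bool:
    """Apply the three closing steps; return True if the index ends up closed."""
    # Step 1: block writes on the index.
    index.write_blocked = True
    # Step 2: every shard must have max seq no == global checkpoint.
    if any(s.max_seq_no != s.global_checkpoint for s in index.shards):
        return False  # assumption: stop here and leave the block in place
    # Step 3: flip the state; the minimum node version picks old vs. new behaviour.
    index.state = "CLOSE"
    if min(node_versions) >= replication_since:
        index.has_routing_table = True                   # replicated closed index
        index.settings[VERIFIED_BEFORE_CLOSE] = "true"   # assumed marker value
    else:
        index.has_routing_table = False                  # "old" close, not replicated
    return True


def reopen_index(index: IndexModel) -> None:
    """Reopen the index; reopening removes the private marker setting."""
    index.state = "OPEN"
    index.write_blocked = False
    index.has_routing_table = True
    index.settings.pop(VERIFIED_BEFORE_CLOSE, None)


def recovered_after_full_restart(index: IndexModel) -> bool:
    """Closed indices are recovered like open ones only if they carry the marker."""
    if index.state == "OPEN":
        return True
    return index.settings.get(VERIFIED_BEFORE_CLOSE) == "true"
```

Calling `close_index(idx, [(7, 0), (8, 0)])` in this model closes the index without the marker, so `recovered_after_full_restart(idx)` is false until the index is reopened and closed again with every node at or above the threshold.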

Now that the test `CloseFollowerIndexIT` has been added in elastic#38702, it needs to be adapted for replicated closed indices. The test closes the follower index, which is lagging behind the leader index. When it's closed, no sanity checks are executed because it's a follower index (this is a consequence of elastic#38702). But with replicated closed indices, the index is reinitialized as a closed index with a `NoOpEngine`, and such engines make strong assertions on the values of the maximum sequence number and the global checkpoint. Since the values do not match, the shards cannot be created and fail, and the cluster health turns RED. This commit adapts the `CloseFollowerIndexIT` test so that it wraps the default `UncaughtExceptionHandler` with a handler that tolerates any exception thrown by `ReadOnlyEngine.assertMaxSeqNoEqualsToGlobalCheckpoint()`. Replacing the default uncaught exception handler requires specific permissions, and instead of creating another gradle project it duplicates the `internalClusterTest` task to make it work without the security manager for this specific test only. Relates to elastic#33888

This commit adapts the Cluster Health API to support replicated closed indices. In order to do that, it removes the hard-coded indices options from the `ClusterHealthRequest` and replaces them with a new `IndicesOptions.lenientExpand()` option. This option will be used by the master node (once it is upgraded to 8.0) to compute the global cluster health using both opened and closed indices information by default. The `expand_wildcards` REST parameter is also documented, and tests were added to ensure that a specific expansion type can be used to monitor the health of only opened or only closed indices. Since the Cat Indices API relies on the Cluster Health API, it has been adapted to report information about closed indices too. Note that the health and number of shards/replicas are only printed out for closed indices that have an index routing table. Closed indices without a routing table have the same output as before. Related to elastic#33888
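To make the effect of the expansion type concrete, here is a minimal Python sketch of wildcard expansion filtered by index state. It is an illustrative model, not the Elasticsearch implementation: the function `expand_wildcards`, the state strings and the use of `fnmatch` are assumptions (`fnmatch` also accepts `?` and `[...]`, which is broader than a plain `*` wildcard). The expansion names `open`, `closed`, `all` and `none` mirror the values of the REST parameter.

```python
from fnmatch import fnmatchcase
from typing import Dict, List

# Index states each expansion type lets through (hypothetical model).
_STATES_BY_EXPANSION = {
    "open": {"OPEN"},
    "closed": {"CLOSE"},
    "all": {"OPEN", "CLOSE"},
    "none": set(),
}


def expand_wildcards(pattern: str, index_states: Dict[str, str],
                     expansion: str = "all") -> List[str]:
    """Return the sorted index names matching `pattern` whose state `expansion` allows."""
    allowed = _STATES_BY_EXPANSION[expansion]
    return sorted(
        name
        for name, state in index_states.items()
        if fnmatchcase(name, pattern) and state in allowed
    )
```

With the default of `all`, a health computation over `logs-*` in this model would cover both opened and closed indices, while `open` or `closed` restricts it to one kind.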

This commit changes the Close Index API to add a `wait_for_active_shards` parameter that makes it possible to wait for shards of closed indices to be active before returning a response. Relates to elastic#33888

Relates to #39506

tlrx · 2019-03-01T10:34:03Z

@ywelsch This is ready for review. This backport picks up all the same commits listed in #39499. It adds an extra commit that adapts the code for 7.1.0 (fa346c6) and only this one needs to be reviewed.

ywelsch · 2019-03-01T11:40:56Z

I've left one comment about BWC here: #39506 (comment)

tlrx · 2019-03-01T11:57:33Z

@ywelsch I've updated the code. Thanks for spotting my error :/

ywelsch

LGTM

tlrx · 2019-03-01T13:48:43Z

Thanks @ywelsch

This commit adapts the bwc layer and re-enables the bwc tests after #39506 has been backported to 7.x. Related to #33888

…ed indices (#114314) Support for replicating closed indices was added in #39506 (7.1.0), so we can expect that the cluster always supports replication of closed indices in 8.0/9.0

tlrx added the :Distributed Indexing/Distributed and backport labels Feb 28, 2019

tlrx added a commit to tlrx/elasticsearch that referenced this pull request Mar 1, 2019

Disable BWC tests for replicated closed indices backport

7489127

Relates to elastic#39506

tlrx mentioned this pull request Mar 1, 2019

Disable BWC tests for replicated closed indices backport #39545

Merged

tlrx added 19 commits March 1, 2019 09:46

[RCI] Adapt NoOpEngine to latest FrozenEngine changes

9eab2c7

Changes were made in elastic#34357 and elastic#36467

Fix compilation error in IndexShardIT after merge with master

a4ee55c

Adapt testPendingTasks() for replicated closed indices (elastic#38326)

37e5e88

Relates to elastic#33888

Adapt testIndexCanChangeCustomDataPath for replicated closed indices (e…

224ee2e

…lastic#38327) Relates to elastic#33888 and elastic#38024

Mute CloseFollowerIndexIT.testCloseAndReopenFollowerIndex()

c56d9bc

Add replica to primary promotion test for closed indices (elastic#39110)

4d080e9

This commit adds a simple test which verifies that a replica can be promoted as a primary when the index is closed. Relates to elastic#33888

Adapt the Recovery API for closed indices (elastic#38421)

95d9e9a

This commit adapts the Recovery API to make it work with shards of replicated closed indices. Relates elastic#33888

Wait for shards to be active after closing indices (elastic#38854)

719d905

This commit changes the Close Index API to add a `wait_for_active_shards` parameter that makes it possible to wait for shards of closed indices to be active before returning a response. Relates to elastic#33888

(7.x) BWC layer for replicated closed indices

fa346c6

tlrx force-pushed the replicated-closed-indices-7.x branch from cd3d1c1 to fa346c6 Compare March 1, 2019 08:51

tlrx added a commit that referenced this pull request Mar 1, 2019

Disable BWC tests for replicated closed indices backport (#39545)

8d42513

Relates to #39506

tlrx requested a review from ywelsch March 1, 2019 10:46

Apply feedback

f8be7f0

ywelsch approved these changes Mar 1, 2019

tlrx merged commit e005eeb into elastic:7.x Mar 1, 2019

tlrx deleted the replicated-closed-indices-7.x branch March 1, 2019 13:48

tlrx mentioned this pull request Mar 1, 2019

Adapt bwc after the backport of replicated closed indices #39566

Merged

tlrx added a commit that referenced this pull request Mar 1, 2019

Adapt bwc after the backport of replicated closed indices (#39566)

20595e6

This commit adapts the bwc layer and re-enables the bwc tests after #39506 has been backported to 7.x. Related to #33888

arteam mentioned this pull request Oct 8, 2024

[test] Always assume that the old cluster support replication of closed indices #114314

Merged
