[Close Index API] Refactor MetaDataIndexStateService #36354

tlrx · 2018-12-07T09:02:26Z

Note: this pull request will be merged into the close-index-api-refactoring branch.

This pull request changes how indices are closed in the MetaDataIndexStateService. Closing is now a three-step process: writes are first blocked on the indices to be closed, then the shards are verified using the TransportVerifyShardBeforeCloseAction added in #36249, and finally the index states are moved to CLOSE and their routing tables are removed.

The closing process also takes care of:

using the pre-7.0 way to close indices if the cluster contains nodes of mixed versions and a node does not support the TransportVerifyShardBeforeCloseAction
closing unassigned indices (i.e. indices with all primary shards unassigned)
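
The flow described above can be sketched roughly as follows. This is a minimal, hypothetical model and not the code in this pull request: `CloseIndexSketch`, `Index`, `State` and the boolean flags are illustrative stand-ins, and the real implementation works on cluster state updates and transport actions rather than on an in-memory map.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical, simplified model of the close flow; none of these types are Elasticsearch classes.
final class CloseIndexSketch {

    enum State { OPEN, WRITE_BLOCKED, CLOSED }

    static final class Index {
        State state = State.OPEN;
        final boolean primariesAssigned;  // false models an index whose primaries are all unassigned
        final boolean passesVerification; // stand-in for the outcome of the shard-level checks

        Index(boolean primariesAssigned, boolean passesVerification) {
            this.primariesAssigned = primariesAssigned;
            this.passesVerification = passesVerification;
        }
    }

    // Step 1: block writes on every index that is about to be closed.
    static void blockWrites(Map<String, Index> indices) {
        indices.values().forEach(index -> index.state = State.WRITE_BLOCKED);
    }

    // Step 2: verify shards. An unassigned index has no shard copy to check, so it passes.
    static Map<String, Boolean> verifyShards(Map<String, Index> indices) {
        Map<String, Boolean> verified = new LinkedHashMap<>();
        indices.forEach((name, index) ->
            verified.put(name, !index.primariesAssigned || index.passesVerification));
        return verified;
    }

    // Step 3: move verified indices to CLOSED (the real code also removes their routing tables).
    // Indices that fail verification are simply left untouched by this sketch.
    static List<String> closeVerified(Map<String, Index> indices, Map<String, Boolean> verified) {
        List<String> closed = new ArrayList<>();
        verified.forEach((name, ok) -> {
            if (ok) {
                indices.get(name).state = State.CLOSED;
                closed.add(name);
            }
        });
        return closed;
    }

    static List<String> close(Map<String, Index> indices, boolean allNodesSupportVerification) {
        if (!allNodesSupportVerification) {
            // Mixed-version fallback: close directly, as before 7.0, without verification.
            indices.values().forEach(index -> index.state = State.CLOSED);
            return new ArrayList<>(indices.keySet());
        }
        blockWrites(indices);
        return closeVerified(indices, verifyShards(indices));
    }
}
```

What happens to the write block of an index that fails verification is not modelled here.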

Some tests had to be adapted, as did the Freeze action.

This pull request will have two immediate follow-ups:

handling the task ids so that we can identify the freeze parent task that triggered a close task, which in turn triggered a verify shard before close task
changing the Close Index API response format from AcknowledgedResponse to a custom CloseIndexResponse that provides more information about which indices were closed and why the shard verification failed for the indices that were not closed

elasticmachine · 2018-12-07T09:02:28Z

Pinging @elastic/es-distributed

ywelsch

I've done an initial pass. I'm wondering a bit about the testing coverage here. I think we need some tests with concurrent close requests while indices are for example being deleted, or while shards are being unassigned, check idempotence etc. Do you consider adding an identity to the closed block as a follow-up?

ywelsch · 2018-12-07T10:11:49Z

server/src/main/java/org/elasticsearch/cluster/metadata/MetaDataIndexStateService.java

+            return;
+        } else if (indexRoutingTable.allPrimaryShardsUnassigned()) {
+            logger.debug("index {} has been blocked before closing but is now unassigned, ignoring", index);
+            listener.onResponse(new AcknowledgedResponse(true));


I wonder if we should do this check per IndexShardRoutingTable as well, otherwise we fail if some primary shards are unassigned while others are not

Good idea. I pushed dbe7b2c168c31b2f740679acd15449fdd1511694
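
To illustrate the difference between the two checks discussed here, the following is a small sketch under stated assumptions: `ShardEntry` is a hypothetical stand-in for an IndexShardRoutingTable, and the actual change in the referenced commit may look different.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch; ShardEntry is not an Elasticsearch class.
final class PerShardCheckSketch {

    static final class ShardEntry {
        final int shardId;
        final boolean primaryAssigned;

        ShardEntry(int shardId, boolean primaryAssigned) {
            this.shardId = shardId;
            this.primaryAssigned = primaryAssigned;
        }
    }

    // Index-level check: only true when no shard has an assigned primary.
    static boolean allPrimariesUnassigned(List<ShardEntry> shards) {
        return shards.stream().noneMatch(shard -> shard.primaryAssigned);
    }

    // Per-shard check: shards with an unassigned primary are skipped individually,
    // so a partially assigned index does not fail as a whole.
    static List<Integer> shardsToVerify(List<ShardEntry> shards) {
        List<Integer> toVerify = new ArrayList<>();
        for (ShardEntry shard : shards) {
            if (shard.primaryAssigned) {
                toVerify.add(shard.shardId);
            }
        }
        return toVerify;
    }
}
```

With shards 0 and 2 assigned and shard 1 unassigned, the index-level check is false, while the per-shard check still skips shard 1 and verifies only 0 and 2.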

ywelsch · 2018-12-07T10:28:51Z

server/src/test/java/org/elasticsearch/snapshots/DedicatedClusterSnapshotRestoreIT.java

-        assertAcked(client().admin().indices().prepareClose("test-idx-some", "test-idx-all").execute().actionGet());
+        assertAcked(client().admin().indices().prepareClose("test-idx-all"));
+
+        AcknowledgedResponse closeIndexResponse = client().admin().indices().prepareClose("test-idx-some").setTimeout("3s").get();


I think this one should ack as well, see my comment above

You're right, this way we keep the previous behavior. I pushed 9ac483656a9d824b42090dac0044283cc68c8fa3

ywelsch · 2018-12-07T10:31:54Z

...lugin/core/src/main/java/org/elasticsearch/xpack/core/action/TransportFreezeIndexAction.java

+                if (response.isAcknowledged()) {
+                    toggleFrozenSettings(concreteIndices, request, listener);
+                } else {
+                    listener.onResponse(new FreezeResponse(false, false));


add TODO here that we should also extend the FreezeResponse object with info about failed closing?

TODO added in a239b47a4dc0f52cd1f950eb2937fa3ee3f0d8ef

ywelsch · 2018-12-07T10:36:03Z

server/src/main/java/org/elasticsearch/cluster/metadata/MetaDataIndexStateService.java

        if (request.indices() == null || request.indices().length == 0) {
            throw new IllegalArgumentException("Index name is required");
        }
+        initiateClosing(request.indices(),  request.masterNodeTimeout(), request.ackTimeout(), listener);


Can you do the chaining at the top-level? Instead of having step1 calling step2 calling step3, I think it's better to have these as independent steps, and assemble them here at the top-level. The advantage is also that we can just mock out an intermediary step, making it easier to implement the closing in ClusterStateChanges as well as for testing these steps in isolation.

I pushed 4d2e90f1fcc977166828d2a8120267c2edd9654e
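
One way to picture the suggestion is the sketch below, in which each step is an independent unit and only the top-level method wires them together. The `Step` interface and `runChain` method are hypothetical names chosen for illustration; the referenced commit may structure this differently.

```java
import java.util.function.Consumer;

// Hypothetical sketch of assembling independent asynchronous steps at the top level.
final class TopLevelChainingSketch {

    // An asynchronous step: takes an input and hands its result to a listener.
    @FunctionalInterface
    interface Step<I, O> {
        void run(I input, Consumer<O> listener);
    }

    // The only place that knows the order of the steps. No step calls another step,
    // so any of them can be replaced by a mock in a test.
    static <A, B, C, D> void runChain(A input,
                                      Step<A, B> first,
                                      Step<B, C> second,
                                      Step<C, D> third,
                                      Consumer<D> listener) {
        first.run(input, b -> second.run(b, c -> third.run(c, listener)));
    }
}
```

A test can then pass a trivial lambda for the verification step, for example `(in, next) -> next.accept(in)`, and exercise the blocking and closing steps in isolation.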

server/src/test/java/org/elasticsearch/indices/cluster/ClusterStateChanges.java

server/src/main/java/org/elasticsearch/cluster/metadata/MetaDataIndexStateService.java

tlrx · 2018-12-07T13:33:41Z

@ywelsch Thanks for the review. I addressed this first bunch of comments if you want to add more feedback.

> I'm wondering a bit about the testing coverage here. I think we need some tests with concurrent close requests while indices are for example being deleted, or while shards are being unassigned, check idempotence etc.

This is something I plan to do, but as follow-up PRs once the Close Index API is refactored and stabilized in this feature branch. I don't plan to merge the feature branch until we all agree on the API, tests etc.

> Do you consider adding an identity to the closed block as a follow-up?

To be honest this is something I put aside for now. Do you think this is a blocker/mandatory for this refactoring before merging the feature branch into master?

tlrx · 2018-12-10T16:14:55Z

@ywelsch I added more tests, I'd be happy if you can have a look at them - and at other changes. Thanks

ywelsch

Thanks @tlrx. I like this better after the refactoring. I've left some more comments.

server/src/main/java/org/elasticsearch/cluster/metadata/MetaDataIndexStateService.java

server/src/test/java/org/elasticsearch/indices/cluster/ClusterStateChanges.java

server/src/test/java/org/elasticsearch/indices/state/CloseIndexIT.java

ywelsch · 2018-12-11T10:40:45Z

server/src/test/java/org/elasticsearch/indices/state/CloseWhileRelocatingShardsIT.java

+        // start some concurrent indexing threads
+        for (final String index : indices) {
+            if (randomBoolean()) {
+                final Thread thread = new Thread(() -> {


there's a utility class called BackgroundIndexer which might be of help here, see testRelocationWhileIndexingRandom.

Thanks, I didn't know about this one.

tlrx · 2018-12-12T13:38:23Z

@ywelsch I updated the code according to your comments. Can you have another look please?

I also added some unit tests.

tlrx · 2018-12-12T14:02:39Z

test/framework/src/main/java/org/elasticsearch/test/BackgroundIndexer.java

-                                        throw new ElasticsearchException("bulk request failure, id: ["
-                                                + bulkItemResponse.getFailure().getId() + "] message: "
-                                                + bulkItemResponse.getFailure().getMessage());
+                                        failures.add(bulkItemResponse.getFailure().getCause());


I had to change this so that BackgroundIndexer.totalIndexedDocs() returns the effective number of docs indexed and also reports all bulk item failures (and does not stop after the first failure).
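
The behavioural change can be illustrated with the following sketch. It does not use the real BackgroundIndexer or bulk response classes: `ItemResult`, `onBulkResponse` and the field names are hypothetical, and only the idea of counting successes while collecting every failure is taken from the comment above.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical sketch; ItemResult stands in for a bulk item response.
final class FailureCollectingSketch {

    static final class ItemResult {
        final String id;
        final Exception failure; // null when the item was indexed successfully

        ItemResult(String id, Exception failure) {
            this.id = id;
            this.failure = failure;
        }
    }

    // Thread-safe containers, since several indexing threads may report concurrently.
    final List<Exception> failures = new CopyOnWriteArrayList<>();
    final AtomicLong indexedDocs = new AtomicLong();

    // Records every failure instead of throwing on the first one, so later items
    // in the same bulk response are still counted.
    void onBulkResponse(List<ItemResult> items) {
        for (ItemResult item : items) {
            if (item.failure == null) {
                indexedDocs.incrementAndGet();
            } else {
                failures.add(item.failure);
            }
        }
    }
}
```

With the previous throw-on-first-failure behaviour, any item after the failed one would never have been counted.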

ywelsch

LGTM

ywelsch · 2018-12-13T13:46:53Z

server/src/main/java/org/elasticsearch/cluster/metadata/MetaDataIndexStateService.java

+
+        // If the cluster is in a mixed version that does not support the shard close action,
+        // we use the previous way to close indices and directly close them without sanity checks
+        final boolean useDirectClose = currentState.nodes().getMinNodeVersion().before(Version.V_7_0_0);


are there mixed version cluster tests that check index closing?

The :qa:mixed-cluster tests run the core REST tests, and those tests contain some open/close YAML tests that fail without this logic.
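
The decision in the diff above boils down to comparing the minimum node version with 7.0.0. The sketch below shows that comparison in isolation, representing versions as hypothetical `int[]{major, minor, patch}` triples rather than using the real Version class.

```java
import java.util.Comparator;
import java.util.List;

// Hypothetical sketch; versions are modelled as {major, minor, patch} arrays.
final class MixedVersionSketch {

    static final int[] V_7_0_0 = {7, 0, 0};

    // Orders versions by major, then minor, then patch.
    static final Comparator<int[]> BY_VERSION = (a, b) -> {
        for (int i = 0; i < 3; i++) {
            if (a[i] != b[i]) {
                return Integer.compare(a[i], b[i]);
            }
        }
        return 0;
    };

    // True when the oldest node in the cluster predates 7.0.0, in which case the
    // indices are closed directly without the shard-level verification.
    static boolean useDirectClose(List<int[]> nodeVersions) {
        return nodeVersions.stream()
            .min(BY_VERSION)
            .map(min -> BY_VERSION.compare(min, V_7_0_0) < 0)
            .orElse(false);
    }
}
```

A single node older than 7.0.0 is enough to switch the whole cluster to the direct close path.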

server/src/main/java/org/elasticsearch/cluster/metadata/MetaDataIndexStateService.java

s1monw · 2018-12-13T15:21:18Z

woohoo!

tlrx · 2018-12-13T16:34:56Z

Build is green (elasticsearch-ci-1, elasticsearch-ci-2), I'm merging into the feature branch.

tlrx · 2018-12-13T16:36:38Z

Thanks @ywelsch

This commit backports the Close Index API refactoring to 6.x. It cherry-picks the following commits from master:

3ca885e [Close Index API] Add TransportShardCloseAction for pre-closing verifications (#36249)
8e5dd20 [Close Index API] Refactor MetaDataIndexStateService (#36354)
7372529 [Tests] Reduce randomization in CloseWhileRelocatingShardsIT (#36694)
103c4d4 [Close Index API] Mark unavailable shard copy as stale during verification (#36755)
1959388 [Close Index API] Propagate tasks ids between Freeze, Close and Verify(#36630)
e149b08 [Close Index API] Add unique UUID to ClusterBlock (#36775)
dc371ef [Tests] Fix ReopenWhileClosingIT with correct min num shards

The following two commits were needed to adapt the change to 6.x:

ef6ae69 [Close Index API] Adapt MetaDataIndexStateServiceTests after merge
21b7653 [Tests] Adapt CloseIndexIT tests for 6.x

Related to #33888

tlrx added the >enhancement, v7.0.0 and :Distributed Indexing/Distributed labels Dec 7, 2018

tlrx requested review from s1monw, bleskes and ywelsch December 7, 2018 09:02

ywelsch suggested changes Dec 7, 2018

ywelsch suggested changes Dec 11, 2018

tlrx commented Dec 12, 2018

ywelsch approved these changes Dec 13, 2018

tlrx mentioned this pull request Dec 13, 2018

Replicate closed indices #33888

Closed

50 tasks

tlrx added 10 commits December 13, 2018 15:48

[Close Index API] Refactor MetaDataIndexStateService

62e6a1a

Mute WaitForRefreshAndCloseIT

b162fe5

Check IndexShardRoutingTable unassigned

13b9d17

Acked in DedicatedClusterSnapshotRestoreIT

83725d9

Add TODO

b62c99f

Rename ShardCloseRequest -> ShardRequest

a849071

Apply feedback

470522c

Add various close tests

a8a0b66

Apply feedback

8bc95dd

More unit tests

4ebfbea

tlrx force-pushed the refactor-metadatainexstateservice branch from 67c4af8 to 4ebfbea Compare December 13, 2018 14:58

tlrx merged commit 8e5dd20 into elastic:close-index-api-refactoring Dec 13, 2018

tlrx deleted the refactor-metadatainexstateservice branch December 13, 2018 16:36

tlrx mentioned this pull request Jan 11, 2019

Backport the Close Index API refactoring to 6.x #37359

Merged

colings86 added v7.0.0-beta1 and removed v7.0.0 labels Feb 7, 2019
