-
Notifications
You must be signed in to change notification settings - Fork 24.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Close Index API] Refactor MetaDataIndexStateService #36354
[Close Index API] Refactor MetaDataIndexStateService #36354
Conversation
Pinging @elastic/es-distributed |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've done an initial pass. I'm wondering a bit about the testing coverage here. I think we need some tests with concurrent close requests while indices are for example being deleted, or while shards are being unassigned, check idempotence etc. Do you consider adding an identity to the closed block as a follow-up?
return; | ||
} else if (indexRoutingTable.allPrimaryShardsUnassigned()) { | ||
logger.debug("index {} has been blocked before closing but is now unassigned, ignoring", index); | ||
listener.onResponse(new AcknowledgedResponse(true)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wonder if we should do this check per IndexShardRoutingTable as well, otherwise we fail if some primary shards are unassigned while others are not
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good idea. I pushed dbe7b2c168c31b2f740679acd15449fdd1511694
assertAcked(client().admin().indices().prepareClose("test-idx-some", "test-idx-all").execute().actionGet()); | ||
assertAcked(client().admin().indices().prepareClose("test-idx-all")); | ||
|
||
AcknowledgedResponse closeIndexResponse = client().admin().indices().prepareClose("test-idx-some").setTimeout("3s").get(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this one should ack as well, see my comment above
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You're right, this way we keep the previous behavior. I pushed 9ac483656a9d824b42090dac0044283cc68c8fa3
if (response.isAcknowledged()) { | ||
toggleFrozenSettings(concreteIndices, request, listener); | ||
} else { | ||
listener.onResponse(new FreezeResponse(false, false)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
add TODO here that we should also extend the FreezeResponse object with info about failed closing?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
TODO added in a239b47a4dc0f52cd1f950eb2937fa3ee3f0d8ef
if (request.indices() == null || request.indices().length == 0) { | ||
throw new IllegalArgumentException("Index name is required"); | ||
} | ||
initiateClosing(request.indices(), request.masterNodeTimeout(), request.ackTimeout(), listener); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you do the chaining at the top-level? Instead of having step1 calling step2 calling step3, I think it's better to have these as independent steps, and assemble them here at the top-level. The advantage is also that we can just mock out an intermediary step, making it easier to implement the closing in ClusterStateChanges as well as for testing these steps in isolation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I pushed 4d2e90f1fcc977166828d2a8120267c2edd9654e
server/src/test/java/org/elasticsearch/indices/cluster/ClusterStateChanges.java
Outdated
Show resolved
Hide resolved
server/src/main/java/org/elasticsearch/cluster/metadata/MetaDataIndexStateService.java
Outdated
Show resolved
Hide resolved
@ywelsch Thanks for the review. I addressed this first bunch of comments if you want to add more feedback.
This is something I plan to do but as follow ups PRs once the Close Index API is refactored + stabilized in this feature branch. I don't plan to merge this PR until we all agree on the API, tests etc.
To be honest this is something I put aside for now. Do you think this is a blocker/mandatory for this refactoring before merging the feature branch into master? |
@ywelsch I added more tests, I'd be happy if you can have a look at them - and at other changes. Thanks |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @tlrx. I like this better after the refactoring. I've left some more comments.
server/src/main/java/org/elasticsearch/cluster/metadata/MetaDataIndexStateService.java
Show resolved
Hide resolved
server/src/main/java/org/elasticsearch/cluster/metadata/MetaDataIndexStateService.java
Show resolved
Hide resolved
server/src/main/java/org/elasticsearch/cluster/metadata/MetaDataIndexStateService.java
Show resolved
Hide resolved
server/src/main/java/org/elasticsearch/cluster/metadata/MetaDataIndexStateService.java
Outdated
Show resolved
Hide resolved
server/src/main/java/org/elasticsearch/cluster/metadata/MetaDataIndexStateService.java
Show resolved
Hide resolved
server/src/main/java/org/elasticsearch/cluster/metadata/MetaDataIndexStateService.java
Outdated
Show resolved
Hide resolved
server/src/test/java/org/elasticsearch/indices/cluster/ClusterStateChanges.java
Outdated
Show resolved
Hide resolved
server/src/test/java/org/elasticsearch/indices/state/CloseIndexIT.java
Outdated
Show resolved
Hide resolved
server/src/test/java/org/elasticsearch/indices/state/CloseIndexIT.java
Outdated
Show resolved
Hide resolved
// start some concurrent indexing threads | ||
for (final String index : indices) { | ||
if (randomBoolean()) { | ||
final Thread thread = new Thread(() -> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
there's a utility class called BackgroundIndexer which might be of help here, see testRelocationWhileIndexingRandom
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, I didn't know about this one.
@ywelsch I updated the code according to your comments. Can you have another look please? I also added some unit tests. |
throw new ElasticsearchException("bulk request failure, id: [" | ||
+ bulkItemResponse.getFailure().getId() + "] message: " | ||
+ bulkItemResponse.getFailure().getMessage()); | ||
failures.add(bulkItemResponse.getFailure().getCause()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I had to change this so that BackgroundIndexer.totalIndexedDocs() returns the effective number of docs indexed and also reports all bulk item failures (and does not stop after the first failure).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
|
||
// If the cluster is in a mixed version that does not support the shard close action, | ||
// we use the previous way to close indices and directly close them without sanity checks | ||
final boolean useDirectClose = currentState.nodes().getMinNodeVersion().before(Version.V_7_0_0); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
are there mixed version cluster tests that check index closing?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The :qa:mixed-cluster tests runs the core REST tests and those tests contains some open/close YAML tests that fail without this logic
server/src/main/java/org/elasticsearch/cluster/metadata/MetaDataIndexStateService.java
Show resolved
Hide resolved
67c4af8
to
4ebfbea
Compare
woohoo! |
Build is green (elasticsearch-ci-1, elasticsearch-ci-2), I'm merging into the feature branch. |
Thanks @ywelsch |
This commit backports to 6.x of the Close Index API refactoring. It cherry-picks the following commits from master: 3ca885e [Close Index API] Add TransportShardCloseAction for pre-closing verifications (#36249) 8e5dd20 [Close Index API] Refactor MetaDataIndexStateService (#36354) 7372529 [Tests] Reduce randomization in CloseWhileRelocatingShardsIT (#36694) 103c4d4 [Close Index API] Mark unavailable shard copy as stale during verification (#36755) 1959388 [Close Index API] Propagate tasks ids between Freeze, Close and Verify(#36630) e149b08 [Close Index API] Add unique UUID to ClusterBlock (#36775) dc371ef [Tests] Fix ReopenWhileClosingIT with correct min num shards The following two commits were needed to adapt the change to 6.x: ef6ae69 [Close Index API] Adapt MetaDataIndexStateServiceTests after merge 21b7653 [Tests] Adapt CloseIndexIT tests for 6.x Related to #33888
Note: this pull request will be merged in the
close-index-api-refactoring
branchThe pull request changes how indices are closed in the
MetaDataIndexStateService
. It now uses a 3 steps process where writes are blocked on indices to be closed, then some verifications are done on shards using theTransportVerifyShardBeforeCloseAction
added in #36249, and finally indices states are moved to CLOSE and their routing tables removed.The closing process also takes care of
TransportVerifyShardBeforeCloseAction
Somes tests required to be adapted, as well as the Freeze action.
This pull request will have two immediate follow ups: