Skip to content

Commit

Permalink
[ML] Fix logic for moving .ml-state-write alias from legacy to new (#…
Browse files Browse the repository at this point in the history
…69039)

When multiple jobs start up together on a node following
an upgrade, each one of them will trigger a check that the
.ml-state* indices are as expected and the .ml-state-write
alias points to the correct index.

There were a couple of flaws in the logic:

1. We were not considering the possibility that one or more
   existing .ml-state* indices might be hidden.
2. If multiple jobs tried to create a .ml-state-000001 index
   simultaneously all but the first would fail.  We accounted
   for this, but then did not follow up with the correct alias
   update request for those index creation requests that
   failed.  This could cause all but one of the jobs starting
   up on the node to spuriously fail.

Both these problems are fixed by this PR.

Fixes #68925
  • Loading branch information
droberts195 authored Feb 19, 2021
1 parent 7fb98c0 commit bc46fc0
Show file tree
Hide file tree
Showing 2 changed files with 13 additions and 7 deletions.
Original file line number Diff line number Diff line change
Expand Up @@ -118,7 +118,7 @@ public static void createIndexAndAliasIfNecessary(Client client,
// The initial index name must be suitable for rollover functionality.
String firstConcreteIndex = indexPatternPrefix + "-000001";
String[] concreteIndexNames =
resolver.concreteIndexNames(clusterState, IndicesOptions.lenientExpandOpen(), indexPattern);
resolver.concreteIndexNames(clusterState, IndicesOptions.lenientExpandHidden(), indexPattern);
Optional<IndexMetadata> indexPointedByCurrentWriteAlias = clusterState.getMetadata().hasAlias(alias)
? clusterState.getMetadata().getIndicesLookup().get(alias).getIndices().stream().findFirst()
: Optional.empty();
Expand Down Expand Up @@ -241,11 +241,17 @@ private static void createFirstConcreteIndex(Client client,
ActionListener.<CreateIndexResponse>wrap(
createIndexResponse -> listener.onResponse(true),
createIndexFailure -> {
// If it was created between our last check, and this request being handled, we should add the alias
// Adding an alias that already exists is idempotent. So, no need to double check if the alias exists
// as well.
if (ExceptionsHelper.unwrapCause(createIndexFailure) instanceof ResourceAlreadyExistsException) {
updateWriteAlias(client, alias, null, index, listener);
// If it was created between our last check and this request being handled, we should add the alias
// if we were asked to add it on creation. Adding an alias that already exists is idempotent. So
// no need to double check if the alias exists as well. But if we weren't asked to add the alias
// on creation then we should leave it up to the caller to decide what to do next (some call sites
// already have more advanced alias update logic in their success handlers).
if (addAlias) {
updateWriteAlias(client, alias, null, index, listener);
} else {
listener.onResponse(true);
}
} else {
listener.onFailure(createIndexFailure);
}
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -54,7 +54,7 @@
*/
public class JobStorageDeletionTaskIT extends BaseMlIntegTestCase {

private static long bucketSpan = AnalysisConfig.Builder.DEFAULT_BUCKET_SPAN.getMillis();
private static final long bucketSpan = AnalysisConfig.Builder.DEFAULT_BUCKET_SPAN.getMillis();
private static final String UNRELATED_INDEX = "unrelated-data";

private JobResultsProvider jobResultsProvider;
Expand Down Expand Up @@ -179,7 +179,7 @@ public void testDeleteDedicatedJobWithDataInShared() throws Exception {
.indices()
.prepareGetIndex()
.setIndices(dedicatedIndex)
.setIndicesOptions(IndicesOptions.LENIENT_EXPAND_OPEN)
.setIndicesOptions(IndicesOptions.LENIENT_EXPAND_OPEN_CLOSED_HIDDEN)
.get()
.indices().length, equalTo(0));

Expand Down

0 comments on commit bc46fc0

Please sign in to comment.