During QA upgrade testing from 6.8.14 to 7.11.1, we saw an error caused by an attempt to make the .ml-state-write alias point at both .ml-state and .ml-state-000001. This is the relevant section of the Elasticsearch log:
It looks like the code in MlIndexAndAlias.createIndexAndAliasIfNecessary is working on the basis that no indices exist that match the pattern .ml-state*. The reason for this is the resolver.concreteIndexNames(clusterState, IndicesOptions.lenientExpandOpen(), indexPattern) call combined with the fact that the .ml-state index is temporarily unavailable during the upgrade. Lenient expand open deliberately ignores unavailable indices. For the purposes of determining whether an alias is already present on an index we should not ignore unavailable indices.
The reason for this is the resolver.concreteIndexNames(clusterState, IndicesOptions.lenientExpandOpen(), indexPattern) call combined with the fact that the .ml-state index is temporarily unavailable during the upgrade. Lenient expand open deliberately ignores unavailable indices.
On further investigation, the analysis quoted above is wrong.
The real cause is that three threads try to create the index simultaneously, because three jobs need to be restarted on the node after the upgrade. This can be seen from the three "About to create first concrete index" log lines. We do account for this possibility by ignoring ResourceAlreadyExists exceptions, but after that the logic for adjusting the alias wrongly assumes that the alias doesn't need moving.
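To make the race concrete, here is a minimal, self-contained sketch. It does not use any Elasticsearch classes: `ToyCluster`, `IndexAlreadyExists`, `ensureIndexAndAlias` and `runRace` are hypothetical names invented for illustration, the alias is modelled as a single alias-to-index mapping, and the starting state (alias on the legacy .ml-state index) is an assumption. The point it tries to show is that a caller that loses the index-creation race should still go on to issue the alias update rather than assume there is nothing left to do.

```java
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

public class MlStateAliasRace {

    /** Hypothetical stand-in for a "resource already exists" failure. */
    public static class IndexAlreadyExists extends RuntimeException {
        public IndexAlreadyExists(String index) {
            super(index + " already exists");
        }
    }

    /** Toy in-memory model: a set of index names plus one alias -> index mapping. */
    public static class ToyCluster {
        final Set<String> indices = ConcurrentHashMap.newKeySet();
        final Map<String, String> aliasTarget = new ConcurrentHashMap<>();

        void createIndex(String name) {
            // Only the first caller succeeds; every later caller gets an exception.
            if (!indices.add(name)) {
                throw new IndexAlreadyExists(name);
            }
        }

        void pointAliasAt(String alias, String index) {
            // Idempotent: repeating the same update is harmless.
            aliasTarget.put(alias, index);
        }
    }

    /** Creates the index if needed, then always follows up with the alias update. */
    public static boolean ensureIndexAndAlias(ToyCluster cluster, String index, String alias) {
        try {
            cluster.createIndex(index);
        } catch (IndexAlreadyExists e) {
            // Another job won the race. That is fine, but the alias still needs checking.
        }
        cluster.pointAliasAt(alias, index);
        return index.equals(cluster.aliasTarget.get(alias));
    }

    /** Starts the given number of jobs at once and counts how many end up with a correct alias. */
    public static int runRace(int jobs) {
        ToyCluster cluster = new ToyCluster();
        // Assumed starting state: the alias still points at the legacy index.
        cluster.indices.add(".ml-state");
        cluster.aliasTarget.put(".ml-state-write", ".ml-state");

        ExecutorService pool = Executors.newFixedThreadPool(jobs);
        CountDownLatch start = new CountDownLatch(1);
        AtomicInteger ok = new AtomicInteger();
        for (int i = 0; i < jobs; i++) {
            pool.submit(() -> {
                try {
                    start.await(); // release all jobs together to provoke the race
                    if (ensureIndexAndAlias(cluster, ".ml-state-000001", ".ml-state-write")) {
                        ok.incrementAndGet();
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
        }
        start.countDown();
        pool.shutdown();
        try {
            pool.awaitTermination(10, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for jobs", e);
        }
        return ok.get();
    }

    public static void main(String[] args) {
        System.out.println(runRace(3) + " of 3 jobs ended with the alias on .ml-state-000001");
    }
}
```

In this toy model the alias update is a plain overwrite, so repeating it from every job is safe. The real alias request would have to be similarly tolerant of being issued more than once.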
…69039)
When multiple jobs start up together on a node following
an upgrade, each one of them will trigger a check that the
.ml-state* indices are as expected and the .ml-state-write
alias points to the correct index.
There were a couple of flaws in the logic:
1. We were not considering the possibility that one or more
existing .ml-state* indices might be hidden.
2. If multiple jobs tried to create a .ml-state-000001 index
simultaneously all but the first would fail. We accounted
for this, but then did not follow up with the correct alias
update request for those index creation requests that
failed. This could cause all but one of the jobs starting
up on the node to spuriously fail.
Both these problems are fixed by this PR.
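The first flaw can be illustrated with a small, self-contained sketch. It is not the Elasticsearch index resolver: `HiddenIndexLookup` and `matching` are hypothetical names, the index list is made up, and treating .ml-state as hidden is an assumption for the example. It only shows how a lookup that skips hidden indices can report "no matching index" even though one exists.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class HiddenIndexLookup {

    /**
     * Returns the names of indices that start with the given prefix.
     * Hidden indices are only included when includeHidden is true.
     */
    public static List<String> matching(Map<String, Boolean> indexToHidden,
                                        String prefix,
                                        boolean includeHidden) {
        List<String> result = new ArrayList<>();
        for (Map.Entry<String, Boolean> entry : indexToHidden.entrySet()) {
            boolean hidden = entry.getValue();
            if (entry.getKey().startsWith(prefix) && (includeHidden || !hidden)) {
                result.add(entry.getKey());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Made-up cluster contents: index name -> whether it is hidden.
        Map<String, Boolean> indices = new LinkedHashMap<>();
        indices.put(".ml-state", true);              // assumed hidden for this example
        indices.put(".ml-anomalies-shared", false);

        // Skipping hidden indices makes it look as if no state index exists yet.
        System.out.println(matching(indices, ".ml-state", false)); // prints []

        // Including hidden indices finds the existing one.
        System.out.println(matching(indices, ".ml-state", true));  // prints [.ml-state]
    }
}
```

A check that decides whether to create the first concrete index would want the second form, so that an existing hidden index is not overlooked.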
Fixes #68925