ML: creating ML State write alias and pointing writes there #37483
Pinging @elastic/ml-core |
Where to create the alias is a harder problem than I realised. The problem with creating it in Another problem now we've decided to use the migration assistant reindexing for old 5.x indices is that It's almost as though every single write to the state index needs to be wrapped in a method that:
Because the 3 checks could be made against the existing in-memory cluster state, they should be fast enough not to cause significant overhead. This is only an idea though. Let's see if anyone else has a better idea before making any code changes.
The migration assistant should MOVE aliases over, if it will not, As for the Is there a way to hook into the persistent task transfer to create the alias at that moment? |
The timing is difficult: if a job is left open, we cannot predict when it will start up. That may happen even before the config migrator starts its work. We could move the current code in TransportOpenJobAction that creates the aliases to AutodetectProcessManager.openJob; that way any job that starts will have the correct aliases.
From what I can see here: elasticsearch/server/src/main/java/org/elasticsearch/persistent/PersistentTasksNodeService.java Lines 104 to 115 in 0227260
And: elasticsearch/server/src/main/java/org/elasticsearch/persistent/PersistentTasksCustomMetaData.java Lines 592 to 601 in f4e9729
It seems that the reassigned task is called to execute again, which would call |
Given that |
I think I commented at exactly the same time you pushed a commit, sorry.
Jenkins retest this please |
Moving the check to AutodetectProcessManager.openJob() will solve the problem of rolling upgrades with open jobs.
I left two other comments about places where we could write to the state index without having an open job at all, i.e. neither left open during rolling upgrade nor opened afterwards.
The migration assistant should MOVE aliases over
Yes, it will be changed to do that - see elastic/kibana#26368 (comment) - but consider this sequence of events:
1. Customer first used ML in 5.x
2. Customer upgrades to 6.6
3. Customer closes all ML jobs
4. All ML job configs are moved to indices
5. Customer upgrades to 6.7
6. Customer runs migration assistant and reindexes .ml-state
7. Customer now has a .ml-state-reindexed-6 index and a .ml-state alias
8. Customer opens an ML job
In step 8 we'll try to create the .ml-state-write alias pointing at the .ml-state index, when .ml-state is an alias at this point.
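The failure mode in step 8 can be illustrated with a small in-memory model. This is only a sketch, not the PR's code: `WriteAliasTarget` and `resolveConcreteIndex` are hypothetical names, and plain `Set`/`Map` collections stand in for the cluster state. The idea is to resolve a name to a concrete index before pointing a write alias at it, so a name that has become an alias after reindexing does not end up as the alias target.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class WriteAliasTarget {

    // Hypothetical helper: returns a concrete index that a write alias could point at.
    // concreteIndices and aliases are simplified stand-ins for cluster state metadata.
    static String resolveConcreteIndex(Set<String> concreteIndices,
                                       Map<String, List<String>> aliases,
                                       String name) {
        if (concreteIndices.contains(name)) {
            return name; // name is still a real index (no reindex has happened)
        }
        List<String> targets = aliases.get(name);
        if (targets != null && targets.isEmpty() == false) {
            // name is an alias: pick one of the indices behind it
            // (here the lexicographically greatest, an assumption of this sketch)
            return Collections.max(targets);
        }
        throw new IllegalStateException("[" + name + "] is neither an index nor an alias");
    }

    public static void main(String[] args) {
        // State after step 7: .ml-state is an alias for the reindexed index.
        Set<String> indices = new HashSet<>(Arrays.asList(".ml-state-reindexed-6"));
        Map<String, List<String>> aliases =
            Collections.singletonMap(".ml-state", Arrays.asList(".ml-state-reindexed-6"));
        System.out.println(resolveConcreteIndex(indices, aliases, ".ml-state"));
    }
}
```

With the state from step 7, the sketch resolves .ml-state to .ml-state-reindexed-6 rather than treating .ml-state itself as the index to alias.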
@@ -237,7 +237,7 @@ public void persistQuantiles(Quantiles quantiles) {
public void persistQuantiles(Quantiles quantiles, WriteRequest.RefreshPolicy refreshPolicy, ActionListener<IndexResponse> listener) { |
This method can be called when reverting a model snapshot, and there's no guarantee that an autodetect process will have been started on the newer version of the product at the point when a model snapshot is reverted. The call chain is TransportRevertModelSnapshotAction.masterOperation() -> JobManager.revertSnapshot() -> this method. So one of those two calls needs to call AnomalyDetectorsIndex.createStateIndexAndAliasIfNecessary() first.
@@ -347,7 +347,7 @@ public void snapshotMlMeta(MlMetadata mlMetadata, ActionListener<Boolean> listen

logger.debug("taking a snapshot of ml_metadata");
String documentId = "ml-config";
- IndexRequestBuilder indexRequest = client.prepareIndex(AnomalyDetectorsIndex.jobStateIndexName(),
+ IndexRequestBuilder indexRequest = client.prepareIndex(AnomalyDetectorsIndex.jobStateIndexWriteAlias(),
There's no guarantee that an autodetect process will have been started on the newer version of the product at the point when this call is made - if all ML jobs are closed prior to upgrading from 6.5 to 6.7 then that will definitely trigger this situation. So this method needs to call AnomalyDetectorsIndex.createStateIndexAndAliasIfNecessary() first.
// Only create the index or aliases if some other ML index exists - saves clutter if ML is never used.
SortedMap<String, AliasOrIndex> mlLookup = state.getMetaData().getAliasAndIndexLookup().tailMap(".ml");
if (mlLookup.isEmpty() == false && mlLookup.firstKey().startsWith(".ml")) {
We probably shouldn't do the clutter avoidance in this case. If this method is called we know we're intending to write to an ML index shortly even if we never have up to this time. The effect of not creating the alias is pretty bad - it results in creation of a concrete index called .ml-state-write, and that is hard to switch over to an alias of the same name.
Definitely, I agree. Additionally, this method should look for the concrete indices that match the prefix .ml-state*. Should be easy enough to adjust.
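For readers unfamiliar with the `tailMap` idiom in the snippet under review, here is a minimal stand-alone sketch of how the prefix check behaves. It uses a plain `TreeMap<String, String>` as a stand-in for the alias-and-index lookup, and the class name, method name, and index names are made up for illustration.

```java
import java.util.SortedMap;
import java.util.TreeMap;

public class MlPrefixLookup {

    // True if at least one key in the sorted lookup starts with the prefix.
    // tailMap(prefix) keeps keys >= prefix, so if any key has the prefix,
    // the first key of the tail must be one of them.
    static boolean anyKeyWithPrefix(SortedMap<String, ?> lookup, String prefix) {
        SortedMap<String, ?> tail = lookup.tailMap(prefix);
        return tail.isEmpty() == false && tail.firstKey().startsWith(prefix);
    }

    public static void main(String[] args) {
        TreeMap<String, String> lookup = new TreeMap<>();
        lookup.put("logs-2019", "index");
        lookup.put("zebra", "index");
        System.out.println(anyKeyWithPrefix(lookup, ".ml"));       // no ML names yet

        lookup.put(".ml-anomalies-shared", "index");
        System.out.println(anyKeyWithPrefix(lookup, ".ml"));       // some ML name exists
        System.out.println(anyKeyWithPrefix(lookup, ".ml-state")); // but no state index
    }
}
```

The last call shows why the broad `.ml` prefix is a weak signal for the state index in particular: another ML index satisfies it while nothing matching `.ml-state` exists. The lookup also mixes aliases and indices, which is consistent with the suggestion to resolve concrete indices matching `.ml-state*` instead.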
);

IndexNameExpressionResolver indexNameExpressionResolver = new IndexNameExpressionResolver();
String[] state_indices = indexNameExpressionResolver.concreteIndexNames(state,
nit: state_indices should be stateIndices
if (state_indices.length > 0) {
List<String> indices = Arrays.asList(state_indices);
indices.sort(String::compareTo);
createAliasListener.onResponse(indices.get(indices.size() - 1));
Instead of creating the temporary list just for sorting you could sort the array directly:
Arrays.sort(stateIndices);
createAliasListener.onResponse(stateIndices[stateIndices.length - 1]);
or:
Arrays.sort(stateIndices, Collections.reverseOrder());
createAliasListener.onResponse(stateIndices[0]);
or:
createAliasListener.onResponse(Arrays.stream(stateIndices).max(String::compareTo).get());
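The three suggestions above pick the same element. A stand-alone sketch that checks this follows; the class name, method names, and index names are made up for illustration, and each method works on a copy so the input array is left unsorted.

```java
import java.util.Arrays;
import java.util.Collections;

public class LatestStateIndex {

    // Ascending sort, take the last element.
    static String viaSort(String[] names) {
        String[] copy = names.clone();
        Arrays.sort(copy);
        return copy[copy.length - 1];
    }

    // Descending sort, take the first element.
    static String viaReverseSort(String[] names) {
        String[] copy = names.clone();
        Arrays.sort(copy, Collections.reverseOrder());
        return copy[0];
    }

    // No sorting at all: a single pass for the maximum.
    static String viaStreamMax(String[] names) {
        return Arrays.stream(names).max(String::compareTo).get();
    }

    public static void main(String[] args) {
        // Hypothetical index names, not taken from the PR.
        String[] names = {".ml-state", ".ml-state-000002", ".ml-state-000001"};
        System.out.println(viaSort(names));
        System.out.println(viaReverseSort(names));
        System.out.println(viaStreamMax(names));
    }
}
```

All three use lexicographic ordering, so "latest" only means "greatest by string comparison". For example, a name ending in -10 sorts before one ending in -9 unless the numeric suffixes are zero-padded.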
Lulz, can't believe I missed that.
LGTM
run gradle build tests 1 |
run gradle build tests 2 |
run gradle build tests 1 |
run gradle build tests 2 |
run docbldesx |
run gradle build tests 2 |
run gradle build tests 1 |
run gradle build tests 1 |
* ML: creating ML State write alias and pointing writes there * Moving alias check to openJob method * adjusting concrete index lookup for ml-state
* elastic/master: (104 commits) Permission for restricted indices (elastic#37577) Remove Watcher Account "unsecure" settings (elastic#36736) Add cache cleaning task for ML snapshot (elastic#37505) Update jdk used by the docker builds (elastic#37621) Remove an unused constant in PutMappingRequest. Update get users to allow unknown fields (elastic#37593) Do not add index event listener if CCR disabled (elastic#37432) Add local session timeouts to leader node (elastic#37438) Add some deprecation optimizations (elastic#37597) refactor inner geogrid classes to own class files (elastic#37596) Remove obsolete deprecation checks (elastic#37510) ML: Add support for single bucket aggs in Datafeeds (elastic#37544) ML: creating ML State write alias and pointing writes there (elastic#37483) Deprecate types in the put mapping API. (elastic#37280) [ILM] Add unfollow action (elastic#36970) Packaging: Update marker used to allow ELASTIC_PASSWORD (elastic#37243) Fix setting openldap realm ssl config Document the need for JAVA11_HOME (elastic#37589) SQL: fix object extraction from sources (elastic#37502) Nit in settings.gradle for Eclipse ...
This adds an alias .ml-state-write in front of .ml-state. This way when we roll .ml-state in the future, we can easily redirect the writes without downtime.

I put the alias + index creation in the TransportOpenJobAction because:
.ml-state-write would end up being created dynamically as a concrete index because a Job wrote state info before the cluster state watcher could create the alias.