-
Notifications
You must be signed in to change notification settings - Fork 25k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[ML] Automatic management for ML system indices #68044
Conversation
The ML system indices now use the special functionality for applying the correct mappings on first use. This replaces the index templates that used to do this job, but were vulnerable to tampering.
So we have to make sure all mappings are created upfront. For fields we originally missed we have to stick with the type that would have been chosen dynamically for users who acquired dynamic mappings in previous versions.
@elasticmachine update branch |
We have the concept of reserved fields for our results indices because end user fields get added to results and we need to ensure they don't clash with fields we want to use ourselves. This problem does not exist for the config index, as we don't add arbitrary end user fields. Therefore we don't need reserved fields for the config index.
This avoids the risk of future incorrect mappings caused by dynamic updates
Pinging @elastic/ml-core (:ml) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
...ck/plugin/core/src/main/resources/org/elasticsearch/xpack/core/ml/config_index_mappings.json
Show resolved
Hide resolved
"dynamic_templates": [ | ||
{ | ||
"strings_as_keywords": { | ||
"match": "*", | ||
"mapping": { | ||
"type": "keyword" | ||
} | ||
} | ||
} | ||
], |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do we still need dynamic_templates
here and in the config index if we are required to define every field regardless.
Is this for BWC?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, these mappings will get applied to both newly created .ml-meta
indices and .ml-meta
indices that were created by older versions. If we were starting today we would have used "dynamic": false
instead of this, but we cannot change that now for existing indices.
To apply optimal mappings we will have to have a project to move everyone over to a meta index v2.
} | ||
); | ||
|
||
MlIndexAndAlias.installIndexTemplateIfRequired(clusterState, client, inferenceIndexTemplate, templateCheckListener); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I may misunderstand this but:
- analytics initially started running on a cluster before we had inference model system index
- rolling upgrade starts
- analytics gets assigned to a new node (master is still on old version)
Does the system index automatically get applied when we call PUT <doc>
against the inference index?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point. It doesn't work for the case where the master is still on the old version.
SystemIndexDescriptor.builder() | ||
.setIndexPattern(InferenceIndexConstants.INDEX_PATTERN) | ||
.setPrimaryIndex(InferenceIndexConstants.LATEST_INDEX_NAME) | ||
.setDescription("Contains ML model configuration and statistics") | ||
.setMappings(InferenceIndexConstants.mapping()) | ||
.setSettings(InferenceIndexConstants.settings()) | ||
.setVersionMetaKey("version") | ||
.setOrigin(ML_ORIGIN) | ||
.build() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Inference does ever update the existing mappings of the old index. So, on a new node where InferenceIndexConstants.LATEST_INDEX_NAME
is newer than other nodes in the index, a NEW index is created.
Do system indices still apply the latest index mapping, even if that new node is not master? Previously we got around this by manually adding the template in teh analytics job assignment, but this PR removes that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You are right, it won't work as I have it at the moment. I think the solution will be to issue a create index request with the mappings from the upgraded node and origin set to ML_ORIGIN
where previously we created the template. That will be permitted by the master node even though the mappings don't match the old master's outdated view of what the mappings will be. I will update the code to do that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM2
The ML system indices now use the special functionality for applying the correct mappings on first use. This replaces the index templates that used to do this job, but were vulnerable to tampering. A number of other changes have had to be made to utilise the system index functionality: 1. All fields previously missed out of mappings have been added to the system index mappings, with the types that would have been assigned dynamically in previous versions. This is necessary because dynamic mappings updates are banned for system indices, yet some of our mappings allow dynamic updates. 2. As a result of the contradiction regarding dynamic mappings, we are now very well protected against failing to add new fields to the mappings for those indices that exhibit the contradiction (which are .ml-config and .ml-meta). This means their mappings don't need to be explicitly compared to expected mappings in upgrade tests now. Instead, any usage of a new field during or after upgrade will trigger an error in any test this occurs in. 3. Reserved fields for the config index were unnecessary (only used by tests) and just added extra complication, so they have been removed. We have the concept of reserved fields for our results indices because end user fields get added to results and we need to ensure they don't clash with fields we want to use ourselves. This problem does not exist for the config index. Backport of elastic#68044
The ML system indices now use the special functionality for applying the correct mappings on first use. This replaces the index templates that used to do this job, but were vulnerable to tampering. A number of other changes have had to be made to utilise the system index functionality: 1. All fields previously missed out of mappings have been added to the system index mappings, with the types that would have been assigned dynamically in previous versions. This is necessary because dynamic mappings updates are banned for system indices, yet some of our mappings allow dynamic updates. 2. As a result of the contradiction regarding dynamic mappings, we are now very well protected against failing to add new fields to the mappings for those indices that exhibit the contradiction (which are .ml-config and .ml-meta). This means their mappings don't need to be explicitly compared to expected mappings in upgrade tests now. Instead, any usage of a new field during or after upgrade will trigger an error in any test this occurs in. 3. Reserved fields for the config index were unnecessary (only used by tests) and just added extra complication, so they have been removed. We have the concept of reserved fields for our results indices because end user fields get added to results and we need to ensure they don't clash with fields we want to use ourselves. This problem does not exist for the config index. Backport of #68044
The ML system indices now use the special functionality for
applying the correct mappings on first use. This replaces
the index templates that used to do this job, but were
vulnerable to tampering.
A number of other changes have had to be made to utilise
the system index functionality:
added to the system index mappings, with the types that
would have been assigned dynamically in previous
versions. This is necessary because dynamic mappings
updates are banned for system indices, yet some of our
mappings allow dynamic updates.
mappings, we are now very well protected against failing
to add new fields to the mappings for those indices that
exhibit the contradiction (which are .ml-config and
.ml-meta). This means their mappings don't need to be
explicitly compared to expected mappings in upgrade
tests now. Instead, any usage of a new field during or
after upgrade will trigger an error in any test this occurs
in.
(only used by tests) and just added extra complication,
so they have been removed. We have the concept of
reserved fields for our results indices because end user
fields get added to results and we need to ensure they
don't clash with fields we want to use ourselves. This
problem does not exist for the config index.