Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[ML] Automatic management for ML system indices #68044

Merged
merged 11 commits into from
Feb 5, 2021

Conversation

droberts195
Copy link
Contributor

@droberts195 droberts195 commented Jan 27, 2021

The ML system indices now use the special functionality for
applying the correct mappings on first use. This replaces
the index templates that used to do this job, but were
vulnerable to tampering.

A number of other changes have had to be made to utilise
the system index functionality:

  1. All fields previously missed out of mappings have been
    added to the system index mappings, with the types that
    would have been assigned dynamically in previous
    versions. This is necessary because dynamic mappings
    updates are banned for system indices, yet some of our
    mappings allow dynamic updates.
  2. As a result of the contradiction regarding dynamic
    mappings, we are now very well protected against failing
    to add new fields to the mappings for those indices that
    exhibit the contradiction (which are .ml-config and
    .ml-meta). This means their mappings don't need to be
    explicitly compared to expected mappings in upgrade
    tests now. Instead, any usage of a new field during or
    after upgrade will trigger an error in any test this occurs
    in.
  3. Reserved fields for the config index were unnecessary
    (only used by tests) and just added extra complication,
    so they have been removed. We have the concept of
    reserved fields for our results indices because end user
    fields get added to results and we need to ensure they
    don't clash with fields we want to use ourselves. This
    problem does not exist for the config index.

The ML system indices now use the special functionality for
applying the correct mappings on first use. This replaces
the index templates that used to do this job, but were
vulnerable to tampering.
@droberts195
Copy link
Contributor Author

The next step for this work is to incorporate the new origin parameter added in #67114 on update mappings and update settings requests that originate within the cluster and target ML system indices. So this PR needs to wait for #67114 to be merged before further progress is possible.

So we have to make sure all mappings are created upfront.
For fields we originally missed we have to stick with the
type that would have been chosen dynamically for users who
acquired dynamic mappings in previous versions.
@mark-vieira
Copy link
Contributor

@elasticmachine update branch

We have the concept of reserved fields for our results
indices because end user fields get added to results and
we need to ensure they don't clash with fields we want
to use ourselves.

This problem does not exist for the config index, as we
don't add arbitrary end user fields.  Therefore we don't
need reserved fields for the config index.
This avoids the risk of future incorrect mappings caused by
dynamic updates
@droberts195 droberts195 marked this pull request as ready for review February 3, 2021 11:48
@elasticmachine
Copy link
Collaborator

Pinging @elastic/ml-core (:ml)

@benwtrent benwtrent self-requested a review February 3, 2021 15:10
Copy link
Member

@davidkyle davidkyle left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Comment on lines +6 to +15
"dynamic_templates": [
{
"strings_as_keywords": {
"match": "*",
"mapping": {
"type": "keyword"
}
}
}
],
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we still need dynamic_templates here and in the config index if we are required to define every field regardless.

Is this for BWC?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, these mappings will get applied to both newly created .ml-meta indices and .ml-meta indices that were created by older versions. If we were starting today we would have used "dynamic": false instead of this, but we cannot change that now for existing indices.

To apply optimal mappings we will have to have a project to move everyone over to a meta index v2.

}
);

MlIndexAndAlias.installIndexTemplateIfRequired(clusterState, client, inferenceIndexTemplate, templateCheckListener);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I may misunderstand this but:

  • analytics initially started running on a cluster before we had inference model system index
  • rolling upgrade starts
  • analytics gets assigned to a new node (master is still on old version)

Does the system index automatically get applied when we call PUT <doc> against the inference index?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point. It doesn't work for the case where the master is still on the old version.

Comment on lines 1183 to 1191
SystemIndexDescriptor.builder()
.setIndexPattern(InferenceIndexConstants.INDEX_PATTERN)
.setPrimaryIndex(InferenceIndexConstants.LATEST_INDEX_NAME)
.setDescription("Contains ML model configuration and statistics")
.setMappings(InferenceIndexConstants.mapping())
.setSettings(InferenceIndexConstants.settings())
.setVersionMetaKey("version")
.setOrigin(ML_ORIGIN)
.build()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Inference does ever update the existing mappings of the old index. So, on a new node where InferenceIndexConstants.LATEST_INDEX_NAME is newer than other nodes in the index, a NEW index is created.

Do system indices still apply the latest index mapping, even if that new node is not master? Previously we got around this by manually adding the template in teh analytics job assignment, but this PR removes that.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You are right, it won't work as I have it at the moment. I think the solution will be to issue a create index request with the mappings from the upgraded node and origin set to ML_ORIGIN where previously we created the template. That will be permitted by the master node even though the mappings don't match the old master's outdated view of what the mappings will be. I will update the code to do that.

Copy link
Contributor

@dimitris-athanasiou dimitris-athanasiou left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM2

@droberts195 droberts195 merged commit 5f5968b into elastic:master Feb 5, 2021
@droberts195 droberts195 deleted the ml_system_indices branch February 5, 2021 10:15
droberts195 added a commit to droberts195/elasticsearch that referenced this pull request Feb 9, 2021
The ML system indices now use the special functionality for
applying the correct mappings on first use. This replaces
the index templates that used to do this job, but were
vulnerable to tampering.

A number of other changes have had to be made to utilise
the system index functionality:

1. All fields previously missed out of mappings have been
   added to the system index mappings, with the types that
   would have been assigned dynamically in previous
   versions.  This is necessary because dynamic mappings
   updates are banned for system indices, yet some of our
   mappings allow dynamic updates.
2. As a result of the contradiction regarding dynamic
   mappings, we are now very well protected against failing
   to add new fields to the mappings for those indices that
   exhibit the contradiction (which are .ml-config and
   .ml-meta).  This means their mappings don't need to be
   explicitly compared to expected mappings in upgrade
   tests now.  Instead, any usage of a new field during or
   after upgrade will trigger an error in any test this occurs
   in.
3. Reserved fields for the config index were unnecessary
   (only used by tests) and just added extra complication,
   so they have been removed.  We have the concept of
   reserved fields for our results indices because end user
   fields get added to results and we need to ensure they
   don't clash with fields we want to use ourselves.  This
   problem does not exist for the config index.

Backport of elastic#68044
droberts195 added a commit that referenced this pull request Feb 9, 2021
The ML system indices now use the special functionality for
applying the correct mappings on first use. This replaces
the index templates that used to do this job, but were
vulnerable to tampering.

A number of other changes have had to be made to utilise
the system index functionality:

1. All fields previously missed out of mappings have been
   added to the system index mappings, with the types that
   would have been assigned dynamically in previous
   versions.  This is necessary because dynamic mappings
   updates are banned for system indices, yet some of our
   mappings allow dynamic updates.
2. As a result of the contradiction regarding dynamic
   mappings, we are now very well protected against failing
   to add new fields to the mappings for those indices that
   exhibit the contradiction (which are .ml-config and
   .ml-meta).  This means their mappings don't need to be
   explicitly compared to expected mappings in upgrade
   tests now.  Instead, any usage of a new field during or
   after upgrade will trigger an error in any test this occurs
   in.
3. Reserved fields for the config index were unnecessary
   (only used by tests) and just added extra complication,
   so they have been removed.  We have the concept of
   reserved fields for our results indices because end user
   fields get added to results and we need to ensure they
   don't clash with fields we want to use ourselves.  This
   problem does not exist for the config index.

Backport of #68044
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants