
SO migration v2: fail the migration early if unknown types are encountered #101351

Closed
pgayvallet opened this issue Jun 4, 2021 · 14 comments · Fixed by #103341
Labels
bug Fixes for quality problems that affect the customer experience project:ResilientSavedObjectMigrations Reduce Kibana upgrade failures by making saved object migrations more resilient Team:Core Core services & architecture: plugins, logging, config, saved objects, http, ES client, i18n, etc

Comments

@pgayvallet
Contributor

See #101052 for the initial discussion

Currently, when migrating a document, if the associated SO type is not registered, the document is handled the same way as if it came from a higher Kibana version: it is considered impossible to migrate, which causes the whole migration to fail.

While this is not new (the issue was already present in migration v1, as the 'culprit' is the DocumentMigrator used by both migration algorithms), it is still quite problematic.

The main problems are:

  • If a customer disables a plugin registering SO type(s) they were previously using (i.e. documents of the registered type were created), and then tries to upgrade Kibana, the migration will fail because the types are no longer registered.
  • If we look ahead, this is especially concerning once we focus on 3rd party plugins, as the scenario where a customer tries a plugin for some time and then disables it seems far more likely than disabling internal Kibana plugins.

What we want to do instead of failing is to copy the document into the temp/new index without migrating it (or rather, without migrating the prop of the unknown type, as we now have different kinds of migrations).

Technically, what needs to be done is to change the behavior of nextUnmigratedProp in the DocumentMigrator to skip migration of a type instead of failing when the type is not registered:

if (docVersion && (!latestMigrationVersion || Semver.gt(docVersion, latestMigrationVersion))) {
  throw Boom.badData(
    `Document "${doc.id}" has property "${p}" which belongs to a more recent` +
      ` version of Kibana [${docVersion}]. The last known version is [${latestMigrationVersion}]`,
    doc
  );
}

While this may be as easy as removing the !latestMigrationVersion part, I suspect there may be subtle things to handle in the higher layers.
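For illustration, a minimal sketch of that relaxed check (a sketch only, not a merged change; it assumes latestMigrationVersion is undefined exactly when the type is unknown, and guards against passing undefined to Semver.gt):

if (docVersion && latestMigrationVersion && Semver.gt(docVersion, latestMigrationVersion)) {
  // Only fail when the doc claims a newer version of a type we actually know about;
  // a document of an unknown type (no latestMigrationVersion) is now skipped instead.
  throw Boom.badData(
    `Document "${doc.id}" has property "${p}" which belongs to a more recent` +
      ` version of Kibana [${docVersion}]. The last known version is [${latestMigrationVersion}]`,
    doc
  );
}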

Probably non-exhaustive, but:

  • we need to check that the unknown type's migrationVersion is preserved as it was previously
  • we need to ensure that the outDatedDocumentsTransform stages still work as expected (e.g. check that outdatedDocumentsQuery is still correct)
  • we need to ensure that the reference transforms are still applied correctly (both inward and outward), either when the type is re-enabled, or during the unknown doc migration

Note that in the case of multi-instance Kibana deployments with different plugin sets, this could still cause issues: if an instance without a given plugin installed performs the migration last, the documents in the new index will not be migrated (see #101052 (comment) for the detailed scenario). So we also need to consider (and probably document) that running multiple instances with different plugin sets is not something we support.

@pgayvallet pgayvallet added bug Fixes for quality problems that affect the customer experience Team:Core Core services & architecture: plugins, logging, config, saved objects, http, ES client, i18n, etc project:ResilientSavedObjectMigrations Reduce Kibana upgrade failures by making saved object migrations more resilient labels Jun 4, 2021
@elasticmachine
Contributor

Pinging @elastic/kibana-core (Team:Core)

@pgayvallet
Contributor Author

CC @jportner

we need to ensure that the reference transforms are still applied correctly (both inward and outward), either when the type is re-enabled, or during the unknown doc migration

I'm unsure how the reference transform you added will behave (with the current implementation) in the following scenario, if we decide not to fail when encountering unknown SO types:

  1. during a migration to x.x.x
    • foo is going to be converted to multi-namespace (convertToMultiNamespaceTypeVersion=x.x.x)
    • an unknown type, not present/registered during the migration, has references to foo

I think we would still be applying the reference transform to unknown documents, but could you confirm that?

@lukeelmers
Member

Just a note that once we decide on a solution here and implement it, we need to go back and update the integration test that started this whole discussion: #100171

(Or alternately incorporate that test into the PR addressing this issue)

@jportner
Contributor

jportner commented Jun 6, 2021

I'm unsure how the reference transform you added will behave (with the current implementation) in the following scenario, if we decide not to fail when encountering unknown SO types:

  1. during a migration to x.x.x

    • foo is going to be converted to multi-namespace (convertToMultiNamespaceTypeVersion=x.x.x)
    • an unknown type, not present/registered during the migration, has references to foo

I think we would still be applying the reference transform to unknown documents, but could you confirm that?

No. Currently, the DocumentMigrator builds an object map containing migration info and transform functions for each known type (emphasis on line 370):

function buildActiveMigrations(
  typeRegistry: ISavedObjectTypeRegistry,
  kibanaVersion: string,
  log: Logger
): ActiveMigrations {
  const referenceTransforms = getReferenceTransforms(typeRegistry);
  return typeRegistry.getAllTypes().reduce((migrations, type) => {
    const migrationsMap =
      typeof type.migrations === 'function' ? type.migrations() : type.migrations;
    validateMigrationsMapObject(type.name, kibanaVersion, migrationsMap);
    const migrationTransforms = Object.entries(migrationsMap ?? {}).map<Transform>(
      ([version, transform]) => ({
        version,
        transform: wrapWithTry(version, type, transform, log),
        transformType: 'migrate',
      })
    );
    const conversionTransforms = getConversionTransforms(type);
    const transforms = [
      ...referenceTransforms,
      ...conversionTransforms,
      ...migrationTransforms,
    ].sort(transformComparator);
    if (!transforms.length) {
      return migrations;
    }
    const migrationVersionTransforms: Transform[] = [];
    const coreMigrationVersionTransforms: Transform[] = [];
    transforms.forEach((x) => {
      if (x.transformType === 'migrate' || x.transformType === 'convert') {
        migrationVersionTransforms.push(x);
      } else {
        coreMigrationVersionTransforms.push(x);
      }
    });
    return {
      ...migrations,
      [type.name]: {
        latestMigrationVersion: _.last(migrationVersionTransforms)?.version,
        latestCoreMigrationVersion: _.last(coreMigrationVersionTransforms)?.version,
        transforms,
      },
    };
  }, {} as ActiveMigrations);
}

You could modify that function to pass back both the object map and the referenceTransforms, and just apply the reference transforms in the event that an unknown type is encountered.
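A minimal sketch of that idea (assumed, not actual Kibana code; applyReferenceTransformsOnly is a hypothetical helper name):

// buildActiveMigrations would return the shared reference transforms alongside
// the per-type map, so the caller can still apply them to unregistered types.
function applyReferenceTransformsOnly(
  doc: SavedObjectUnsanitizedDoc,
  referenceTransforms: Transform[]
): SavedObjectUnsanitizedDoc {
  // Apply only the reference transforms, leaving the unknown type's own
  // migrationVersion untouched.
  return referenceTransforms.reduce(
    (current, { transform }) => transform(current).transformedDoc,
    doc
  );
}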

@pgayvallet
Contributor Author

You could modify that function to pass back both the object map and the referenceTransforms, and just apply the reference transforms in an event when an unknown type is encountered.

Yeah... this may be tricky because we'll still need to assume that the unknown doc follows the SO document format, but I don't think we really have any other option here.

@joshdover
Contributor

Yeah... this may be tricky because we'll still need to assume that the unknown doc follows the SO document format, but I don't think we really have any other option here.

This seems like a pretty low-risk assumption to make. The fact that we use strict mappings helps in this regard as well, though the document could still be missing many fields. But honestly, if a document is that 'corrupt', then someone should probably take a look at it manually rather than have Kibana automatically decide what to do with it.

@pgayvallet
Contributor Author

pgayvallet commented Jun 8, 2021

You could modify that function to pass back both the object map and the referenceTransforms, and just apply the reference transforms in an event when an unknown type is encountered.

After a closer look, this unfortunately seems a little trickier than that.

We're currently using the coreMigrationVersion for two things:

  • The conversion transforms
    • at the moment the only existing one is the conversion to multi-namespace
  • The reference transforms
    • these are used to rewrite the ids of references to objects that have been converted to multi-namespace

Also, we're currently always updating this coreMigrationVersion value to the latest version during each migration:

if (!prop) {
  // regardless of whether or not any reference transform was applied, update the coreMigrationVersion
  // this is needed to ensure that newly created documents have an up-to-date coreMigrationVersion field
  return {
    transformedDoc: { ...doc, coreMigrationVersion: kibanaVersion },
    additionalDocs,
  };
}

While the main use cases work fine, there are edge cases, involving migrating a disabled type across the version in which it is supposed to be converted to multi-namespace, that cause issues.

For instance:

  • we're in version 7.14.0
  • we have two types:
    • foo that has its convertToMultiNamespaceTypeVersion set to 7.15.0
    • bar that has outward references to foo
  • type foo got unregistered by having its owning plugin disabled
  • we migrate to 7.15.0

There are two issues here if we do copy the unknown foo documents without failing the migration:

  1. The foo documents will have their coreMigrationVersion set to 7.15.0, meaning that if/when we re-enable the type (by re-enabling its plugin), we will NOT properly apply the multi-ns conversion (or any other conversion transform, for that matter)

For this specific problem, I think the correct solution is simply to NOT migrate the document at all when an unknown document is encountered. That way, we're sure that all the reference and/or conversion transforms will be applied when the type is re-enabled and the document is handled by the OUTDATED_DOCUMENTS_SEARCH_READ stage.

Note that to achieve this, we'll probably need to break the current (unused) assumption that a document can have multiple types, and remove this while loop:

while (true) {
  const prop = nextUnmigratedProp(doc, migrations);
  if (!prop) {
    // regardless of whether or not any reference transform was applied, update the coreMigrationVersion
    // this is needed to ensure that newly created documents have an up-to-date coreMigrationVersion field
    return {
      transformedDoc: { ...doc, coreMigrationVersion: kibanaVersion },
      additionalDocs,
    };
  }
  const result = migrateProp(doc, prop, migrations, convertNamespaceTypes);
  doc = result.transformedDoc;
  additionalDocs = [...additionalDocs, ...result.additionalDocs];
}

This would let us identify each document as a single, specific type, and check whether that type is known or not, since the behavior would differ between the two cases.
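A rough sketch of what the per-document logic could look like instead (hypothetical, not a merged implementation):

// Each document is treated as a single, specific type.
if (!migrations[doc.type]) {
  // Unknown type: copy the document through untouched, so its migrationVersion
  // and coreMigrationVersion can still trigger the conversion/reference
  // transforms once the type is re-registered.
  return { transformedDoc: doc, additionalDocs: [] };
}
const result = migrateProp(doc, doc.type, migrations, convertNamespaceTypes);
return {
  transformedDoc: { ...result.transformedDoc, coreMigrationVersion: kibanaVersion },
  additionalDocs: result.additionalDocs,
};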

  2. The references from bar to foo will not be properly updated during the migration where foo was disabled. And since the bar documents' coreMigrationVersion will already be set to 7.15.0, the references will still not be updated when the foo type is re-enabled, resulting in broken references

This one is much harder to solve. To be honest, I don't really have a good idea of how to address it, apart from splitting coreMigrationVersion into two things:

  • convertMigrationVersion, which would be the version for the convert migrations
  • referencesMigration, which would be a record of type->boolean

Having this per-type boolean would allow us to specifically flag which references were converted or not. That way, in the previously described edge case, bar.referencesMigration.foo would still be undefined, meaning that we could theoretically migrate those references when the type gets re-enabled.

However... as the bar type itself would already be properly migrated, we would need to update the query used during OUTDATED_DOCUMENTS_SEARCH_READ to also search for documents having referencesMigration set to false or undefined for types that are already supposed to be converted to multi-namespace (and also change the document migration to handle this specific case where a document is 'up to date'-ish except for some of its references, and only needs those migrated).
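For illustration, a hypothetical document shape under this split (all field names are illustrative, not an agreed-upon design):

interface SplitVersionSavedObjectDoc {
  id: string;
  type: string;
  // would replace coreMigrationVersion for the convert transforms
  convertMigrationVersion?: string;
  // per-referenced-type flag: true once references to that type have been rewritten
  referencesMigration?: Record<string, boolean>;
}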

These are edge cases, but the risk of data corruption if such a scenario is encountered is very real, so I don't think we can ignore them; we'll have to find a proper solution.

cc @jportner @joshdover

@pgayvallet
Contributor Author

pgayvallet commented Jun 8, 2021

Note that a workaround for problem 2 (the really significant one) would be to check the integrity of registered objects' references and throw if an SO has references to an unknown type (given that we solve problem 1 by copying the unknown docs without migrating them at all); see the sketch after the list below.

While not ideal, this would still allow us to:

  • have 'isolated' unknown SO types copied during the migration
  • have 'isolated' groups of SO types copied during the migration (e.g. a plugin that registered both foo and bar, with references only between those two types, and was then disabled)
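A minimal sketch of such an integrity check (hypothetical; assertReferencesAreKnown is an illustrative name, not existing Kibana code):

// Throw when a document of a registered type references an unknown type, so we
// fail the migration instead of silently producing broken references.
function assertReferencesAreKnown(
  doc: { id: string; references?: Array<{ type: string; id: string }> },
  knownTypes: Set<string>
): void {
  for (const ref of doc.references ?? []) {
    if (!knownTypes.has(ref.type)) {
      throw new Error(
        `Document "${doc.id}" references "${ref.type}:${ref.id}", which belongs to an unknown type`
      );
    }
  }
}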

@joshdover
Contributor

This is starting to feel like a very expensive and potentially dangerous use case to start supporting this late in the game.

If it had been as simple as we were originally hoping, fixing this would make sense. But in light of our plans to significantly reduce the set of plugins that can be disabled (#89584), I'm not sure the effort here makes sense.

I lean towards going back and revisiting #101052 instead, but clarifying the criteria a bit. The main use case we have for disabling plugins is debugging a performance problem. It seems to me that if a user disables a plugin and then later upgrades to a newer Kibana version without enabling that plugin again, it's safe to say they don't need that data anymore and we can ask them to delete it.

In other words, I don't think we need to support this:

  • Kibana v7.12 creates data for foo SO type
  • Admin disables foo plugin to debug a problem
  • Admin upgrades Kibana to v7.13 with foo plugin still disabled
  • Admin enables foo plugin again and expects the foo documents to be upgraded

The only case I think we really need to support is:

  • Kibana v7.12 creates data for foo SO type
  • Admin disables the foo plugin to debug a problem (Kibana starts up without complaining about existing docs)
  • Admin enables foo plugin again
  • Admin upgrades to v7.13

OR

  • Kibana v7.12 creates data for foo SO type
  • Admin disables the foo plugin to debug a problem (Kibana starts up without complaining about existing docs)
  • Admin deletes all foo documents (optionally, exporting before deleting)
  • Admin upgrades to v7.13 (with foo still disabled)

So we should be able to refuse to upgrade if any documents of unknown types are found AND remove the OUTDATED_DOCUMENTS_* steps, without sacrificing the ability to disable plugins. We're only sacrificing the ability to disable plugins AND keep their data AND upgrade Kibana. That may still be a valid use case out there, but it seems solvable through other means, like Elasticsearch snapshots or import/export.

@kobelb Do you have any additional thoughts here?

@kobelb
Contributor

kobelb commented Jun 9, 2021

@joshdover do we have users hitting this bug? Could we postpone doing anything for the remainder of 7.x and starting in 8.0 explicitly block upgrades if there are any unknown saved-object types?

Regardless of the answer to the prior question, how do our users delete the documents that were created by an uninstalled or disabled plugin?

@joshdover
Contributor

do we have users hitting this bug? Could we postpone doing anything for the remainder of 7.x and starting in 8.0 explicitly block upgrades if there are any unknown saved-object types?

Probably? Though this behavior has been around for quite some time, and I haven't seen it be a significant source of support issues. However, we don't have great visibility into it, so I'm not sure how frequently it's happening.

Regardless of the answer to the prior question, how do our users delete the documents that were created by an uninstalled or disabled plugin?

They would need to manually delete these documents from the index. Pre-system-indices, this is feasible for customers to do, which works out perfectly if we're ok with blocking upgrades when there are unknown types in 8.0 (where system indices will become strictly enforced).

I think we could scope this issue to the following (a rough sketch follows the list):

  • Add a step to migrations that scans for unknown types and throws an error before making any modifications (including setting the write block).
  • Include instructions on how to remove these documents with curl
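As an illustration of that early check, a minimal sketch (assumptions: a v7 Elasticsearch JS client, the .kibana source index, and illustrative error wording; not necessarily the merged implementation from #103341):

// Search the source index for documents whose type is not registered, and fail
// before making any modification (including setting the write block).
const knownTypes = typeRegistry.getAllTypes().map((t) => t.name);
const { body } = await esClient.search({
  index: '.kibana',
  size: 100,
  body: {
    query: {
      bool: { must_not: knownTypes.map((type) => ({ term: { type } })) },
    },
  },
});
if (body.hits.hits.length > 0) {
  const unknownDocs = body.hits.hits.map((hit: any) => `${hit._source?.type}:${hit._id}`);
  throw new Error(
    `Upgrade aborted: documents of unknown saved object types found: ${unknownDocs.join(', ')}`
  );
}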

@kobelb
Contributor

kobelb commented Jun 10, 2021

Add a step to migrations that scans for unknown types and throws an error before making any modifications (including setting the write block).
Include instructions on how to remove these documents with curl

This seems reasonable to me.

@joshdover
Contributor

@pgayvallet are you 👍 on the above plan?

@pgayvallet
Contributor Author

Add a step to migrations that scans for unknown types and throws an error before making any modifications (including setting the write block).

Yeah, that sounds good to me. Fail fast and avoid altering the source index's meta.
