Refuse to start with unknown saved object types in 8.0 #107678

joshdover · 2021-08-04T18:09:10Z

Update: to protect users' data, we have decided to prevent upgrades from succeeding if unknown saved object types are encountered.

Context (newest-oldest) on this discussion:

In order to improve the reliability of the Saved Object migration system, we'd need to eliminate situations that can cause data integrity issues. One of those situations is with how we currently handle objects of unknown types. The current behavior will retain these documents in the Kibana index, unmodified from their previous state. Whenever the type becomes known again (by re-enabling a plugin), Kibana will migrate this document to the most recent schema.

Supporting this is becoming ever more challenging as we expand our ability to do different types of migrations. For example, this behavior will result in corrupted state if there are any other objects that reference objects of a disabled type during the multi-namespace migration and then that type is enabled again later.

Data integrity is important for the long-term maintenance of a Kibana installation and is especially important to long-term, wide-scale users of Kibana. If we want to eliminate this type of data corruption, then we need to take some action here to prevent such scenarios.

The scenario where the migration system encounters objects of unknown types can occur in the following situations:

First-party plugins

When a 1st party Elastic plugin is disabled after storing some data
- We plan to stop allowing most plugins to be disabled starting in 8.0 (see "By default, a plugin should not be disable-able" #89584), which should dramatically mitigate, if not eliminate, this scenario going forward.
When a Saved Object type is removed from usage from a 1st party plugin
- This scenario has been eliminated by "Add test that ensures that SO types are not removed" #104418, which requires that all 1st party types that are removed are added to a filter so that they are excluded from the next upgrade.

Third-party plugins

When a 3rd party plugin is disabled or uninstalled after storing some data
- This may be especially common for 3rd party plugins due to how we require plugins to be built specifically for each Kibana version. It's possible that a user may decide to upgrade their cluster without some custom plugins installed and then install them again later when they've been updated. In this case, I think users expect their data to be intact.
When a Saved Object type is removed from usage from a 3rd party plugin
- There's really not a way with the current architecture that we could easily detect this scenario distinctly from the scenario above, so the options are essentially the same.

Solutions

For users that only use 1st party plugins, the solution seems quite straightforward. Given that we've already made progress on mitigating the ways this issue could happen, we could either:

1. Refuse to start Kibana if any objects of unknown types exist in the index
2. Automatically filter for only documents of known types and either leave the unknown documents in the previous index or move them to a special .kibana-orphaned index (for example).

(2) is preferable from a user perspective since it unblocks users from upgrading Kibana quickly, but has the drawback of 'silently' excluding documents.
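A minimal sketch of how option (2) might split documents, purely for illustration. The `RawDoc` shape, the `partitionByKnownType` function, and the example type names are all hypothetical and are not taken from the Kibana codebase.

```typescript
// Hypothetical shape of a raw saved object document (not Kibana's actual type).
interface RawDoc {
  id: string;
  type: string;
}

interface PartitionResult {
  known: RawDoc[];    // documents that would be migrated as usual
  orphaned: RawDoc[]; // documents that would be left behind or moved to an orphan index
}

// Split documents by whether their type is registered by a currently enabled plugin.
function partitionByKnownType(
  docs: RawDoc[],
  knownTypes: ReadonlySet<string>
): PartitionResult {
  const result: PartitionResult = { known: [], orphaned: [] };
  for (const doc of docs) {
    if (knownTypes.has(doc.type)) {
      result.known.push(doc);
    } else {
      result.orphaned.push(doc);
    }
  }
  return result;
}

// Example: 'my-plugin-thing' belongs to a plugin that is not installed.
const partitioned = partitionByKnownType(
  [
    { id: 'a', type: 'dashboard' },
    { id: 'b', type: 'my-plugin-thing' },
  ],
  new Set(['dashboard'])
);
```

Here `partitioned.orphaned` holds only the `my-plugin-thing` document. Nothing in this sketch tells the user that a document was set aside, which is the 'silent' exclusion drawback described above.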

The challenge is how we can handle the scenario with 3rd party plugins, which may not be installed during an upgrade. I propose we go with solution (2) and add a mechanism for importing & migrating documents from the .kibana-orphaned index that are detected as now being known by any plugin. This allows us to keep the default, happy path as safe as possible while giving more advanced users a way to recover their data, with the caveat that the recovered data may not be fully intact. This mechanism could be exposed via either a config option or a UI prompt to import these objects after Kibana has started (which has some drawbacks but may work for most 3rd party plugins).

For customers that absolutely need 100% data integrity, we can recommend they ensure that all 3rd party plugins are installed during their upgrade rather than afterwards. For all others, we have an escape hatch that will probably work most of the time, but is not guaranteed and therefore not enabled by default.

elasticmachine · 2021-08-04T18:09:12Z

Pinging @elastic/kibana-core (Team:Core)

joshdover · 2021-08-05T11:55:50Z

A decision on this blocks progress on #105272 and #107740

lukeelmers · 2021-08-16T16:22:37Z

Discussed this with @rudolf and @pgayvallet -- The main concern with solution (2) is around how to handle SO references:

If there are any inbound references to the SO of an unknown type, they will break when we move the SO to an orphan index
- Do we orphan the referring SOs as well to prevent breaking references? Depending on the number of references, this could mean removing a lot of extra objects which may be a surprise to the user (especially if the only way to fix it is to manually re-import them later)
As part of the sharing saved objects effort, we'll be regenerating IDs on SOs. This means that if we "quarantine" the unknown SO, outbound references from it will break once the IDs of other objects are changed.
- As long as SO aliases are around and used to resolve references, then maybe this could actually still work?
- If the IDs were stable, this would no longer be a concern. So if we wait until a time in the future when we no longer need to worry about regenerated IDs (late 8.x?), this will no longer be a problem.

rudolf · 2021-08-17T08:06:33Z

I think it's actually the inbound references to the quarantined object that would break. When a type is unknown, we don't know whether we should regenerate the IDs (it might be a single-namespace type, which doesn't require regeneration), so inbound references are left intact. When this type later becomes known, we might regenerate its ID, but we wouldn't rewrite the inbound references to match.
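To illustrate the failure mode, here is a small sketch under simplified assumptions. The `SimpleSavedObject` shape and both helper functions are hypothetical and only model the idea of an ID being regenerated while inbound references stay untouched; they are not Kibana's migration code.

```typescript
interface SimpleReference {
  type: string;
  id: string;
}

// Simplified, hypothetical saved object shape.
interface SimpleSavedObject {
  type: string;
  id: string;
  references: SimpleReference[];
}

// Collect every reference that points at an object that does not exist.
function findDanglingReferences(objects: SimpleSavedObject[]): SimpleReference[] {
  const existing = new Set(objects.map((o) => `${o.type}:${o.id}`));
  const dangling: SimpleReference[] = [];
  for (const obj of objects) {
    for (const ref of obj.references) {
      if (!existing.has(`${ref.type}:${ref.id}`)) {
        dangling.push(ref);
      }
    }
  }
  return dangling;
}

// Change one object's ID WITHOUT rewriting references that point at it.
function regenerateId(
  objects: SimpleSavedObject[],
  type: string,
  oldId: string,
  newId: string
): SimpleSavedObject[] {
  return objects.map((o) =>
    o.type === type && o.id === oldId ? { ...o, id: newId } : o
  );
}

const beforeRegeneration: SimpleSavedObject[] = [
  { type: 'dashboard', id: 'd1', references: [{ type: 'custom-type', id: 'c1' }] },
  { type: 'custom-type', id: 'c1', references: [] },
];

// 'custom-type' becomes known later and its ID is regenerated.
const afterRegeneration = regenerateId(beforeRegeneration, 'custom-type', 'c1', 'c1-regenerated');
const danglingAfter = findDanglingReferences(afterRegeneration);
```

Before regeneration there are no dangling references. Afterwards, the dashboard's reference to `custom-type:c1` no longer resolves to any object.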

"As long as SO aliases are around and used to resolve references, then maybe this could actually still work?"

For performance, we don't want to resolve IDs on every operation (e.g. an update or get), so resolve only happens in plugin code where that plugin expects to be handling user input (e.g. a URL with an ID).
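The distinction between a plain lookup and an alias-aware resolve could be sketched roughly as follows. This is an in-memory toy with hypothetical names (`InMemorySavedObjects`, `addAlias`), assuming aliases map a legacy ID to a current ID; it is not the actual saved objects client.

```typescript
interface StoredObject {
  type: string;
  id: string;
}

class InMemorySavedObjects {
  private readonly objects = new Map<string, StoredObject>();
  // Maps `${type}:${legacyId}` to the current ID of the same object.
  private readonly aliases = new Map<string, string>();

  private static key(type: string, id: string): string {
    return `${type}:${id}`;
  }

  add(obj: StoredObject): void {
    this.objects.set(InMemorySavedObjects.key(obj.type, obj.id), obj);
  }

  addAlias(type: string, legacyId: string, currentId: string): void {
    this.aliases.set(InMemorySavedObjects.key(type, legacyId), currentId);
  }

  // Cheap exact lookup: never consults aliases.
  get(type: string, id: string): StoredObject | undefined {
    return this.objects.get(InMemorySavedObjects.key(type, id));
  }

  // Costlier lookup: falls back to the alias when there is no exact match.
  resolve(type: string, id: string): StoredObject | undefined {
    const exact = this.get(type, id);
    if (exact !== undefined) {
      return exact;
    }
    const currentId = this.aliases.get(InMemorySavedObjects.key(type, id));
    return currentId === undefined ? undefined : this.get(type, currentId);
  }
}

const store = new InMemorySavedObjects();
store.add({ type: 'dashboard', id: 'new-id' });
store.addAlias('dashboard', 'old-id', 'new-id');
```

In this sketch `store.get('dashboard', 'old-id')` finds nothing while `store.resolve('dashboard', 'old-id')` returns the object stored under `new-id`. A stored reference that still holds the old ID would therefore only work on code paths that call `resolve`.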

I think we have a few options that we could explore to solve this problem, but all of them mean changes to a complex system that's quite hard to reason about and change. So perhaps we first need to establish whether this is a high enough priority from a product perspective?

rudolf · 2021-11-05T12:53:11Z

We have reached the following decision, to be implemented in 8.0:

Doing nothing in 8.0 is problematically lenient: if we don't implement a way to handle these cases, we risk causing data loss or corruption for users. Rather than allowing this, we'd prefer to fail fast with a clear message outlining the problem.
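A fail-fast check along these lines might look like the sketch below. The function names, the document shape, and the error wording are all assumptions made for illustration; the real implementation is in the PR linked from this issue.

```typescript
// Hypothetical minimal document shape.
interface TypedDoc {
  id: string;
  type: string;
}

// Count documents per unknown type.
function countUnknownTypes(
  docs: TypedDoc[],
  knownTypes: ReadonlySet<string>
): Map<string, number> {
  const counts = new Map<string, number>();
  for (const doc of docs) {
    if (!knownTypes.has(doc.type)) {
      counts.set(doc.type, (counts.get(doc.type) ?? 0) + 1);
    }
  }
  return counts;
}

// Throw a descriptive error instead of silently migrating around unknown documents.
function assertNoUnknownTypes(docs: TypedDoc[], knownTypes: ReadonlySet<string>): void {
  const unknown = countUnknownTypes(docs, knownTypes);
  if (unknown.size === 0) {
    return;
  }
  const summary = Array.from(unknown.entries())
    .map(([type, count]) => `${type} (${count})`)
    .join(', ');
  throw new Error(
    `Migration aborted: found documents of unknown saved object types: ${summary}. ` +
      'Re-enable the plugins that own these types or remove the documents before upgrading.'
  );
}
```

Listing each unknown type with its document count in the message gives the operator enough information to decide whether to reinstall a plugin or clean up the documents before retrying.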

pgayvallet · 2021-11-09T15:39:44Z

FWIW, this is what we were doing before #105213, so implementing this should in theory just be a revert of the linked PR.

joshdover added the Team:Core, Feature:Saved Objects, and project:ResilientSavedObjectMigrations labels Aug 4, 2021

joshdover mentioned this issue Aug 5, 2021

Add Upgrade Assistant deprecation warning for unknown Saved Object types #105272

Closed

joshdover mentioned this issue Aug 5, 2021

Use a scripted reindex for re-writing document _ids for shared saved object types #107740

Closed

6 tasks

lukeelmers mentioned this issue Aug 12, 2021

[meta] saved objects improvements #101564

Open

17 tasks

joshdover mentioned this issue Sep 6, 2021

Add deprecation warning when unknown SO types are present #111268

Merged

3 tasks

rudolf added EnableJiraSync and removed EnableJiraSync labels Nov 5, 2021

rudolf changed the title ~~Handling unknown saved object types in 8.0~~ Refuse to start with unknown saved object types in 8.0 Nov 5, 2021

rudolf mentioned this issue Nov 5, 2021

Fail saved object migrations when encountering embeddable state from an unknown embeddable factory #117656

Closed

pgayvallet self-assigned this Nov 10, 2021

pgayvallet mentioned this issue Nov 11, 2021

[SO migration] fail the migration if unknown types are encountered #118300

Merged

1 task

pgayvallet closed this as completed in #118300 Nov 16, 2021

rudolf mentioned this issue Nov 30, 2021

[docs] Document how to solve migrations failing with unknown saved object types #119944

Closed

tsullivan mentioned this issue Dec 9, 2021

[Reporting] Don't allow Reporting be completely disabled in configuration. #119914

Closed

lukeelmers mentioned this issue Jan 5, 2022

[DOCS] Adds the 8.0.0-rc1 release notes #120806

Merged

rudolf mentioned this issue Jun 21, 2022

don't merge mappings from source index #134809

Closed
