v2 migrations needs to support re-writing document _id's to support saved-objects in multiple spaces #86247

Closed
rudolf opened this issue Dec 17, 2020 · 7 comments
Labels
Feature:Saved Objects project:ResilientSavedObjectMigrations Reduce Kibana upgrade failures by making saved object migrations more resilient Team:Core Core services & architecture: plugins, logging, config, saved objects, http, ES client, i18n, etc

Comments

@rudolf
Contributor

rudolf commented Dec 17, 2020

The algorithm described in https://github.com/elastic/kibana/blob/master/rfcs/text/0013_saved_object_migrations.md and implemented in #78413 does not allow migration functions to re-write document _id's.

Rewriting document _id's is required for enabling saved objects to be shared across multiple spaces in 8.0 because we need to remove the object's space from the _id. Similarly, saved object types are also embedded in the _id, so when a saved object type is renamed we need to change the _id to reflect the new type name.
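To make the two cases above concrete, here is a minimal sketch of what the rewrite amounts to. It assumes a raw _id layout of `<space>:<type>:<id>` (or `<type>:<id>` in the default space) and a `namespace` field on the document source; the layout, the `RawDoc` shape and the `rewriteRawId` name are all assumptions made for illustration, not the actual implementation.

```typescript
// Hypothetical document shape: only the fields this sketch needs.
interface RawDoc {
  _id: string;
  _source: { type: string; namespace?: string };
}

// Drops the space prefix from the raw _id and applies an optional type rename.
// Assumes the _id looks like "<space>:<type>:<id>" or "<type>:<id>".
function rewriteRawId(doc: RawDoc, typeRenames: Map<string, string>): string {
  const { type, namespace } = doc._source;
  const prefix = namespace ? `${namespace}:${type}:` : `${type}:`;
  if (!doc._id.startsWith(prefix)) {
    throw new Error(`Unexpected raw _id "${doc._id}" for type "${type}"`);
  }
  const id = doc._id.slice(prefix.length);
  const newType = typeRenames.get(type) ?? type;
  return `${newType}:${id}`;
}
```

Note that stripping the prefix alone maps objects with the same type and id in different spaces onto the same _id, so on its own it is not collision-safe.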

A document's _id cannot be changed in place; it can only be rewritten as part of a reindex, so the two potential solutions are:

  1. Add a script to the existing reindex step.
    1. To regenerate the _id (and referenced _id's), Painless would have to add uuidv5 support.
    2. How will we create new legacy URL alias objects for objects that had IDs regenerated?
  2. Do a "client-side reindex" by reading batches of documents from the old index and writing them into the new index. This is the simplest approach but probably has worse performance because:
    • a "client-side reindex" means each batch is slower because of the network latency between Kibana and Elasticsearch
    • we would have to read, transform, and write every document. We already do this for outdated documents, but each release typically contains only a few saved object types with migrations, so the number of outdated documents is usually much smaller than the total number of documents
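A rough sketch of option (2), to show where the id regeneration and the legacy URL alias would fit into a batch loop. Everything here is an assumption for illustration: the `RawDoc` and `LegacyAlias` shapes, the `<space>:<type>:<id>` layout, hashing that string with uuidv5 under the RFC 4122 DNS namespace, and the injected `readBatch` / `writeBatch` callbacks standing in for the Elasticsearch calls.

```typescript
import { createHash } from "crypto";

// RFC 4122 DNS namespace UUID; using it here is an assumption of this sketch.
const DNS_NAMESPACE = "6ba7b810-9dad-11d1-80b4-00c04fd430c8";

// Name-based (version 5) UUID: SHA-1 over the namespace bytes followed by the name.
function uuidv5(name: string, namespace: string): string {
  const nsBytes = Buffer.from(namespace.replace(/-/g, ""), "hex");
  const hash = createHash("sha1").update(nsBytes).update(name, "utf8").digest();
  const bytes = hash.subarray(0, 16);
  bytes[6] = (bytes[6] & 0x0f) | 0x50; // set version 5
  bytes[8] = (bytes[8] & 0x3f) | 0x80; // set RFC 4122 variant
  const hex = bytes.toString("hex");
  return `${hex.slice(0, 8)}-${hex.slice(8, 12)}-${hex.slice(12, 16)}-${hex.slice(16, 20)}-${hex.slice(20)}`;
}

// Hypothetical shapes, not the real saved object or alias schema.
interface RawDoc {
  _id: string;
  _source: { type: string; namespace?: string };
}
interface LegacyAlias {
  sourceId: string;
  targetId: string;
  targetSpace: string;
  targetType: string;
}

// Regenerates the id of an object in a non-default space and records an alias
// from the old id to the new one. Source fields are left untouched in this sketch.
function regenerateId(doc: RawDoc): { doc: RawDoc; alias?: LegacyAlias } {
  const { type, namespace } = doc._source;
  if (!namespace) return { doc }; // default-space objects keep their _id
  const oldId = doc._id.slice(`${namespace}:${type}:`.length);
  const newId = uuidv5(`${namespace}:${type}:${oldId}`, DNS_NAMESPACE);
  return {
    doc: { _id: `${type}:${newId}`, _source: { ...doc._source } },
    alias: { sourceId: oldId, targetId: newId, targetSpace: namespace, targetType: type },
  };
}

// Client-side reindex: read a batch, transform it, write it, repeat until empty.
async function clientSideReindex(
  readBatch: (from: number, size: number) => Promise<RawDoc[]>,
  writeBatch: (docs: RawDoc[], aliases: LegacyAlias[]) => Promise<void>,
  batchSize: number
): Promise<number> {
  let total = 0;
  for (;;) {
    const batch = await readBatch(total, batchSize);
    if (batch.length === 0) return total;
    const docs: RawDoc[] = [];
    const aliases: LegacyAlias[] = [];
    for (const result of batch.map(regenerateId)) {
      docs.push(result.doc);
      if (result.alias) aliases.push(result.alias);
    }
    await writeBatch(docs, aliases); // one round trip per batch
    total += batch.length;
  }
}
```

Because uuidv5 is deterministic, re-running the loop after a failure yields the same new ids, and references to a regenerated object can be recomputed from the old id without a lookup. The offset-based `readBatch` is only a placeholder for whatever paging mechanism the real reader would use.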

There is a big risk that (2) would not scale well for millions of saved objects. Although we don't have any current use cases for plugins creating that many saved objects, making migrations future-proof would justify the additional work of adding uuidv5 to Painless. Before deciding which approach to follow, we should benchmark both to see how well they scale to, e.g., 50 million saved objects.

  1. Profile a scripted reindex of 50 million documents. Although we cannot accurately emulate the CPU load of generating a uuidv5 (which includes a hash function), we could just let the script generate a new UUID so that our benchmark at least takes into account the overhead of executing a script on each document.
  2. Profile a client-side reindex with different batch sizes. We can use v1 migrations for this (migrations.enableV2=false, migrations.batchSize=x), but to exclude the overhead of transforming documents we should ensure that all but one document are already on the latest migrationVersion.
    1. As a percentage, how much overhead does the client-side reindex add for batches of 1k / 3k / 10k documents in an environment like Cloud?
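For the "as a percentage" question, one way to report the numbers is overhead relative to a baseline run (for example the scripted reindex from step 1). The helper names and the `Measurement` shape below are hypothetical; the durations would come from the actual benchmark runs.

```typescript
// One benchmark run of the client-side reindex at a given batch size.
interface Measurement {
  batchSize: number;
  durationMs: number;
}

// Overhead of a run relative to the baseline, in percent.
// 0 means "same duration as the baseline", 100 means "twice as long".
function overheadPercent(baselineMs: number, clientSideMs: number): number {
  if (baselineMs <= 0) throw new RangeError("baselineMs must be positive");
  return ((clientSideMs - baselineMs) / baselineMs) * 100;
}

// Maps each batch size to its overhead against the same baseline.
function overheadByBatchSize(baselineMs: number, runs: Measurement[]): Map<number, number> {
  const result = new Map<number, number>();
  for (const run of runs) {
    result.set(run.batchSize, overheadPercent(baselineMs, run.durationMs));
  }
  return result;
}
```

Comparing all batch sizes against one baseline keeps the 1k / 3k / 10k results directly comparable.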
@rudolf rudolf added Team:Core Core services & architecture: plugins, logging, config, saved objects, http, ES client, i18n, etc Feature:Saved Objects project:ResilientSavedObjectMigrations Reduce Kibana upgrade failures by making saved object migrations more resilient labels Dec 17, 2020
@elasticmachine
Contributor

Pinging @elastic/kibana-core (Team:Core)

@rudolf
Contributor Author

rudolf commented Feb 24, 2021

This will hopefully also solve #91143

@rudolf
Contributor Author

rudolf commented Mar 3, 2021

Updated the description based on #92933 (comment)

@rudolf
Contributor Author

rudolf commented Mar 15, 2021

v2 migrations fixed #56731 by accident, but if we again move to a model where we read-transform-write all documents, we will likely bump into that issue again.

@rudolf
Contributor Author

rudolf commented Mar 30, 2021

We should also keep #93155 in mind

@joshdover
Contributor

@mshustov What else needs to be done for this issue?

@mshustov
Contributor

The initial implementation was added in #97222.
I opened a separate issue, #97965, for the leftovers. Closing.

@rudolf rudolf changed the title Resilient migrations needs to support re-writing document _id's to support saved-objects in multiple spaces v2 migrations needs to support re-writing document _id's to support saved-objects in multiple spaces Aug 24, 2021