RFC Improve saved object migrations algorithm #84333
Pinging @elastic/kibana-core (Team:Core)
{ "remove_index": { "index": ".kibana" } }
{ "add": { "index": ".kibana_pre6.5.0_001", "alias": ".kibana" } },
I don't think the order matters: either both operations succeed or both fail. It just felt like this order is easier to read, since we only want to add the alias if `remove_index` succeeds.
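As a rough illustration of the two actions discussed above, the sketch below builds the body of a single `_aliases` request so that the index removal and the alias creation are submitted together. The helper name `legacy_alias_actions` is hypothetical and not taken from the RFC; only the `remove_index` and `add` actions come from the diff.

```python
import json


def legacy_alias_actions(legacy_index: str, source_index: str) -> dict:
    """Build one `_aliases` request body (hypothetical helper).

    Both actions travel in the same request, so they are applied together:
    the legacy index is removed and its name is re-added as an alias that
    points at the new source index.
    """
    return {
        "actions": [
            {"remove_index": {"index": legacy_index}},
            {"add": {"index": source_index, "alias": legacy_index}},
        ]
    }


body = legacy_alias_actions(".kibana", ".kibana_pre6.5.0_001")
print(json.dumps(body, indent=2))
```

Listing `remove_index` first mirrors the readability argument in the comment; the outcome should not depend on the order within the request.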
9. Mark the migration as complete. This is done as a single atomic operation
(requires https://github.com/elastic/elasticsearch/pull/58100) to
guarantee when multiple versions of Kibana are performing the migration in
parallel, only one version will win. E.g. if 7.11 and 7.12 are started in
parallel and migrate from a 7.9 index, either 7.11 or 7.12 should succeed
and accept writes, but not both.
I just fixed a grammar mistake (`guarantees` -> `guarantee`) 🤓 no real changes here
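The "only one version wins" property of step 9 can be pictured with a small in-memory model. This is not the Elasticsearch API: `FakeCluster`, `swap_alias` and `mark_complete` are hypothetical names, and the compare-and-set check merely stands in for whatever atomic alias update the real implementation relies on.

```python
class AliasConflict(Exception):
    """Raised when the alias no longer points at the expected index."""


class FakeCluster:
    """In-memory stand-in for a cluster's alias table (assumption)."""

    def __init__(self, aliases: dict):
        self.aliases = dict(aliases)

    def swap_alias(self, alias: str, expected_index: str, new_index: str) -> None:
        # Model of an atomic update: succeed only if nobody moved the alias.
        if self.aliases.get(alias) != expected_index:
            raise AliasConflict(f"{alias} no longer points at {expected_index}")
        self.aliases[alias] = new_index


def mark_complete(cluster: FakeCluster, version: str, source_index: str) -> bool:
    """Try to point `.kibana` at this version's index; report whether we won."""
    try:
        cluster.swap_alias(".kibana", source_index, f".kibana_{version}_001")
        return True
    except AliasConflict:
        return False


cluster = FakeCluster({".kibana": ".kibana_7.9.0_001"})
results = {
    version: mark_complete(cluster, version, ".kibana_7.9.0_001")
    for version in ("7.11.0", "7.12.0")
}
print(results, cluster.aliases)
```

In this sequential model the first caller wins; with real parallel nodes either version could win, but never both.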
the legacy index. Use a fixed index name i.e `.kibana_pre6.5.0_001` or
`.kibana_task_manager_pre7.4.0_001`. Ignore index already exists errors.
3. Reindex the legacy index into the new source index with the
`convertToAlias` script if specified. Use `wait_for_completion: false`
Does the `convertToAlias` script use deterministic IDs already?
Yes, it's deterministic: we're just prepending the type to the existing ID to make it SO compatible.
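To show why a deterministic ID matters here, the sketch below prepends the type to the legacy ID and runs the same reindex twice. The `:` separator and the helper names are assumptions made for illustration; the actual `convertToAlias` script is not shown in this thread.

```python
def to_saved_object_id(doc_type: str, legacy_id: str) -> str:
    """Prepend the type to the legacy ID (the ':' separator is an assumption)."""
    return f"{doc_type}:{legacy_id}"


def reindex(legacy_docs: list, target: dict) -> None:
    """Write every legacy document into `target` under its derived ID."""
    for doc in legacy_docs:
        target[to_saved_object_id(doc["type"], doc["id"])] = doc["source"]


legacy_docs = [
    {"type": "task", "id": "abc", "source": {"taskType": "report"}},
    {"type": "task", "id": "def", "source": {"taskType": "alert"}},
]
target: dict = {}

# Two nodes (or one retry) running the same reindex produce the same IDs,
# so documents are overwritten rather than duplicated.
reindex(legacy_docs, target)
reindex(legacy_docs, target)
print(sorted(target))
```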
Would it be advantageous to only use reindexing when there is a migration with a `convertToAlias` script that needs to be applied?
My understanding is that index cloning had some significant performance advantages for larger indices. This performance difference could be important as we try to accommodate more use cases in the SO index (e.g. SIEM exception lists).
The reindex will only be applied once when a user upgrades from a legacy index and after that we'll always be able to clone the index. So this really only affects users the first time they upgrade from < 6.5 and won't have a future impact when in later versions our indices start growing.
It's possible to make this optimization, but it makes the algorithm somewhat more complex so it feels like the simplicity and consistency is worth it for the relatively small performance impact.
3. That belong to a type whose mappings were changed by comparing the `migrationMappingPropertyHashes`. (Metadata, unlike the mappings isn't commutative, so there is a small chance that the metadata hashes do not accurately reflect the latest mappings, however, this will just result in a less efficient query).
6. Create a target index with `dynamic: false` on the top-level mappings so that any kind of document can be written to the index. This allows us to write untransformed documents to the index which might have fields which have been removed from the latest mappings defined by the plugin. Define `dynamic:true` mappings for the `migrationVersion` field so that we're still able to search for outdated documents that need to be transformed.
1. Ignore errors if the target index already exists.
7. Reindex the source index into the new target index. All nodes on the same version will use the same fixed index name e.g. `.kibana_7.10.0_001`. The `001` postfix isn't used by Kibana, but allows for re-indexing an index should this be required by an Elasticsearch upgrade. E.g. re-index `.kibana_7.10.0_001` into `.kibana_7.10.0_002` and point the `.kibana_7.10.0` alias to `.kibana_7.10.0_002`.
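Steps 6 and 7 could be sketched roughly as follows: a helper that derives the fixed per-version index name with its numeric postfix, and a mappings body with `dynamic: false` at the top level and a dynamic `migrationVersion` object. Both function names and the `generation` parameter are hypothetical; only the index naming pattern and the two `dynamic` settings come from the text above.

```python
import json


def target_index_name(prefix: str, kibana_version: str, generation: int = 1) -> str:
    """Fixed index name shared by all nodes on the same version.

    `generation` is the numeric postfix that leaves room for a later
    re-index (e.g. _001 -> _002); Kibana itself does not interpret it.
    """
    return f"{prefix}_{kibana_version}_{generation:03d}"


def target_index_mappings() -> dict:
    """Mappings that accept any document but keep `migrationVersion` searchable."""
    return {
        # Unknown fields are stored but not indexed, so untransformed
        # documents with removed fields can still be written.
        "dynamic": False,
        "properties": {
            # New sub-fields are indexed automatically, which keeps
            # queries for outdated documents possible.
            "migrationVersion": {"dynamic": True, "type": "object"},
        },
    }


print(target_index_name(".kibana", "7.10.0"))
print(target_index_name(".kibana", "7.10.0", generation=2))
print(json.dumps(target_index_mappings(), indent=2))
```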
There is a risk of lost deletes here.
1. Two instances start the migration
2. Instance (1) finishes the migration
3. Instance (2) is still busy with a reindex
4. Instance (1) deletes a document (and acknowledges the delete to the client)
5. Instance (2)'s reindex operation re-creates the deleted document 💣 💥
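The sequence above can be replayed with plain dictionaries standing in for indices. This is only a model of the race under the assumption that both instances reindex directly into the same target; the document IDs and the `reindex` helper are made up for illustration.

```python
def reindex(source: dict, target: dict, ids: list) -> None:
    """Copy the given documents from source to target, like one reindex batch."""
    for doc_id in ids:
        target[doc_id] = dict(source[doc_id])


source = {"dashboard:1": {"title": "A"}, "dashboard:2": {"title": "B"}}
target: dict = {}

# Instance 1 finishes its whole reindex and completes the migration.
reindex(source, target, ["dashboard:1", "dashboard:2"])

# Instance 2 is still busy and has only copied its first batch.
reindex(source, target, ["dashboard:1"])

# Instance 1 now serves traffic and acknowledges a delete.
del target["dashboard:2"]

# Instance 2 continues with its remaining batch and re-creates the document.
reindex(source, target, ["dashboard:2"])

lost_delete = "dashboard:2" in target
print("deleted document came back:", lost_delete)
```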
Solved with reindex + clone
…g other reindex operations is idempotent The first version of the reindex block had only the instance which was able to mark the migration as complete set and remove the write block. This means other instances couldn't know if any reindex operations were in progress if the migration was already marked as complete. It also meant that a failure in this critical step could result in a permanent write block.
…reventing other reindex operations is idempotent" This reverts commit 8baf9b1.
…revent lost deletes" This reverts commit d7237ca.
The intermediate temporary index seems to solve the lost deletes problem well. Overall, this LGTM, thanks @rudolf!
Co-authored-by: Josh Dover <[email protected]>
@elasticmachine merge upstream
Summary
1. Reindex the legacy index
Because the task manager `convertToAlias` script rewrites documents' `_id`, we cannot clone the legacy index and transform the documents with a script; we have to reindex.
2. Reindex for all migrations
We previously tested that mappings updates between versions of Kibana were compatible, i.e. the mappings of a 7.1 index can be upgraded to 7.10 without changing the mappings. However, with our recent efforts to reduce the field count we've changed several mappings (like `index: false`), which means mappings can no longer be updated without a reindex (and we didn't retest our assumption 💥).
We could potentially just change the mappings, but I fear that there could be an incompatible mapping change lurking in any of the 6.x.x -> 7.x.x -> 7.11 upgrade paths. So it feels like the least risky option is to just reindex everything.
This has the downside that we're duplicating all data in `.kibana` on every patch whereas we would previously only do this when required (usually every minor). I feel like this is an acceptable tradeoff:
`clone` and if it has we can do a reindex. This would mean that for most patch releases we don't increase the storage costs and our solution has a similar "storage cost" to the old algorithm. However, for the 7.11.0 release this won't make a difference so this optimization can be targeted for 7.12.
3. Reindex + clone to prevent lost deletes
When multiple nodes are performing a reindex we could end up with lost deletes:
To prevent these lost deletes, I've changed the reindex into a reindex + clone step. Deletes will only occur against the cloned index, which prevents other reindex operations from adding back the deleted document.
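A minimal in-memory sketch of the reindex + clone idea, again using dictionaries as stand-ins for indices. The index names `temp` and `target`, and the helpers `reindex` and `clone_if_missing`, are hypothetical; the real steps (including any write blocks) are not reproduced here. The point is only that late reindex writes land in the intermediate index, while deletes are applied to the clone.

```python
def reindex(indices: dict, src: str, dst: str) -> None:
    """Copy every document from src into dst (dst is created if needed)."""
    indices.setdefault(dst, {})
    for doc_id, doc in indices[src].items():
        indices[dst][doc_id] = dict(doc)


def clone_if_missing(indices: dict, src: str, dst: str) -> None:
    """Clone src into dst; an already existing dst is left untouched (assumption)."""
    if dst not in indices:
        indices[dst] = {doc_id: dict(doc) for doc_id, doc in indices[src].items()}


indices = {"source": {"dashboard:1": {"title": "A"}, "dashboard:2": {"title": "B"}}}

# Instance 1: reindex into the intermediate index, then clone it.
reindex(indices, "source", "temp")
clone_if_missing(indices, "temp", "target")

# Instance 1 serves traffic: the delete only touches the cloned index.
del indices["target"]["dashboard:2"]

# Instance 2 finishes late: its reindex writes only to the intermediate
# index, and its clone attempt is a no-op because the target exists.
reindex(indices, "source", "temp")
clone_if_missing(indices, "temp", "target")

print(sorted(indices["target"]))
```

Compared with reindexing straight into the target, the deleted document stays deleted because no reindex operation ever writes to the index that accepts deletes.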