# Improve migrations performance by using distinct indices per SO type #104081
Pinging @elastic/kibana-core (Team:Core)
We should remember that the core usage collector for saved objects currently only collects saved objects per type for the "default saved objects index" (https://github.com/elastic/kibana/blob/main/src/plugins/kibana_usage_collection/server/plugin.ts#L126). So we would need to refactor this collector to check the saved objects registry during collection.
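For illustration, here is a minimal TypeScript sketch (not the actual Kibana implementation; `RegisteredType` and its `indexPattern` field are assumptions modelled on the registry concept) of how such a collector could group registered types by their target index instead of assuming a single default index:

```ts
// Assumed shape of a registered saved object type; `indexPattern` unset
// means the type lives in the default index.
interface RegisteredType {
  name: string;
  indexPattern?: string;
}

// Group registered types by the index they are stored in, so the usage
// collector can run one query per index rather than only against `.kibana`.
function groupTypesByIndex(
  types: RegisteredType[],
  defaultIndex: string
): Map<string, string[]> {
  const byIndex = new Map<string, string[]>();
  for (const type of types) {
    const index = type.indexPattern ?? defaultIndex;
    const names = byIndex.get(index) ?? [];
    names.push(type.name);
    byIndex.set(index, names);
  }
  return byIndex;
}

// Usage: for (const [index, typeNames] of groupTypesByIndex(allTypes, '.kibana')) { ... }
```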
Following our discussions from yesterday, especially what @gsoldevila proposed, here's my brainstorm on the kind of trade-offs and optimizations we could make. If we look at the use cases, I think we have two main situations:
Now, the main question I have is: do we want/need to allow doing more than that? E.g. would we want to support something like
My point being, if we define that:
I think it would significantly lower the needs in terms of synchronization, as I think we could, as @gsoldevila suggested, have most of the work be performed in 'isolation' by a single migrator. I'm not speaking in technical terms here (the migration could instantiate other migrators or something similar) but in terms of responsibilities / concerns, as each migrator wouldn't need to sync or interact with the others. Wdyt all? Does that make sense?
If we think about how all that fits into serverless, none of these scenarios are going to take place in the near future anymore. I was imagining the "after-split" functioning mode as a sort of v3 migrator that would bring us very close to serverless: a simplified one that only knows about updating mappings in place. With that in mind, I define 2 main scenarios:
This way, we can use the "splitting into separate indices" as an opportunity to take us a step closer to serverless. We'd also isolate the v3 logic in a separate "component" that is easier to maintain and update. I think this also fits well with Pierre's suggestion of having moves bound to a release. After giving it some thought, Pierre's proposal is more flexible, as it would allow different moves between indices in the future, which we could turn to if we face issues like mapping explosions. Nevertheless, that would come at the cost of some downtime, and we would also be keeping some (a priori) unnecessary complexity in the code.
Spoke to @gsoldevila and @jloleysens. We agreed that reducing the code complexity of the migrator (especially the model) would be good. At a high level we could split up v1/v2/v3 migrations into their own state machines. It would also be useful to have a state machine of state machines (illustration from @gsoldevila).

When we do such a refactoring is debatable. If type splitting doesn't require a lot of risky changes to the state machine that would add a lot of additional complexity, it feels like we should push for reducing downtime first and then refactor once that's released.

When it comes to what a v3 migrator might look like, this is a basic way to visualise the proposal: a single migrator instance that performs its actions for all the indices it manages, e.g. creates 3 temp indices, clones 3 temp indices, performs the MARK_VERSION_INDEX_READY for 3 indices, etc.

Let's call the independent migrators proposal (1) and this proposal (2). Both approaches require injecting the same inputs. In (1) the coordination happens between the migrators; in (2) the coordination happens within a migrator. But in both cases the coordination is only necessary inside a single Kibana instance; we're not talking about distributed synchronisation across instances. I think (1) would require much fewer code changes though: we would leave all actions untouched but introduce two new steps. The previous illustrations were kinda mixing the state machine concepts.

I also agree with the previous comments that we don't need to design for a flexible splitting strategy that can change if we decide to split types differently in the future. This is out of scope, and if we can simplify the implementation without it, that'd be fine. But I think it's possible without any further change (illustrated here with the orange arrow, which moves a document out of the cases index and back to the .kibana index).
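To make the in-process coordination of proposal (1) concrete, here is a minimal TypeScript sketch of a synchronization barrier between independent migrators. All names are hypothetical; this is not the actual implementation, just an illustration of coordination inside a single Kibana instance:

```ts
// Hypothetical in-process barrier: every migrator does its own work, then
// waits until all the others have reached the same step before proceeding.
class MigratorBarrier {
  private arrived = 0;
  private readonly waiters: Array<() => void> = [];

  constructor(private readonly total: number) {}

  async arrive(): Promise<void> {
    this.arrived += 1;
    if (this.arrived === this.total) {
      // The last migrator to arrive releases everyone.
      this.waiters.forEach((resolve) => resolve());
      return;
    }
    await new Promise<void>((resolve) => this.waiters.push(resolve));
  }
}

// Usage: each of the N migrators awaits the barrier between, e.g., the
// "documents reindexed" step and the "mark version index ready" step.
async function runMigrator(id: string, barrier: MigratorBarrier) {
  console.log(`${id}: reindexing done`);
  await barrier.arrive(); // wait for the other migrators
  console.log(`${id}: marking version index ready`);
}
```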
## Description

Fix #104081

This PR moves some of the SO types from the `.kibana` index into the following ones:

- `.kibana_alerting_cases`
- `.kibana_analytics`
- `.kibana_security_solution`
- `.kibana_ingest`

This split/reallocation will occur during the `8.8.0` Kibana upgrade (*meaning: from any version older than `8.8.0` to any version greater than or equal to `8.8.0`*).

**This PR's main changes are:**

- implement the changes required in the SO migration algorithm to support this reallocation
- update the FTR tools (looking at you esArchiver) to support these new indices
- update hardcoded references to `.kibana` and usage of `core.savedObjects.getKibanaIndex()` to use new APIs to target the correct index/indices
- update FTR datasets, tests and utilities accordingly

## To reviewers

**Overall estimated risk of regressions: low**

But, still, please take the time to review changes in your code.

The parts of the production code that were the most impacted are the telemetry collectors, as most of them were performing direct requests against the `.kibana` index, so we had to adapt them. Most other contributor-owned changes are in FTR tests and datasets.

If you think a type is misplaced (either we missed some types that should be moved to a specific index, or some types were moved and shouldn't have been), please tell us, and we'll fix the reallocation either in this PR or in a follow-up.

## .Kibana split

The following new indices are introduced by this PR, with the following SO types being moved to them. (Any SO type not listed here will be staying in its current index.)

Note: The complete **_type => index_** breakdown is available in [this spreadsheet](https://docs.google.com/spreadsheets/d/1b_MG_E_aBksZ4Vkd9cVayij1oBpdhvH4XC8NVlChiio/edit#gid=145920788).

#### `.kibana_alerting_cases`

- action
- action_task_params
- alert
- api_key_pending_invalidation
- cases
- cases-comments
- cases-configure
- cases-connector-mappings
- cases-telemetry
- cases-user-actions
- connector_token
- rules-settings
- maintenance-window

#### `.kibana_security_solution`

- csp-rule-template
- endpoint:user-artifact
- endpoint:user-artifact-manifest
- exception-list
- exception-list-agnostic
- osquery-manager-usage-metric
- osquery-pack
- osquery-pack-asset
- osquery-saved-query
- security-rule
- security-solution-signals-migration
- siem-detection-engine-rule-actions
- siem-ui-timeline
- siem-ui-timeline-note
- siem-ui-timeline-pinned-event

#### `.kibana_analytics`

- canvas-element
- canvas-workpad-template
- canvas-workpad
- dashboard
- graph-workspace
- index-pattern
- kql-telemetry
- lens
- lens-ui-telemetry
- map
- search
- search-session
- search-telemetry
- visualization

#### `.kibana_ingest`

- epm-packages
- epm-packages-assets
- fleet-fleet-server-host
- fleet-message-signing-keys
- fleet-preconfiguration-deletion-record
- fleet-proxy
- ingest_manager_settings
- ingest-agent-policies
- ingest-download-sources
- ingest-outputs
- ingest-package-policies

## Tasks / PRs

### Sub-PRs

**Implementation**

- 🟣 #154846
- 🟣 #154892
- 🟣 #154882
- 🟣 #154884
- 🟣 #155155

**Individual index split**

- 🟣 #154897
- 🟣 #155129
- 🟣 #155140
- 🟣 #155130

### Improvements / follow-ups

- 👷🏼 Extract logic into [runV2Migration](#154151 (comment)) @gsoldevila
- Make `getCurrentIndexTypesMap` resilient to intermittent failures #154151 (comment)
- 🚧 Build a more structured [MigratorSynchronizer](#154151 (comment))
- 🟣 #155035
- 🟣 #155116
- 🟣 #155366

## Reallocation tweaks

Tweaks to the reallocation can be done after the initial merge, as long as they're done before the public release of 8.8:

- `url` should get back to `.kibana` (see [comment](#154888 (comment)))

## Release Note

For performance purposes, Kibana is now using more system indices to store its internal data. The following system indices will be created when upgrading to `8.8.0`:

- `.kibana_alerting_cases`
- `.kibana_analytics`
- `.kibana_security_solution`
- `.kibana_ingest`

---------

Co-authored-by: pgayvallet <[email protected]>
Co-authored-by: Christos Nasikas <[email protected]>
Co-authored-by: kibanamachine <[email protected]>
Co-authored-by: Georgii Gorbachev <[email protected]>
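For reference, the type => index breakdown described above can be pictured as a plain map. This is an illustrative, abbreviated TypeScript sketch (the `indexTypesMap` contents are truncated here, and `getIndexForType` is a hypothetical helper), not the actual data structure used by the migrator:

```ts
// Abbreviated type => index mapping; the full breakdown is in the linked
// spreadsheet. Any type not listed stays in its current index.
const indexTypesMap: Record<string, string[]> = {
  '.kibana_alerting_cases': ['action', 'alert', 'cases' /* ... */],
  '.kibana_analytics': ['dashboard', 'index-pattern', 'lens', 'visualization' /* ... */],
  '.kibana_security_solution': ['exception-list', 'security-rule' /* ... */],
  '.kibana_ingest': ['epm-packages', 'ingest-outputs' /* ... */],
};

// Hypothetical resolver: find the target index for a given saved object type,
// falling back to the default index for unlisted types.
function getIndexForType(type: string, defaultIndex = '.kibana'): string {
  for (const [index, types] of Object.entries(indexTypesMap)) {
    if (types.includes(type)) return index;
  }
  return defaultIndex;
}
```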
…locating SO documents (#158940)

Fixes #158733

The goal of this modification is to enforce that the migrators of all indices involved in a relocation (e.g. as part of the [dot kibana split](#104081)) create the index aliases in the same `updateAliases()` call. This way, either:

* all the indices involved in the [dot kibana split](#104081) relocation will be completely upgraded (with the appropriate aliases),
* or none of them will.
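As a sketch of what "the same `updateAliases()` call" buys us: Elasticsearch applies all actions of a single `indices.updateAliases` request atomically, so batching every alias switch into one request makes the relocation all-or-nothing. The index and alias names below are illustrative, not the exact ones used by the migrator:

```ts
import { Client } from '@elastic/elasticsearch';

// Batch every alias switch for the relocation into one atomic request:
// either all indices get their aliases, or none of them do.
async function markAllIndicesReady(client: Client): Promise<void> {
  await client.indices.updateAliases({
    actions: [
      { add: { index: '.kibana_8.8.0_001', alias: '.kibana' } },
      { add: { index: '.kibana_alerting_cases_8.8.0_001', alias: '.kibana_alerting_cases' } },
      { add: { index: '.kibana_analytics_8.8.0_001', alias: '.kibana_analytics' } },
      // ...one `add` action per index involved in the relocation
    ],
  });
}
```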
This is a follow-up on https://elasticco.atlassian.net/browse/KBNA-7838.
In this phase we propose to split some saved object types out of the `.kibana` index into separate indices (https://github.com//issues/104081).
While prior work can reduce downtime for some upgrades, if just one saved object type defines a migration, all saved objects still need to be migrated. E.g. there might be no `cases` migrations defined, but there is a `dashboard` migration, which then requires us to migrate all 1m `cases`. This change would mean we would only migrate the 1m `cases` if there is a `cases` migration defined. While this reduces the average downtime per upgrade, it does introduce unpredictability for users, where some upgrades are fast and others cause 10 minutes of downtime.

We already thought a lot about splitting the kibana index into multiple ones, either one index per type, or potentially one index per solution (#90817). If we were to go in that direction, we could potentially increase the migration speed significantly, by only performing the migration process on the indices containing documents that need to be migrated.
As an example, let's say that we have 3 SO types, `foo`, `bar` and `dolly`, respectively stored in the `.kibana_foo_8_0_0`, `.kibana_bar_8_0_0` and `.kibana_dolly_8_0_0` indices. During the migration from v8.0.0 to v8.2.1, we detect that we only have migrations to apply to the `foo` type. We could then apply the migration only to the `.kibana_foo_8_0_0` index, and just clone the other indices to their new version.

We could even go further: if we keep, inside an internal index, the current version (name) of each index, we could potentially even avoid the cloning of the `.kibana_bar_8_0_0` and `.kibana_dolly_8_0_0` indices, and just have the next version use the exact same indices.

This could drastically reduce the average migration time, especially if we choose to have one index per type (as opposed to one index per grouping of types).
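A minimal TypeScript sketch of the per-index decision described above, assuming a hypothetical `TypeInfo` shape (a real implementation would compare semver-parsed migration versions rather than strings):

```ts
// Hypothetical per-type metadata: which index a type lives in and the
// latest migration version it defines.
interface TypeInfo {
  name: string;
  index: string;
  latestMigrationVersion: string; // e.g. '8.2.1'
}

// Only indices containing at least one type with a newer migration than the
// last migrated version need a full migration; the rest can be cloned or
// reused as-is.
function indicesNeedingMigration(
  types: TypeInfo[],
  lastMigratedVersion: string
): Set<string> {
  const needsWork = new Set<string>();
  for (const t of types) {
    // Semver comparison in practice; string compare keeps the sketch short.
    if (t.latestMigrationVersion > lastMigratedVersion) {
      needsWork.add(t.index);
    }
  }
  return needsWork;
}

// With foo/bar/dolly from the example: only `.kibana_foo_8_0_0` would be
// migrated, while the bar and dolly indices are cloned or reused as-is.
```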
Note that this would have the additional advantage of addressing the problem of the ever-increasing number of registered fields (dodging once and for all the sword of Damocles of hitting the field limit), see #70471.