# Scale upgrade migrations to millions of saved objects #144035
Labels: `Epic:ScaleMigrations` (Scale upgrade migrations to millions of saved objects), `Feature:Migrations`, `loe:x-large` (Extra Large Level of Effort), `Team:Core` (Core services & architecture: plugins, logging, config, saved objects, http, ES client, i18n, etc.)
Comments
**rudolf** added the `Team:Core` and `Feature:Migrations` labels — Oct 26, 2022

> Pinging @elastic/kibana-core (Team:Core)
**gsoldevila** added a commit that referenced this issue — Nov 28, 2022

> Reduce startup time by skipping update mappings step when possible (#145604)
>
> The goal of this PR is to reduce the startup times of the Kibana server by improving the migration logic. Fixes #145743. Related: #144035.
>
> The migration logic runs systematically at startup, whether or not customers are upgrading. Historically, these steps have been very quick, but we recently found out that some customers have more than **one million** Saved Objects stored, making the overall startup process slow even when there are no migrations to perform.
>
> This PR specifically targets the case where there are no migrations to perform, i.e. a Kibana node is started against an ES cluster that is already up to date with respect to the stack version and list of plugins.
>
> In this scenario, we aim to skip the `UPDATE_TARGET_MAPPINGS` step of the migration logic, which internally runs the `updateAndPickupMappings` method; that method turns out to be expensive when the system indices contain lots of saved objects.
>
> I locally tested the following scenarios too:
>
> - **Fresh install.** The step is not even run, as the `.kibana` index did not exist ✅
> - **Stack version + list of plugins up to date.** Simply restarting Kibana after the fresh install. The step is run and leads to `DONE`, as the md5 hashes match those stored in `.kibana._mapping._meta` ✅
> - **Faking re-enabling an old plugin.** I manually removed one of the md5 hashes from the stored `.kibana._mapping._meta` through `curl`, then restarted Kibana. The step is run and leads to `UPDATE_TARGET_MAPPINGS`, as it did before the PR ✅
> - **Faking updating a plugin.** Same as the previous one, but altering an existing md5 stored in the meta ✅
>
> This is the curl command used to tamper with the stored `_meta`:
>
> ```bash
> curl -X PUT "kibana:changeme@localhost:9200/.kibana/_mapping?pretty" -H 'Content-Type: application/json' -d'
> {
>   "_meta": {
>     "migrationMappingPropertyHashes": {
>       "references": "7997cf5a56cc02bdc9c93361bde732b0"
>     }
>   }
> }
> '
> ```
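The skip decision the PR describes can be sketched as follows. This is a hypothetical illustration of comparing the stored `migrationMappingPropertyHashes` against freshly computed ones; the function and type names are invented and are not Kibana's actual internals:

```typescript
// Hypothetical sketch (names invented for illustration): skip the expensive
// UPDATE_TARGET_MAPPINGS step when every md5 hash computed from the currently
// registered saved object types matches the hash stored under
// .kibana._mapping._meta.migrationMappingPropertyHashes.
type PropertyHashes = Record<string, string>;

function canSkipUpdateTargetMappings(
  stored: PropertyHashes | undefined, // hashes read from the index _meta
  current: PropertyHashes // hashes computed from the registered types
): boolean {
  // No stored metadata: play it safe and update the mappings.
  if (!stored) return false;
  // Every current type must have an unchanged stored hash; a missing or
  // altered hash (re-enabled or updated plugin) forces the update.
  return Object.entries(current).every(([type, hash]) => stored[type] === hash);
}
```

Under this sketch, removing or altering the `references` hash (as the curl command above does) makes the check fail, so the step runs again, matching the tested scenarios.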
**gsoldevila** added a commit to gsoldevila/kibana that referenced this issue — Nov 29, 2022

> Reduce startup time by skipping update mappings step when possible (#145604)
>
> (cherry picked from commit b1e18a0; the commit message is identical to the one above. Conflicts: `packages/core/saved-objects/core-saved-objects-migration-server-internal/src/actions/index.ts`)
**gsoldevila** referenced this issue — Nov 30, 2022

> Reduce startup time by skipping update mappings step when possible (#145604) (#146637)
>
> Backport of #145604 from `main` to `8.6`.
This was referenced Dec 13, 2022
**rudolf** added the `Epic:ScaleMigrations` (Scale upgrade migrations to millions of saved objects) label — Jan 17, 2023
**exalate-issue-sync** bot changed the title from "Scale upgrade migrations to millions of saved objects" to "Scale upgrade migrations to millions of saved objects: phase 1, limit migrations" — Feb 10, 2023
**exalate-issue-sync** bot changed the title from "Scale upgrade migrations to millions of saved objects: phase 1, limit migrations" to "Scale upgrade migrations to millions of saved objects: phase 1-3, limit migrations" — Feb 10, 2023
Is this safe to close now that (4) and (5) are done?
**rudolf** changed the title from "Scale upgrade migrations to millions of saved objects: phase 1-3, limit migrations" to "Scale upgrade migrations to millions of saved objects" — Sep 23, 2023
Yes, once users upgrade to 8.8, subsequent upgrade migrations should be a lot faster and more scalable.
When designing the v2 migration algorithm, our objective was to have less than 10 minutes of downtime for 100k saved objects. We have since learned that some customers have clusters with millions of saved objects, approaching 10m (e.g. 11k spaces with 3.7m visualisations). Compounding this problem, some of these customers use alerting in a way that makes them very sensitive to downtime. This issue describes our plans for reducing the downtime of upgrade and startup migrations for clusters at this scale.
Phases:

1. **Skip the `UPDATE_TARGET_MAPPINGS` step when mappings are unchanged.** We will do this by comparing the md5sums of the fields for each of the saved object types. If the md5sums match, we will not perform the `UPDATE_TARGET_MAPPINGS` step and the associated `updateAndPickupMappings` action. (We will still run the `OUTDATED_DOCUMENTS_*` steps, because documents can sometimes be outdated even when mappings weren't changed.) Reduce startup time by skipping update mappings step #145743
2. **Only migrate an index if necessary**, skipping the `OUTDATED_DOCUMENTS_*` steps when possible. Only migrate an index if necessary #124946
3. **Prevent future `convertToMultiNamespaceType` migrations.** Prevent future `convertToMultiNamespaceType` migrations #147344

Further changes proposed are covered by:

4. **Split the `.kibana` index into separate indices** (KBNA-4545). [dot-kibana-split] Allow relocating SO to different indices during migration #154846. While (2) can reduce downtime for some upgrades, if just one saved object type defines a migration, all saved objects still need to be migrated. E.g. there might be no `cases` migrations defined, but there is a `dashboard` migration, which then requires us to migrate all 1m `cases`. This change would mean we only migrate the 1m `cases` if there is a `cases` migration defined. While this reduces the average downtime per upgrade, it does introduce unpredictability for users, where some upgrades are fast and others cause 10 minutes of downtime.
5. KBNA-9053 #149326