8.8.0 migrations temporary ES failures could cause permanent migration failure #158733
Updated the issue to include the log entries that, if present, would indicate potential data loss.
@rudolf: @gsoldevila and I discussed that the count might be off if some documents are deleted during migration (disabled plugins, unused, etc.).
I think we should, but we might want to wait a few more days until we know with 100% certainty that the fix will land in 8.8.1. That way we can give clear direction in the docs.
…locating SO documents (#158940)
Fixes #158733

The goal of this modification is to force the migrators of all indices involved in a relocation (e.g. as part of the [dot kibana split](#104081)) to create the index aliases in the same `updateAliases()` call. This way, either:
* all the indices involved in the [dot kibana split](#104081) relocation will be completely upgraded (with the appropriate aliases),
* or none of them will.
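Elasticsearch's aliases API (`POST /_aliases`) applies all the actions in one request atomically, which is what makes a single alias update all-or-nothing. The following is a minimal sketch of building one such request body for every index involved in a relocation. The helper `buildRelocationAliasActions` and the `RelocatedIndex` shape are hypothetical, and the naming pattern (`<alias>` and `<alias>_<version>` pointing at `<alias>_<version>_001`) is assumed from the index names mentioned in this issue. The real fix may also remove old aliases or delete temporary indices in the same call.

```typescript
// Hypothetical description of one index involved in a relocation.
interface RelocatedIndex {
  alias: string; // e.g. ".kibana_alerting_cases"
  version: string; // e.g. "8.8.0"
}

// Shape of an "add" action in the Elasticsearch _aliases request body.
interface AddAliasAction {
  add: { index: string; alias: string };
}

// Builds ONE request body covering every relocated index, so that either
// all entrypoint aliases are created or none of them are.
function buildRelocationAliasActions(
  indices: RelocatedIndex[],
): { actions: AddAliasAction[] } {
  const actions: AddAliasAction[] = [];
  for (const { alias, version } of indices) {
    const target = `${alias}_${version}_001`;
    actions.push({ add: { index: target, alias } });
    actions.push({ add: { index: target, alias: `${alias}_${version}` } });
  }
  return { actions };
}

const body = buildRelocationAliasActions([
  { alias: ".kibana", version: "8.8.0" },
  { alias: ".kibana_alerting_cases", version: "8.8.0" },
]);
console.log(JSON.stringify(body, null, 2));
```

Sending this single body, instead of one alias request per migrator, removes the window in which one index has its aliases and another does not.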
## Summary
Adds a test for #158733. This is based on the un-merged #158940, so see the last commit [#6eafe910424414b5670e5f325accc59d87dd6dc4](6eafe91) for the actual changes proposed by this PR.

Co-authored-by: Gerard Soldevila <[email protected]>
# Backport
This will backport the following commits from `main` to `8.8`:
- [Test for a failed clone during split migration (#158998)](#158998)

Co-authored-by: Rudolf Meijering <[email protected]>
This PR adds #158733 to the list of known issues:
* issue: #158733
* pull: #158940

Co-authored-by: James Rodewig <[email protected]>
#159221)
# Backport
This will backport the following commits from `8.8` to `main`:
- [[DOCS+] Add #158940 to the list of 8.8.0 known issues (#159197)](#159197)

Co-authored-by: Gerard Soldevila <[email protected]>
Usually, when Kibana migrations fail due to a temporary problem in Elasticsearch, Kibana automatically finishes the migration once the failure condition is resolved.
When upgrading to 8.8.0, a temporary Elasticsearch error (high disk watermark or circuit breaker exceptions) can cause Kibana to fail permanently with an error like:
This can happen if:
- the `.kibana` migrator finishes its migration and completes the `UPDATE_TARGET_MAPPINGS_META` step, and
- one or more of the other migrators (e.g. `.kibana_alerting_cases`) are unable to successfully complete the `CLONE_TEMP_TO_TARGET` step.

This causes an inconsistent state where the metadata in `.kibana` suggests the splitting migration has completed, when in fact it has not.

Mitigation
Potential data loss
Failed 8.8.0 migrations could lead to data loss under some very specific circumstances.
Detection
If the following criteria apply, your cluster might have lost data. Please contact support or revert to a snapshot taken before the upgrade:
- The Kibana logs contain `INIT -> CREATE_NEW_TARGET` entries.

Overview
When upgrading to a new stack version, Kibana runs migration logic to upgrade saved object documents. Eventually, these documents are copied over to new version indices.
As part of the dot kibana split, when upgrading to 8.8.0 (or later), the migration logic creates several new indices and distributes the saved objects stored in `.kibana` across them. If the migration succeeds only partially (e.g. some indices are completely migrated and others aren't), some of the saved objects from the `.kibana` index may not be properly copied over to the new version indices.

Scenario A. The write_blocked indices
1. `.kibana_alerting_cases` is correctly created and contains all the saved objects that are intended to go into that index.
2. The `.kibana` index is NOT migrated.
3. On the next restart, the `.kibana` index is in a "before split" state, so Kibana determines it needs to do the split.
4. The `.kibana_alerting_cases` migrator runs a reindex flow, locking the existing `.kibana_alerting_cases_8.8.0_001` index with a `write_block`.
5. The Kibana upgrade process gets stuck in a bootloop.

Removing the `write_block` is pointless, as it is re-created at each restart.

The bootloop is not completely hard-locked, though. If the `.kibana` migrator manages to complete the migration process before the `.kibana_alerting_cases` migrator fails, Kibana will believe it is in an "after split" state, and the other indices won't be `write_block`ed again. Manually removing the `write_block` at this point:

Scenario B. The silent data loss
1. The `.kibana` migrator finishes dispatching all the saved objects to their corresponding indices, including the newer version of the `.kibana` index itself, e.g. `.kibana_8.8.0_001`.
2. `.kibana_alerting_cases` fails to clone from the temporary index `.kibana_alerting_cases_8.8.0_reindex_temp` into the target index `.kibana_alerting_cases_8.8.0_001`.
3. On the next restart, the `.kibana` index is aligned with the current stack version, so Kibana determines that there is no need to split.
4. The `.kibana_alerting_cases` migrator does not see its own index (it did not get to create the entrypoint aliases, aka `.kibana_alerting_cases` and `.kibana_alerting_cases_8.8.0`), so it assumes it is in a fresh deployment scenario. Depending on when the first attempt failed:
   - Before `.kibana_alerting_cases_8.8.0_001` was created: the migrator creates it and completes the migration. The saved objects that should have been relocated are still stored in `.kibana_<previousVersion>_001`, so from the SavedObjects API standpoint, they are effectively lost.
   - After `.kibana_alerting_cases_8.8.0_001` was created: the migrator tries to create `.kibana_alerting_cases_8.8.0_001`, which already exists (no-op), performs a few updates, and finally creates the entrypoint aliases. This scenario has 2 possible sub-branches:
     - The failure happened before the `_mappings` were updated (most likely). The index created in the previous attempt won't have any mappings on it. The stored saved objects won't be indexed by any fields, and thus they won't be searchable.
     - The failure happened after the `_mappings` were updated (less likely). In this scenario, the documents should already be searchable, without any impact ✅.

Technical details
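When upgrading, each SO index goes through the following steps:
1. The saved objects are copied into a temporary index (e.g. `.kibana_analytics_8.8.0_reindex_temp`).
2. The temporary index is cloned into the target index (e.g. `.kibana_analytics_8.8.0_001`).
3. The entrypoint aliases are created (e.g. `.kibana_analytics` and `.kibana_analytics_8.8.0`).

In order to perform migrations, Kibana launches a "migrator" instance for each of the SO indices. These migrators run in parallel, and they handle the upgrade process described above.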
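To make the Scenario B failure concrete, here is a simplified model of the three steps. It shows which step's artifacts are visible in the cluster and why a migrator whose entrypoint aliases are missing behaves as if it were on a fresh deployment. `ClusterView`, `namesFor`, `lastCompletedStep` and `looksLikeFreshDeployment` are hypothetical names. The assumption that a migrator locates its index only through its entrypoint alias is a simplification of the behaviour described in Scenario B, not Kibana's actual implementation.

```typescript
// Hypothetical, simplified snapshot of what exists in the cluster.
interface ClusterView {
  indices: Set<string>;
  aliases: Set<string>;
}

// Index and alias names for one migrator, following the examples above.
function namesFor(prefix: string, version: string) {
  return {
    tempIndex: `${prefix}_${version}_reindex_temp`,
    targetIndex: `${prefix}_${version}_001`,
    aliases: [prefix, `${prefix}_${version}`],
  };
}

type Step = "none" | "temp_created" | "target_created" | "aliases_created";

// Returns the furthest step whose artifacts are visible in the cluster.
function lastCompletedStep(view: ClusterView, prefix: string, version: string): Step {
  const n = namesFor(prefix, version);
  if (n.aliases.every((a) => view.aliases.has(a))) return "aliases_created";
  if (view.indices.has(n.targetIndex)) return "target_created";
  if (view.indices.has(n.tempIndex)) return "temp_created";
  return "none";
}

// Simplifying assumption: without its entrypoint alias, a migrator
// does not "see" its index and treats the cluster as a fresh deployment.
function looksLikeFreshDeployment(view: ClusterView, prefix: string): boolean {
  return !view.aliases.has(prefix);
}

// Scenario B: the target index exists, but the aliases were never created.
const scenarioB: ClusterView = {
  indices: new Set<string>([
    ".kibana_alerting_cases_8.8.0_reindex_temp",
    ".kibana_alerting_cases_8.8.0_001",
  ]),
  aliases: new Set<string>(),
};
console.log(lastCompletedStep(scenarioB, ".kibana_alerting_cases", "8.8.0")); // → target_created
console.log(looksLikeFreshDeployment(scenarioB, ".kibana_alerting_cases")); // → true
```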
Since the migrators run "independently", one migrator can succeed during an upgrade while another one fails. This was an acceptable scenario up until 8.8.0, because each index was truly independent of the others: if a migrator failed to migrate a specific index, it would simply retry on the next start.
In 8.8.0, we introduce dependencies between migrators: the `.kibana` migrator must dispatch SO documents to other indices, and to do so, it must wait for the other migrators to create their corresponding `.kibana_<domain>_8.8.0_reindex_temp` temporary indices.

One particularity makes the `.kibana` index special: it stores information about the type => index breakdown in the `.kibana.mapping._meta.indexTypesMap` property. At startup, this information allows Kibana to determine whether some types must be relocated into other indices during an upgrade.

This is where the current issue lies: the `.kibana` migrator currently dispatches SO documents to other indices and then completes the rest of the migration process for its own index. However, upon successful migration, it conditions the behaviour of the migration logic on subsequent attempts (by storing the `_meta.indexTypesMap`). Thus, it should make sure that all migrators have finished cloning and migrating their own indices before considering itself successful.
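The following is a minimal sketch of the rule in the last paragraph. `.kibana` should only persist `_meta.indexTypesMap` (and so signal "after split" to later startups) once every migrator has finished. The function and field names are hypothetical, and each migrator's result is reduced to a single boolean. The sketch only models the decision and does not describe how Kibana coordinates its migrators.

```typescript
// Hypothetical per-migrator result after one Kibana startup.
interface MigratorOutcome {
  index: string; // e.g. ".kibana_alerting_cases"
  clonedAndAliased: boolean; // finished CLONE_TEMP_TO_TARGET and created its aliases
}

// The inconsistent state from this issue: indexTypesMap was stored, so later
// startups skip the split, yet some migrator never finished.
function isInconsistent(indexTypesMapStored: boolean, outcomes: MigratorOutcome[]): boolean {
  return indexTypesMapStored && outcomes.some((o) => !o.clonedAndAliased);
}

// Proposed gate: .kibana may consider itself successful only when all are done.
function mayPersistIndexTypesMap(outcomes: MigratorOutcome[]): boolean {
  return outcomes.every((o) => o.clonedAndAliased);
}

const outcomes: MigratorOutcome[] = [
  { index: ".kibana", clonedAndAliased: true },
  { index: ".kibana_alerting_cases", clonedAndAliased: false },
];

// Unconditional persistence vs. gated persistence.
const before = isInconsistent(true, outcomes);
const after = isInconsistent(mayPersistIndexTypesMap(outcomes), outcomes);
console.log({ before, after }); // → { before: true, after: false }
```

With the gate in place, a failed `.kibana_alerting_cases` clone leaves `.kibana` in a "before split" state, so the next start retries the split instead of assuming it already happened.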