Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Migrations] Update all aliases with a single updateAliases() when relocating SO documents #158940

Merged
merged 9 commits into from
Jun 5, 2023

Conversation

gsoldevila
Copy link
Contributor

@gsoldevila gsoldevila commented Jun 2, 2023

Fixes #158733

The goal of this modification is to enforce migrators of all indices involved in a relocation (e.g. as part of the dot kibana split) to create the index aliases in the same updateAliases() call.

This way, either:

  • all the indices involved in the dot kibana split relocation will be completely upgraded (with the appropriate aliases).
  • or none of them will.

@gsoldevila gsoldevila added bug Fixes for quality problems that affect the customer experience Team:Core Core services & architecture: plugins, logging, config, saved objects, http, ES client, i18n, etc release_note:fix Feature:Migrations backport:prev-minor Backport to (8.x) the previous minor version (i.e. one version back from main) v8.9.0 v8.8.1 labels Jun 2, 2023
@@ -242,6 +249,12 @@ export const nextActionMap = (
}),
MARK_VERSION_INDEX_READY: (state: MarkVersionIndexReady) =>
Actions.updateAliases({ client, aliasActions: state.versionIndexReadyActions.value }),
MARK_VERSION_INDEX_READY_SYNC: (state: MarkVersionIndexReady) =>
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here's where the synchronization magic happens:

  • All migrators involved in a relocation will wait for the rest to reach this point.
  • They will then provide the aliases that they intend to update (in the payload property).
  • The updateRelocationAliases synchronisation object will then call the client.indices.updateAliases one single time.
  • Each of the migrators will receive the same response for the global update.

@@ -1458,7 +1470,9 @@ export const model = (currentState: State, resW: ResponseType<AllActionStates>):
// index.
return {
...stateP,
controlState: 'MARK_VERSION_INDEX_READY',
controlState: stateP.mustRelocateDocuments
? 'MARK_VERSION_INDEX_READY_SYNC'
Copy link
Contributor Author

@gsoldevila gsoldevila Jun 2, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All migrators involved in a relocation will update the aliases simultaneously.

return new Defer<T>();
}

export function createWaitGroupMap<T, U>(
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Renamed all "defer" objects to "waitGroups" following Pierre's suggestion (less confusing).

Copy link
Member

@afharo afharo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! However, I'm not a migration expert yet. So I'd leave the LGTM to someone else :)

throwDelayMillis: 1000, // another migrator has failed for a reason, let it take Kibana down and log its problem
};
} else {
throwBadResponse(stateP, left);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looking at the line below, should this be return throwBadResponse(...) (mind the return)?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point! I didn't notice, but in the code there's a few of each. I suppose it does not matter, cause the throwBadResponse throws an error anyway. We can choose one and make it consistent.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

with the tight deadline, I think we should rather not try to decide on what form we prefer and make it consistent in this PR, typescript has us covered either way.

@@ -242,6 +249,12 @@ export const nextActionMap = (
}),
MARK_VERSION_INDEX_READY: (state: MarkVersionIndexReady) =>
Actions.updateAliases({ client, aliasActions: state.versionIndexReadyActions.value }),
MARK_VERSION_INDEX_READY_SYNC: (state: MarkVersionIndexReady) =>
Actions.synchronizeMigrators({
waitGroup: updateRelocationAliases,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Q: Could it happen that 1 alias creation fails (while the others succeed) and it triggers a whole re-run of the migrations on the next restart?

Should we create all aliases in the same call?

Comment on lines +108 to +111
updateAliases({
client: options.elasticsearchClient,
aliasActions: allAliasActions.flat(),
})()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oh! this is the response to my previous comment, right? 😅

Copy link
Contributor Author

@gsoldevila gsoldevila Jun 2, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, that's correct!

There's a single call to update them all.
In fact, they all await on something like:

  (Promise.all([migrator1, migrator2, migratorN]).then(updateAliases))

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is one of the simplest solutions I came up with, but perhaps we can create a separate issue to refactor and make something more centralised, some sort of SynchronizationManager.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I had the same "Oh!" moment 😅 I would not come here to look for an updateAliases action call. I think it's fine to merge as-is and rather spend extra time on testing.

But I think it would be worth exploring if we could make this less "surprising". One way might be to let every migrator call the updateAliases action. Before we had 6 update aliases calls with each call doing one index/alias. Now we'd have 1 batch call and 5 no-ops.

Another option could be to use the synchronizeMigrators.then hook but only let the .kibana migrator do the updateAliases call, other migrators have a no-op then hook.

@@ -1474,9 +1488,19 @@ export const model = (currentState: State, resW: ResponseType<AllActionStates>):
} else if (stateP.controlState === 'CREATE_NEW_TARGET') {
const res = resW as ExcludeRetryableEsError<ResponseType<typeof stateP.controlState>>;
if (Either.isRight(res)) {
if (res.right === 'index_already_exists') {
Copy link
Contributor Author

@gsoldevila gsoldevila Jun 2, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This the second fix of the PR:
If we are creating a new index but the index already exists:

  • it probably belongs to a previous failed upgrade
    • which managed to create the index
    • but failed before it could create the aliases

In this scenario, instead of simply completing the migration, we will attempt to update mappings first.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was wondering why index_already_exists was a right response.
Strictly speaking, if the index exists, and it shouldn't, then the previous migration failure would be an error state, i.e. a left response). But I see what we're doing here, migrations skip over index creation and move on. Hence, it's a recoverable 'unexpected' case.
It makes sense now, great!

() => {
if (!aliasActions || !aliasActions.length) throw Error('updating NO aliases!');
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oops, that was a debug statement, removing!

Copy link
Member

@lukeelmers lukeelmers left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's get some more eyes on it, but the sync process and the handling for index_already_exists all make sense to me. Thanks for jumping on this so quickly @gsoldevila ❤️

return { ...stateP, controlState: 'LEGACY_CREATE_REINDEX_TARGET' };
} else {
// @ts-expect-error TS doesn't correctly narrow this type to never
return throwBadResponse(stateP, res);
throwBadResponse(stateP, left);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I assume it was just a mistake that we were previously throwing res here instead of res.left, and that we aren't losing anything important in our error?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In this branch of the conditional, res should not contain anything apart from left.

I looked at the code that can throw this exception:
packages/core/saved-objects/core-saved-objects-migration-server-internal/src/actions/set_write_block.ts

And I can confirm, the only possible Left responses are IndexNotFound | RetryableEsClientError, none of them have anything else.
Finally, the throwBadResponse only JSON.stringifies the received object, so I believe it's completely safe to delete.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah we're just helping TS see what we already know by giving it the const left = res.left; hint.

Copy link
Contributor

@TinaHeiligers TinaHeiligers left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changes LGTM.

@kibana-ci
Copy link
Collaborator

💚 Build Succeeded

Metrics [docs]

Public APIs missing exports

Total count of every type that is part of your API that should be exported but is not. This will cause broken links in the API documentation system. Target amount is 0. Run node scripts/build_api_docs --plugin [yourplugin] --stats exports for more detailed information.

id before after diff
@kbn/core-saved-objects-migration-server-internal 45 46 +1
Unknown metric groups

ESLint disabled line counts

id before after diff
enterpriseSearch 19 21 +2
securitySolution 414 418 +4
total +6

Total ESLint disabled count

id before after diff
enterpriseSearch 20 22 +2
securitySolution 498 502 +4
total +6

History

To update your PR or re-run it, just comment with:
@elasticmachine merge upstream

@gsoldevila gsoldevila marked this pull request as ready for review June 4, 2023 17:03
@gsoldevila gsoldevila requested a review from a team as a code owner June 4, 2023 17:03
gsoldevila added a commit that referenced this pull request Jun 7, 2023
This PR addresses remarks and feedback from
#158940, which was part of an
emergency release.
gsoldevila added a commit that referenced this pull request Jun 7, 2023
This PR adds #158733 to the list
of known issues:
* issue: #158733
* pull: #158940

---------

Co-authored-by: James Rodewig <[email protected]>
jrodewig added a commit that referenced this pull request Jun 7, 2023
#159221)

# Backport

This will backport the following commits from `8.8` to `main`:
- [[DOCS+] Add #158940 to the list of 8.8.0 known issues
(#159197)](#159197)

<!--- Backport version: 8.9.7 -->

### Questions ?
Please refer to the [Backport tool
documentation](https://github.com/sqren/backport)

<!--BACKPORT [{"author":{"name":"Gerard
Soldevila","email":"[email protected]"},"sourceCommit":{"committedDate":"2023-06-07T13:16:53Z","message":"[DOCS+]
Add #158940 to the list of 8.8.0 known issues (#159197)\n\nThis PR adds
#158733 to the list\r\nof known
issues:\r\n* issue: https://github.com/elastic/kibana/issues/158733\r\n*
pull:
https://github.com/elastic/kibana/pull/158940\r\n\r\n---------\r\n\r\nCo-authored-by:
James Rodewig
<[email protected]>","sha":"528671e3bdcf65856c52cb48bbfaec231bdbaca3","branchLabelMapping":{"^v8.8.0$":"main","^v(\\d+).(\\d+).\\d+$":"$1.$2"}},"sourcePullRequest":{"labels":["Team:Docs","release_note:skip","docs","Feature:Migrations"],"number":159197,"url":"https://github.com/elastic/kibana/pull/159197","mergeCommit":{"message":"[DOCS+]
Add #158940 to the list of 8.8.0 known issues (#159197)\n\nThis PR adds
#158733 to the list\r\nof known
issues:\r\n* issue: https://github.com/elastic/kibana/issues/158733\r\n*
pull:
https://github.com/elastic/kibana/pull/158940\r\n\r\n---------\r\n\r\nCo-authored-by:
James Rodewig
<[email protected]>","sha":"528671e3bdcf65856c52cb48bbfaec231bdbaca3"}},"sourceBranch":"8.8","suggestedTargetBranches":[],"targetPullRequestStates":[]}]
BACKPORT-->

Co-authored-by: Gerard Soldevila <[email protected]>
@rudolf rudolf added the Epic:ScaleMigrations Scale upgrade migrations to millions of saved objects label Sep 23, 2023
@kibanamachine
Copy link
Contributor

Friendly reminder: Looks like this PR hasn’t been backported yet.
To create automatically backports add the label auto-backport or prevent reminders by adding the backport:skip label.
You can also create backports manually by running node scripts/backport --pr 158940 locally

@kibanamachine kibanamachine added the backport missing Added to PRs automatically when the are determined to be missing a backport. label May 22, 2024
@kibanamachine
Copy link
Contributor

Friendly reminder: Looks like this PR hasn’t been backported yet.
To create automatically backports add the label auto-backport or prevent reminders by adding the backport:skip label.
You can also create backports manually by running node scripts/backport --pr 158940 locally

15 similar comments
@kibanamachine
Copy link
Contributor

Friendly reminder: Looks like this PR hasn’t been backported yet.
To create automatically backports add the label auto-backport or prevent reminders by adding the backport:skip label.
You can also create backports manually by running node scripts/backport --pr 158940 locally

@kibanamachine
Copy link
Contributor

Friendly reminder: Looks like this PR hasn’t been backported yet.
To create automatically backports add the label auto-backport or prevent reminders by adding the backport:skip label.
You can also create backports manually by running node scripts/backport --pr 158940 locally

@kibanamachine
Copy link
Contributor

Friendly reminder: Looks like this PR hasn’t been backported yet.
To create automatically backports add the label auto-backport or prevent reminders by adding the backport:skip label.
You can also create backports manually by running node scripts/backport --pr 158940 locally

@kibanamachine
Copy link
Contributor

Friendly reminder: Looks like this PR hasn’t been backported yet.
To create automatically backports add the label auto-backport or prevent reminders by adding the backport:skip label.
You can also create backports manually by running node scripts/backport --pr 158940 locally

@kibanamachine
Copy link
Contributor

Friendly reminder: Looks like this PR hasn’t been backported yet.
To create automatically backports add the label auto-backport or prevent reminders by adding the backport:skip label.
You can also create backports manually by running node scripts/backport --pr 158940 locally

@kibanamachine
Copy link
Contributor

Friendly reminder: Looks like this PR hasn’t been backported yet.
To create automatically backports add the label auto-backport or prevent reminders by adding the backport:skip label.
You can also create backports manually by running node scripts/backport --pr 158940 locally

@kibanamachine
Copy link
Contributor

Friendly reminder: Looks like this PR hasn’t been backported yet.
To create automatically backports add the label auto-backport or prevent reminders by adding the backport:skip label.
You can also create backports manually by running node scripts/backport --pr 158940 locally

@kibanamachine
Copy link
Contributor

Friendly reminder: Looks like this PR hasn’t been backported yet.
To create automatically backports add the label auto-backport or prevent reminders by adding the backport:skip label.
You can also create backports manually by running node scripts/backport --pr 158940 locally

@kibanamachine
Copy link
Contributor

Friendly reminder: Looks like this PR hasn’t been backported yet.
To create automatically backports add the label auto-backport or prevent reminders by adding the backport:skip label.
You can also create backports manually by running node scripts/backport --pr 158940 locally

@kibanamachine
Copy link
Contributor

Friendly reminder: Looks like this PR hasn’t been backported yet.
To create automatically backports add the label auto-backport or prevent reminders by adding the backport:skip label.
You can also create backports manually by running node scripts/backport --pr 158940 locally

@kibanamachine
Copy link
Contributor

Friendly reminder: Looks like this PR hasn’t been backported yet.
To create automatically backports add the label auto-backport or prevent reminders by adding the backport:skip label.
You can also create backports manually by running node scripts/backport --pr 158940 locally

@kibanamachine
Copy link
Contributor

Friendly reminder: Looks like this PR hasn’t been backported yet.
To create automatically backports add the label auto-backport or prevent reminders by adding the backport:skip label.
You can also create backports manually by running node scripts/backport --pr 158940 locally

@kibanamachine
Copy link
Contributor

Friendly reminder: Looks like this PR hasn’t been backported yet.
To create automatically backports add the label auto-backport or prevent reminders by adding the backport:skip label.
You can also create backports manually by running node scripts/backport --pr 158940 locally

@kibanamachine
Copy link
Contributor

Friendly reminder: Looks like this PR hasn’t been backported yet.
To create automatically backports add the label auto-backport or prevent reminders by adding the backport:skip label.
You can also create backports manually by running node scripts/backport --pr 158940 locally

@kibanamachine
Copy link
Contributor

Friendly reminder: Looks like this PR hasn’t been backported yet.
To create automatically backports add the label auto-backport or prevent reminders by adding the backport:skip label.
You can also create backports manually by running node scripts/backport --pr 158940 locally

@jbudz jbudz added the backport:skip This commit does not require backporting label Sep 30, 2024
@kibanamachine kibanamachine removed the backport missing Added to PRs automatically when the are determined to be missing a backport. label Sep 30, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
backport:prev-minor Backport to (8.x) the previous minor version (i.e. one version back from main) backport:skip This commit does not require backporting bug Fixes for quality problems that affect the customer experience Epic:ScaleMigrations Scale upgrade migrations to millions of saved objects Feature:Migrations release_note:fix Team:Core Core services & architecture: plugins, logging, config, saved objects, http, ES client, i18n, etc v8.8.1 v8.9.0
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8.8.0 migrations temporary ES failures could cause permanent migration failure
9 participants