Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[ZDT migration] Don't run document migration on non-migrator nodes #156345

Conversation

pgayvallet
Copy link
Contributor

@pgayvallet pgayvallet commented May 2, 2023

Summary

Part of #150309
A few enhancements to the ZDT migration algorithm.

1. Run the 'expand' phase (and only this one) on non-migrator nodes

Given our latests changes to the way we want the algo to function, the non-migrator nodes will have to run the 'expand' (schema expansion) phase. However, the document migration phase will have to be run by the migrator node exclusively.

Note: because it was required for integration tests, a new migration.zdt.runOnNonMigratorNodes option was introduced to change this behavor and have non-migrator nodes ignore this limitation.

2. Don't terminate during INIT if higher mapping versions are found

Any mapping changes are upward compatible, meaning that we can safely no-op instead of failing of the mapping version check result is lesser. This change is required now that mapping updates will be performed before all nodes of the previous version are shut down (and is also required for rollbacks)

3. Perform a version check during DOCUMENTS_UPDATE_INIT

We were always executing the full doc update cycle when entering this stage. We're now performing a version check similar to what was done during INIT.

If the check result returns:

  • greater: we perform the document migration (as it was done before this change)
  • equal: we skip the document migration
  • lesser: we skip the document migration (NOTE: this may change later depending on how we handle rollbacks)
  • conflict: we terminate with a failure, as done during INIT

@pgayvallet pgayvallet added Team:Core Core services & architecture: plugins, logging, config, saved objects, http, ES client, i18n, etc release_note:skip Skip the PR/issue when compiling release notes Feature:Migrations v8.9.0 labels May 2, 2023
@pgayvallet pgayvallet force-pushed the kbn-xxx-skip-doc-migration-for-non-mig-nodes branch from 117af1f to fd844be Compare May 2, 2023 13:20
pgayvallet added 2 commits May 2, 2023 15:21
…55981)

## Summary

Adapt the `zdt` migration algorithm to run the v2 migrations in addition
to the model version transformations.

The intent is to be able to use the current migration system in a
zero-downtime upgrade friendly-ish way

### Technicals  

elastic#148656 was a precondition, as the
zdt algo requires that mapping changes are compatible (given we keep the
same index).

Technically, it means we're now storing the `virtualVersion` instead of
the `modelVersion` in the index's meta (`mappingVersions` and
`docVersions`), to allow keeping track of mixed stack and model versions
per type.

Note that switching to model versioning is still a one way,
non-revertible action. Once using model versioning (by specifying
`switchToModelVersionAt` on your type), there is no going back.
@pgayvallet pgayvallet force-pushed the kbn-xxx-skip-doc-migration-for-non-mig-nodes branch from fd844be to 487bc49 Compare May 2, 2023 13:21
Copy link
Contributor Author

@pgayvallet pgayvallet left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Self review

Comment on lines +49 to +50
if (logger.isLevelEnabled('debug')) {
logger.debug<StateTransitionLogMeta>(logMessage, {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a small improvement, since we implemented Logger.isLevelEnabled a while ago. Only log one message depending on the log level.

Comment on lines +30 to +34
const toContainLogEntries: MatcherFunction<[entries: string[], options: { ordered?: boolean }]> = (
actual,
entries,
options = {}
) => {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I finally added it!

Comment on lines +75 to +83
// app version is lower than the index mapping version.
// Should only occur in case of rollback.
// For now, we don't do anything.
case 'lesser':
return {
...state,
logs,
controlState: 'DONE',
};
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For now, we just perform a noop for downgrades and go straight to DONE. It may change in the future, depending on how we want to manage that, but it probably makes sense to just have a defined behavior for now, to be able to add tests around it.

Comment on lines +26 to +34
/**
* When true, will fully skip document migration, and will transition directly to DONE
* after the INDEX_STATE_UPDATE_DONE stage.
*
* This flag is set to `true` in the following scenarios:
* - on nodes without the `migrator` role, the flag will always be `true`.
* - if the migrator create the index, the workflow will set the flag to `true` given there is nothing to migrate.
*/
readonly skipDocumentMigration: boolean;
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As documented - this flag replaces newIndexCreation

Comment on lines +16 to +17
const runDocumentMigration =
context.nodeRoles.migrator || context.migrationConfig.zdt.runOnNonMigratorNodes;
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As explained on the PR's description: this zdt.runOnNonMigratorNodes was added to force non-migrator nodes to run the document migration. The only usage for now is integration tests, but we may want to use this in single-node dev / test environments.

Comment on lines 118 to +125
// app version is lower than the index mapping version.
// likely a rollback scenario - unsupported for the initial implementation
// either a rollback scenario, or an old node rebooting during the cohabitation period
// in that case, we simply no-op the expand phase.
case 'lesser':
return {
...state,
controlState: 'FATAL',
reason: 'Downgrading model version is currently unsupported',
logs,
controlState: 'INDEX_STATE_UPDATE_DONE',
...commonState,
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Change of behavior - encountering higher versions of the mappings in the index's meta now just skip the schema expansion phase

Comment on lines 21 to 34
describe('ZDT upgrades - basic downgrade', () => {
let esServer: TestElasticsearchUtils['es'];

const startElasticsearch = async () => {
const { startES } = createTestServers({
adjustTimeout: (t: number) => jest.setTimeout(t),
settings: {
es: {
license: 'basic',
},
},
});
return await startES();
};
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a basic test asserting of the current behavior of the algo in case of 'downgrade' (booting with a lower version of the SO types)

@pgayvallet pgayvallet marked this pull request as ready for review May 3, 2023 13:17
@pgayvallet pgayvallet requested a review from a team as a code owner May 3, 2023 13:17
@elasticmachine
Copy link
Contributor

Pinging @elastic/kibana-core (Team:Core)

@@ -98,7 +99,7 @@ describe('migrationsStateActionMachine', () => {
abort,
});
const logs = loggingSystemMock.collect(mockLogger);
const doneLog = logs.info.splice(8, 1)[0][0];
const doneLog = logs.info.splice(4, 1)[0][0];
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

NIT could you add a little comment describing what you're doing with this obscure line?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm doing nothing, just adapting someone else's hack :trollface:

describe('ZDT upgrades - basic downgrade', () => {
let esServer: TestElasticsearchUtils['es'];

const startElasticsearch = async () => {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

NIT Since this function is very common, I made it available directly in the kibana_migrator_test_kit.ts (under the same name).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, great. Wasn't aware of that and TBH I thought about extracting it in this PR. Will adapt accordingly

Copy link
Contributor

@jloleysens jloleysens left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall lgtm. Left a q about node roles. Did not test locally but test coverage looks great!

elasticsearchClient: elasticsearchClientMock.createElasticsearchClient(),
maxRetryAttempts: 15,
migrationDocLinks: docLinksServiceMock.createSetupContract().links.kibanaUpgradeSavedObjects,
typeRegistry,
serializer: serializerMock.create(),
deletedTypes: ['deleted-type'],
discardCorruptObjects: false,
nodeRoles: { migrator: true, ui: true, backgroundTasks: true },
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Our current config validation prohibits migrator being combined with other roles. I'm guessing we are just doing all roles true in this mock for convenience?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, pretty much. If we want to be more consistent, I can adapt that to { migrator: true, ui: false, backgroundTasks: false }, it shouldn't cause any test to fail. Will try


// check that the mappingVersions are still v2
expect(mappingMeta.mappingVersions).toEqual({
sample_type: '10.2.0',
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ignorant question: How / where is this mappingVersions 10.2.0 defined?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Model version 2 = virtual version 10.2.0.

Copy link
Contributor

@gsoldevila gsoldevila left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM with few nits and comments!

@@ -18,7 +18,7 @@ export const indexStateUpdateDone: ModelStage<
throwBadResponse(state, res as never);
}

if (state.newIndexCreation) {
if (state.skipDocumentMigration) {
// we created the index, so we can safely skip the whole document migration
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

NIT outdated comment

// this migrator is not configured to update documents (or its index has just been created), skip the whole document migration

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it make sense to move this condition inside the documentsUpdateInit()?

Otherwise, if in the future we have other paths that lead to the DOCUMENTS_UPDATE_INIT we'll have to add the if(state.skipDocumentMigration) condition in each of them. WDYT?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My reasoning was that INDEX_STATE_UPDATE_DONE was a mandatory step (as skipping the index update still leads to this stage), so we should be good keeping it there for now.

import { SavedObjectTypeRegistry } from '@kbn/core-saved-objects-base-server-internal';
import {
SavedObjectTypeRegistry,
SavedObjectsMigrationConfigType,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

NIT can be imported as a type

@pgayvallet pgayvallet enabled auto-merge (squash) May 4, 2023 13:24
@kibana-ci
Copy link
Collaborator

💚 Build Succeeded

Metrics [docs]

Public APIs missing comments

Total count of every public API that lacks a comment. Target amount is 0. Run node scripts/build_api_docs --plugin [yourplugin] --stats comments for more detailed information.

id before after diff
@kbn/core-saved-objects-migration-server-internal 87 88 +1
Unknown metric groups

API count

id before after diff
@kbn/core-saved-objects-migration-server-internal 122 123 +1

ESLint disabled line counts

id before after diff
enterpriseSearch 19 21 +2
securitySolution 398 401 +3
total +5

Total ESLint disabled count

id before after diff
enterpriseSearch 20 22 +2
securitySolution 478 481 +3
total +5

History

To update your PR or re-run it, just comment with:
@elasticmachine merge upstream

@pgayvallet pgayvallet merged commit 8f34b96 into elastic:main May 4, 2023
@kibanamachine kibanamachine added the backport:skip This commit does not require backporting label May 4, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
backport:skip This commit does not require backporting Feature:Migrations release_note:skip Skip the PR/issue when compiling release notes Team:Core Core services & architecture: plugins, logging, config, saved objects, http, ES client, i18n, etc v8.9.0
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants