Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SoMigV2] Fail fast if unknown document types are present in the source index #103341

Merged

Conversation

pgayvallet
Copy link
Contributor

@pgayvallet pgayvallet commented Jun 24, 2021

Summary

Fix #101351

  • Add a new CHECK_UNKNOWN_DOCUMENTS stage between WAIT_FOR_YELLOW_SOURCE and SET_SOURCE_WRITE_BLOCK to check for unknown documents (and fail if encountered) before starting the migration or enable the write block on the source index
  • Split the model.ts monolith into multiple files in its own module for maintainability.

Checklist

@pgayvallet pgayvallet added project:ResilientSavedObjectMigrations Reduce Kibana upgrade failures by making saved object migrations more resilient release_note:skip Skip the PR/issue when compiling release notes Team:Core Core services & architecture: plugins, logging, config, saved objects, http, ES client, i18n, etc v7.15.0 labels Jun 25, 2021
Comment on lines +69 to +78
return {
bool: {
must: unusedTypesQuery,
must_not: knownTypes.map((type) => ({
term: {
type,
},
})),
},
};
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I included the unusedTypesQuery in the unknown doc queries, are we're not migrating them to the temp index, and as it would allow to 'remove' old types by unregistering them while adding them to the unused query.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does it mean that users must remove them to continue the migration? If so, it might be considered as a breaking change even though we just fixed a bug.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, I mean the opposite actually: I'm excluding the objects matching the unusedTypesQuery, to avoid failing if any is encountered given than we don't migrate them anyway.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok, must usage is a bit misleading here because unusedTypesQuery contains another must_not inside.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yea, the whole unusedTypesQuery naming is misleading because it's actually an excludeUnusedTypesQuery

Comment on lines +20 to +23
export const createInitialState = ({
kibanaVersion,
targetMappings,
preMigrationScript,
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Extracted this from model.ts to its own file for maintainability.

Only addition is the new typeRegistry parameter, and adding the knownTypes to the default/initial state.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This PR is still relatively small so I guess it's ok to split files here too.
For larger efforts, it might make it harder for reviewers to figure out what has changed vs what code was moved around.

Comment on lines +103 to +109
coreSetup.savedObjects.registerType({
name: 'foo',
hidden: false,
mappings: {
properties: {},
},
namespaceType: 'agnostic',
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The test suite was triggering a failure by having some index-pattern docs registered with the foo type

{
   type: 'foo',
   id: 'index-pattern:some-id'
}

Adding this new unknown doc preflight check was causing the migration to fail too early, failing this test suite.

I'm now registering the foo type in the suite, to let it fails at the exact same step it was failing before that PR.

Comment on lines +98 to +109
coreSetup.savedObjects.registerType({
name: 'space',
hidden: false,
mappings: {
properties: {},
},
namespaceType: 'single',
migrations: {
'7.14.0': (doc) => {
doc.attributes.foo.bar = 12;
return doc;
},
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same than for the cleanup suite. This new preflight check was causing the suite to fail too early. I'm now registering the space suite and registering a failing migration function instead.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe we should update test fixtures instead? Spaces plugin SO are not relevant for this test anyway.

Comment on lines +94 to +98
const { body: response } = await client.indices.getSettings({ index: '.kibana_7.13.0_001' });
const settings = response['.kibana_7.13.0_001']
.settings as estypes.IndicesIndexStatePrefixedSettings;
expect(settings.index).not.toBeUndefined();
expect(settings.index!.blocks?.write).not.toEqual('true');
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I manually modified the test locally to set the write block before these assertions to be sure that it was failing when the write block was present

Comment on lines 27 to 37
).toMatchInlineSnapshot(`
"Migration failed because documents from unknown types were found. To proceed with the migration, please delete these documents from the \\".kibana_15\\" index.
The unknown documents were:
- \\"unknownType:12\\" (type: \\"unknownType\\")
- \\"anotherUnknownType:42\\" (type: \\"anotherUnknownType\\")
You can delete them using the following command:
curl -X POST \\"{elasticsearch}/.kibana_15/_bulk?pretty\\" -H 'Content-Type: application/json' -d'
{ \\"delete\\" : { \\"_id\\" : \\"unknownType:12\\" } }
{ \\"delete\\" : { \\"_id\\" : \\"anotherUnknownType:42\\" } }
'"
`);
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added curl command to remove the documents as requested in #101351 (comment)

Still not sure this is really helping as

  1. we don't have the ES connection string in the migrator, so end user still have to manually replace it before executing the command
  2. if security is enabled, user will also still have to add the proper auth headers for the query to succeed.
  3. Do they have curl on windows :trollface: ?

Wondering if we should really keep this @joshdover?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Personally, I think solving (1) may be worth the effort for clarity. I realize it would mean passing the config down several layers, but how feasible is this? Can be done as a follow up as well.

We should make this message a little more cautious though and recommend backing them up first (either with a snapshot or a curl bulk get)

Comment on lines +15 to +21
export function extractTransformFailuresReason(
corruptDocumentIds: string[],
transformErrors: TransformErrorObjects[]
): string {
const corruptDocumentIdReason =
corruptDocumentIds.length > 0
? ` ${
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Extracted from model.ts

Comment on lines +14 to +19
/**
* A helper function/type for ensuring that all control state's are handled.
*/
export function throwBadControlState(p: never): never;
export function throwBadControlState(controlState: any) {
throw new Error('Unexpected control state: ' + controlState);
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All the file was extracted from model.ts

Copy link
Contributor

@TinaHeiligers TinaHeiligers Jun 28, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same comment warning about doing too much in one PR but this is really great!

Comment on lines +9 to +12
import * as Either from 'fp-ts/Either';
import { RetryableEsClientError } from '../actions';

export type ExcludeRetryableEsError<Response> = Exclude<
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

File extracted from model.ts

Comment on lines +11 to +15
export const delayRetryState = <S extends State>(
state: S,
errorMessage: string,
/** How many times to retry a step that fails */
maxRetryAttempts: number
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

File extracted from model.ts

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a test file for this helper?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, there was no unit tests for the part of the code I extracted to this file. Can add some.

@pgayvallet pgayvallet marked this pull request as ready for review June 28, 2021 06:53
@pgayvallet pgayvallet requested a review from a team as a code owner June 28, 2021 06:53
@elasticmachine
Copy link
Contributor

Pinging @elastic/kibana-core (Team:Core)

@joshdover joshdover self-requested a review June 28, 2021 10:29
@@ -122,11 +140,11 @@ describe('migration v2', () => {
}

expect(errorMessage.includes('7 transformation errors were encountered:')).toBeTruthy();
expect(
errorMessage.includes(
'space:default: Error: Document "default" has property "space" which belongs to a more recent version of Kibana [6.6.0]. The last known version is [undefined]'
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To be clear, this case is no longer possible in v2 migrations now, correct?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Correct, The last known version is [undefined] scenario is no longer possible during the migration as it would now fail before trying to transform the documents

license: 'basic',
// dataset contains 2 type of unknown docs
// `foo` documents
// `space` documents (to mimic a migration with disabled plugins)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Trying to make sure I understand why we don't need to set xpack.spaces.enabled: false. Is this because createTestServers only loads OSS plugins by default?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yea, it seems the test servers are only loading OSS by default (which made sense before the license changes)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FWIW, I had to double check that again against what's included under the default distribution. Maybe we should start referring to SSPL rather than OSS?

unknownDocs: CheckForUnknownDocsResponseDoc[];
}

export const checkForUnknownDocs = ({
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this file get an explicit unit test?

'space:default: Error: Document "default" has property "space" which belongs to a more recent version of Kibana [6.6.0]. The last known version is [undefined]'
expect(errorMessage).toEqual(
expect.stringContaining(
'space:sixth: Error: Migration function for version 7.14.0 threw an error'
Copy link
Contributor

@TinaHeiligers TinaHeiligers Jun 28, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I know we're only asserting agains the error message here but it would be great if we could also assert that the original stack trace from the failed transform is available too.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We're already doing that in a dedicated test suite (/type_migration_failure.test.ts), and such assertions are quite tedious, which is why I thought it wasn't necessary to do it here too.

{
mode: 'equal',
value: 'Error: "number" attribute should be present',
},

expect(
errorMessage.startsWith(
'Unable to complete saved object migrations for the [.kibana] index: Migration failed because documents ' +
'from unknown types were found. To proceed with the migration, please delete these documents from the ' +
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Out of context of saved objects, if a user sees this message, they might not know what the 'type' is here. Maybe we can improve the message string to make it clear what the 'type' refers to?

sourceIndex: string
): string {
return (
`Migration failed because documents from unknown types were found. ` +
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we can be clearer with the failure message here. e.g.

Suggested change
`Migration failed because documents from unknown types were found. ` +
`Migration failed because documents were found for unknown saved object types. ` +

Copy link
Contributor

@TinaHeiligers TinaHeiligers Jun 28, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we want to be really verbose, we could hint at the cause too:

The plugin or feature that was used to create these documents is no longer available.

That might be taking it a bit far though. We could add a possible reason for these unknown type errors to the documentation instead but I'm concerned about users who might already too frustrated to 'go and read the docs'? WDYT?

Copy link
Contributor Author

@pgayvallet pgayvallet Jun 29, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The plugin or feature that was used to create these documents is no longer available.

We can't really know for sure why and how the type is no longer registered, which is why I think we should avoid taking such guesses in the outputted logs?

e.g

  1. The registering plugin could just be disabled
  2. The document could be corrupted and just not have any type at all
  3. ???

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It could probably be summarized as: the plugin that created this document is either missing or disabled. Any other case is likely a bug (either in our plugin code or a 3rd party plugin). We need to have an explicit API for registering types that can be safely deleted.

};
} else {
return throwBadResponse(stateP, res);
}
} else if (stateP.controlState === 'CHECK_UNKNOWN_DOCUMENTS') {
const res = resW as ExcludeRetryableEsError<ResponseType<typeof stateP.controlState>>;
if (Either.isRight(res)) {
Copy link
Contributor

@TinaHeiligers TinaHeiligers Jun 28, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm torn here between returning a Either.Right or Either.Left for returning something that will ultimately cause the migrations to fail. Generally, the 'happy path' for migrations is indicated by an Either.Right res but, in this case, our success result is when we have found documents with unknown types.

I guess it's not critical to the implementation, it just feels inconsistent with what I'd have expected, especially since we're transitioning directly to the FATAL state from here if docs are found. See for example how we handle the 'ultimately will fail migrations' response for the OUTDATED_DOCUMENTS_TRANSFORM step

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yea, you're right, using left when unknown doc types are found makes sense. Will adapt.

Copy link
Contributor

@TinaHeiligers TinaHeiligers left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall implementation looks ok. I've made some suggestions and recommendations for improvement. If needed, those can be handled in a follow up.

@pgayvallet pgayvallet requested a review from joshdover June 29, 2021 10:28
Copy link
Contributor

@joshdover joshdover left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@TinaHeiligers TinaHeiligers self-requested a review June 29, 2021 13:49
Copy link
Contributor

@TinaHeiligers TinaHeiligers left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

Comment on lines +98 to +109
coreSetup.savedObjects.registerType({
name: 'space',
hidden: false,
mappings: {
properties: {},
},
namespaceType: 'single',
migrations: {
'7.14.0': (doc) => {
doc.attributes.foo.bar = 12;
return doc;
},
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe we should update test fixtures instead? Spaces plugin SO are not relevant for this test anyway.

Comment on lines +11 to +15
export const delayRetryState = <S extends State>(
state: S,
errorMessage: string,
/** How many times to retry a step that fails */
maxRetryAttempts: number
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a test file for this helper?

Comment on lines +69 to +78
return {
bool: {
must: unusedTypesQuery,
must_not: knownTypes.map((type) => ({
term: {
type,
},
})),
},
};
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does it mean that users must remove them to continue the migration? If so, it might be considered as a breaking change even though we just fixed a bug.

};
} else {
return throwBadResponse(stateP, res.left);
}
}
} else if (stateP.controlState === 'SET_SOURCE_WRITE_BLOCK') {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

are you going to clean up the model from the unnecessary steps and update RFC in a follow-up PR?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

are you going to clean up the model from the unnecessary steps

The PR did not 'deprecate' any step?

and update the RFC

Forgot about that, will do in current PR.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The PR did not 'deprecate' any step?

Do we need to keep OUTDATED_DOCUMENTS_* steps with the logic introduces in the current PR?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think we do, but still not 100% sure tbh. Also, this PR did not really change anything functional in the v2 algorithm, it just fails earlier.

Will open a specific issue to discuss about the OUTDATED_DOCUMENTS_* steps removal

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Created #103731

@kibanamachine
Copy link
Contributor

💚 Build Succeeded

Metrics [docs]

✅ unchanged

History

To update your PR or re-run it, just comment with:
@elasticmachine merge upstream

@pgayvallet pgayvallet merged commit 5d24b23 into elastic:master Jun 29, 2021
pgayvallet added a commit to pgayvallet/kibana that referenced this pull request Jun 29, 2021
…ce index (elastic#103341)

* initial draft

* fix some tests

* fix additional unit tests

* move all the things

* create error generation fn

* add correct error message

* add unknown types to log message

* fix types

* fix existing test suites

* add IT test

* review comments

* add tests + use unknown instead of undefined for empty types

* update RFC with new step
# Conflicts:
#	rfcs/text/0013_saved_object_migrations.md
pgayvallet added a commit that referenced this pull request Jun 29, 2021
…ce index (#103341) (#103733)

* initial draft

* fix some tests

* fix additional unit tests

* move all the things

* create error generation fn

* add correct error message

* add unknown types to log message

* fix types

* fix existing test suites

* add IT test

* review comments

* add tests + use unknown instead of undefined for empty types

* update RFC with new step
# Conflicts:
#	rfcs/text/0013_saved_object_migrations.md
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
project:ResilientSavedObjectMigrations Reduce Kibana upgrade failures by making saved object migrations more resilient release_note:skip Skip the PR/issue when compiling release notes Team:Core Core services & architecture: plugins, logging, config, saved objects, http, ES client, i18n, etc v7.14.0 v7.15.0 v8.0.0
Projects
None yet
Development

Successfully merging this pull request may close these issues.

SO migration v2: fail the migration early if unknown types are encountered
6 participants