-
Notifications
You must be signed in to change notification settings - Fork 8.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[SoMigV2] Fail fast if unknown document types are present in the source index #103341
[SoMigV2] Fail fast if unknown document types are present in the source index #103341
Conversation
…fail-unknown-docs
return { | ||
bool: { | ||
must: unusedTypesQuery, | ||
must_not: knownTypes.map((type) => ({ | ||
term: { | ||
type, | ||
}, | ||
})), | ||
}, | ||
}; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I included the unusedTypesQuery
in the unknown doc queries, are we're not migrating them to the temp index, and as it would allow to 'remove' old types by unregistering them while adding them to the unused query.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does it mean that users must remove them to continue the migration? If so, it might be considered as a breaking change even though we just fixed a bug.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, I mean the opposite actually: I'm excluding the objects matching the unusedTypesQuery, to avoid failing if any is encountered given than we don't migrate them anyway.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok, must
usage is a bit misleading here because unusedTypesQuery
contains another must_not
inside.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yea, the whole unusedTypesQuery
naming is misleading because it's actually an excludeUnusedTypesQuery
export const createInitialState = ({ | ||
kibanaVersion, | ||
targetMappings, | ||
preMigrationScript, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Extracted this from model.ts
to its own file for maintainability.
Only addition is the new typeRegistry
parameter, and adding the knownTypes
to the default/initial state.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This PR is still relatively small so I guess it's ok to split files here too.
For larger efforts, it might make it harder for reviewers to figure out what has changed vs what code was moved around.
coreSetup.savedObjects.registerType({ | ||
name: 'foo', | ||
hidden: false, | ||
mappings: { | ||
properties: {}, | ||
}, | ||
namespaceType: 'agnostic', |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The test suite was triggering a failure by having some index-pattern
docs registered with the foo
type
{
type: 'foo',
id: 'index-pattern:some-id'
}
Adding this new unknown doc preflight check was causing the migration to fail too early, failing this test suite.
I'm now registering the foo
type in the suite, to let it fails at the exact same step it was failing before that PR.
coreSetup.savedObjects.registerType({ | ||
name: 'space', | ||
hidden: false, | ||
mappings: { | ||
properties: {}, | ||
}, | ||
namespaceType: 'single', | ||
migrations: { | ||
'7.14.0': (doc) => { | ||
doc.attributes.foo.bar = 12; | ||
return doc; | ||
}, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same than for the cleanup
suite. This new preflight check was causing the suite to fail too early. I'm now registering the space
suite and registering a failing migration function instead.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe we should update test fixtures instead? Spaces
plugin SO are not relevant for this test anyway.
const { body: response } = await client.indices.getSettings({ index: '.kibana_7.13.0_001' }); | ||
const settings = response['.kibana_7.13.0_001'] | ||
.settings as estypes.IndicesIndexStatePrefixedSettings; | ||
expect(settings.index).not.toBeUndefined(); | ||
expect(settings.index!.blocks?.write).not.toEqual('true'); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I manually modified the test locally to set the write block before these assertions to be sure that it was failing when the write block was present
).toMatchInlineSnapshot(` | ||
"Migration failed because documents from unknown types were found. To proceed with the migration, please delete these documents from the \\".kibana_15\\" index. | ||
The unknown documents were: | ||
- \\"unknownType:12\\" (type: \\"unknownType\\") | ||
- \\"anotherUnknownType:42\\" (type: \\"anotherUnknownType\\") | ||
You can delete them using the following command: | ||
curl -X POST \\"{elasticsearch}/.kibana_15/_bulk?pretty\\" -H 'Content-Type: application/json' -d' | ||
{ \\"delete\\" : { \\"_id\\" : \\"unknownType:12\\" } } | ||
{ \\"delete\\" : { \\"_id\\" : \\"anotherUnknownType:42\\" } } | ||
'" | ||
`); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added curl
command to remove the documents as requested in #101351 (comment)
Still not sure this is really helping as
- we don't have the ES connection string in the migrator, so end user still have to manually replace it before executing the command
- if security is enabled, user will also still have to add the proper auth headers for the query to succeed.
- Do they have
curl
on windows ?
Wondering if we should really keep this @joshdover?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Personally, I think solving (1) may be worth the effort for clarity. I realize it would mean passing the config down several layers, but how feasible is this? Can be done as a follow up as well.
We should make this message a little more cautious though and recommend backing them up first (either with a snapshot or a curl bulk get)
export function extractTransformFailuresReason( | ||
corruptDocumentIds: string[], | ||
transformErrors: TransformErrorObjects[] | ||
): string { | ||
const corruptDocumentIdReason = | ||
corruptDocumentIds.length > 0 | ||
? ` ${ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Extracted from model.ts
/** | ||
* A helper function/type for ensuring that all control state's are handled. | ||
*/ | ||
export function throwBadControlState(p: never): never; | ||
export function throwBadControlState(controlState: any) { | ||
throw new Error('Unexpected control state: ' + controlState); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
All the file was extracted from model.ts
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same comment warning about doing too much in one PR but this is really great!
import * as Either from 'fp-ts/Either'; | ||
import { RetryableEsClientError } from '../actions'; | ||
|
||
export type ExcludeRetryableEsError<Response> = Exclude< |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
File extracted from model.ts
export const delayRetryState = <S extends State>( | ||
state: S, | ||
errorMessage: string, | ||
/** How many times to retry a step that fails */ | ||
maxRetryAttempts: number |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
File extracted from model.ts
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there a test file for this helper?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, there was no unit tests for the part of the code I extracted to this file. Can add some.
Pinging @elastic/kibana-core (Team:Core) |
…fail-unknown-docs
@@ -122,11 +140,11 @@ describe('migration v2', () => { | |||
} | |||
|
|||
expect(errorMessage.includes('7 transformation errors were encountered:')).toBeTruthy(); | |||
expect( | |||
errorMessage.includes( | |||
'space:default: Error: Document "default" has property "space" which belongs to a more recent version of Kibana [6.6.0]. The last known version is [undefined]' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To be clear, this case is no longer possible in v2 migrations now, correct?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Correct, The last known version is [undefined]
scenario is no longer possible during the migration as it would now fail before trying to transform the documents
license: 'basic', | ||
// dataset contains 2 type of unknown docs | ||
// `foo` documents | ||
// `space` documents (to mimic a migration with disabled plugins) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Trying to make sure I understand why we don't need to set xpack.spaces.enabled: false
. Is this because createTestServers
only loads OSS plugins by default?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yea, it seems the test servers are only loading OSS by default (which made sense before the license changes)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
FWIW, I had to double check that again against what's included under the default distribution. Maybe we should start referring to SSPL rather than OSS?
unknownDocs: CheckForUnknownDocsResponseDoc[]; | ||
} | ||
|
||
export const checkForUnknownDocs = ({ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should this file get an explicit unit test?
'space:default: Error: Document "default" has property "space" which belongs to a more recent version of Kibana [6.6.0]. The last known version is [undefined]' | ||
expect(errorMessage).toEqual( | ||
expect.stringContaining( | ||
'space:sixth: Error: Migration function for version 7.14.0 threw an error' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I know we're only asserting agains the error message here but it would be great if we could also assert that the original stack trace from the failed transform is available too.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We're already doing that in a dedicated test suite (/type_migration_failure.test.ts
), and such assertions are quite tedious, which is why I thought it wasn't necessary to do it here too.
kibana/src/core/server/saved_objects/migrationsv2/integration_tests/type_migration_failure.test.ts
Lines 133 to 136 in 0561648
{ | |
mode: 'equal', | |
value: 'Error: "number" attribute should be present', | |
}, |
expect( | ||
errorMessage.startsWith( | ||
'Unable to complete saved object migrations for the [.kibana] index: Migration failed because documents ' + | ||
'from unknown types were found. To proceed with the migration, please delete these documents from the ' + |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Out of context of saved objects, if a user sees this message, they might not know what the 'type' is here. Maybe we can improve the message string to make it clear what the 'type' refers to?
sourceIndex: string | ||
): string { | ||
return ( | ||
`Migration failed because documents from unknown types were found. ` + |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we can be clearer with the failure message here. e.g.
`Migration failed because documents from unknown types were found. ` + | |
`Migration failed because documents were found for unknown saved object types. ` + |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we want to be really verbose, we could hint at the cause too:
The plugin or feature that was used to create these documents is no longer available.
That might be taking it a bit far though. We could add a possible reason for these unknown
type errors to the documentation instead but I'm concerned about users who might already too frustrated to 'go and read the docs'? WDYT?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The plugin or feature that was used to create these documents is no longer available.
We can't really know for sure why and how the type is no longer registered, which is why I think we should avoid taking such guesses in the outputted logs?
e.g
- The registering plugin could just be disabled
- The document could be corrupted and just not have any
type
at all - ???
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It could probably be summarized as: the plugin that created this document is either missing or disabled.
Any other case is likely a bug (either in our plugin code or a 3rd party plugin). We need to have an explicit API for registering types that can be safely deleted.
src/core/server/saved_objects/migrationsv2/model/extract_errors.ts
Outdated
Show resolved
Hide resolved
}; | ||
} else { | ||
return throwBadResponse(stateP, res); | ||
} | ||
} else if (stateP.controlState === 'CHECK_UNKNOWN_DOCUMENTS') { | ||
const res = resW as ExcludeRetryableEsError<ResponseType<typeof stateP.controlState>>; | ||
if (Either.isRight(res)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm torn here between returning a Either.Right
or Either.Left
for returning something that will ultimately cause the migrations to fail. Generally, the 'happy path' for migrations is indicated by an Either.Right
res but, in this case, our success result is when we have found documents with unknown types.
I guess it's not critical to the implementation, it just feels inconsistent with what I'd have expected, especially since we're transitioning directly to the FATAL
state from here if docs are found. See for example how we handle the 'ultimately will fail migrations' response for the OUTDATED_DOCUMENTS_TRANSFORM
step
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yea, you're right, using left
when unknown doc types are found makes sense. Will adapt.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall implementation looks ok. I've made some suggestions and recommendations for improvement. If needed, those can be handled in a follow up.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM!
src/core/server/saved_objects/migrationsv2/actions/check_for_unknown_docs.ts
Outdated
Show resolved
Hide resolved
coreSetup.savedObjects.registerType({ | ||
name: 'space', | ||
hidden: false, | ||
mappings: { | ||
properties: {}, | ||
}, | ||
namespaceType: 'single', | ||
migrations: { | ||
'7.14.0': (doc) => { | ||
doc.attributes.foo.bar = 12; | ||
return doc; | ||
}, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe we should update test fixtures instead? Spaces
plugin SO are not relevant for this test anyway.
export const delayRetryState = <S extends State>( | ||
state: S, | ||
errorMessage: string, | ||
/** How many times to retry a step that fails */ | ||
maxRetryAttempts: number |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there a test file for this helper?
return { | ||
bool: { | ||
must: unusedTypesQuery, | ||
must_not: knownTypes.map((type) => ({ | ||
term: { | ||
type, | ||
}, | ||
})), | ||
}, | ||
}; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does it mean that users must remove them to continue the migration? If so, it might be considered as a breaking change even though we just fixed a bug.
}; | ||
} else { | ||
return throwBadResponse(stateP, res.left); | ||
} | ||
} | ||
} else if (stateP.controlState === 'SET_SOURCE_WRITE_BLOCK') { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
are you going to clean up the model from the unnecessary steps and update RFC in a follow-up PR?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
are you going to clean up the model from the unnecessary steps
The PR did not 'deprecate' any step?
and update the RFC
Forgot about that, will do in current PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The PR did not 'deprecate' any step?
Do we need to keep OUTDATED_DOCUMENTS_*
steps with the logic introduces in the current PR?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think we do, but still not 100% sure tbh. Also, this PR did not really change anything functional in the v2 algorithm, it just fails earlier.
Will open a specific issue to discuss about the OUTDATED_DOCUMENTS_*
steps removal
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Created #103731
💚 Build SucceededMetrics [docs]
History
To update your PR or re-run it, just comment with: |
…ce index (elastic#103341) * initial draft * fix some tests * fix additional unit tests * move all the things * create error generation fn * add correct error message * add unknown types to log message * fix types * fix existing test suites * add IT test * review comments * add tests + use unknown instead of undefined for empty types * update RFC with new step # Conflicts: # rfcs/text/0013_saved_object_migrations.md
…ce index (#103341) (#103733) * initial draft * fix some tests * fix additional unit tests * move all the things * create error generation fn * add correct error message * add unknown types to log message * fix types * fix existing test suites * add IT test * review comments * add tests + use unknown instead of undefined for empty types * update RFC with new step # Conflicts: # rfcs/text/0013_saved_object_migrations.md
Summary
Fix #101351
CHECK_UNKNOWN_DOCUMENTS
stage betweenWAIT_FOR_YELLOW_SOURCE
andSET_SOURCE_WRITE_BLOCK
to check for unknown documents (and fail if encountered) before starting the migration or enable the write block on the source indexmodel.ts
monolith into multiple files in its own module for maintainability.Checklist