-
Notifications
You must be signed in to change notification settings - Fork 8.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Bootstrap ZDT migration algorithm #151282
Bootstrap ZDT migration algorithm #151282
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Self-review, trying to explain my reasoning:
algorithm: schema.oneOf([schema.literal('v2'), schema.literal('zdt')], { | ||
defaultValue: 'v2', | ||
}), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wasn't sure how to name it, so I went with zdt
. If anyone prefers managed
, or anything else, please feel free to tell me. We could even change the algorithm
name to something else.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm fine with a more descriptive algorithm name, managed
is a little too obscure.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
From someone using this API's perspective the values v2
and zdt
seem to not have any relation. This is OK, but we need a good doc comment about what each means and where it came from.
IMO zero-downtime
or operator
(bc I'm guessing this algo is intended to work with the "operator pattern") are also good candidates.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
my 2cents: I'd use a name that makes it obvious that there is an external agent required (operator
seems like a good candidate).
At the same time, I tend to prefer options that resemble the functionality they enable: Zero-downtime...
Mixing both thoughts... I wonder if we should be more explicit about when we plan to use it (or anyone willing to change the defaults is expected to use this): with the k8s operator? Something like k8s-zdt
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, naming is hard.... we could also go with serverless
and default
/standard
. OTOH at some point we may want to allow this algo to be officially supported on prem (with 'manual' orchestration of the workflow)... So I really don't know.
export const logStateTransition = ( | ||
logger: Logger, | ||
logMessagePrefix: string, | ||
prevState: LogAwareState, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Was extracted from being inlined in a state machine file of the v2 algorithm to be re-used for zdt.
Please note the src/common
folder of the package. This is where I'm planning to move stuff shared/common between the two algo.
Basically the ideal end structure for me would be
- src
- common
- actions
- other_stuff
- v2
- model
- other_v2_specific_stuff
- zdt
- folders_this_pr_introduced
Does that seems fine to you, or should we go with another folder structure?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does that seems fine to you, or should we go with another folder structure?
Seems good to me. It's not worth getting too tied up in where the code lives, as long as it's somewhat categorized.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm all good with src/common
... just wondering about the differences between src/common
and src/core
... Should we pick one or are we good with having both?
If we are at risk of having circular references, I'm OK with having 2 common dirs 😇
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
src/core
is the remain of old history tbh 😅 . Ideally we would be moving src/core
to subfolders of src/common
soon. I just wanted to avoid doing it in the initial PR.
/** | ||
* Merge mappings from all registered saved object types. | ||
*/ | ||
export const buildTypesMappings = ( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
extracted from v2. Moved to src/core
because the other mapping-related helpers are there already. src/core
may eventually move to src/common/core
or something later (if we feel like it's worth the effort)
if (migrationAlgorithm === 'zdt') { | ||
return this.runMigrationZdt(); | ||
} else { | ||
return this.runMigrationV2(); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The branching
|
||
export async function cleanup(client: ElasticsearchClient, state?: State) { | ||
if (!state) return; | ||
type CleanableState = { sourceIndexPitId: string } | {}; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Adapted the type to make it reusable with different State
types (as the 2 algos have different shapes of states)
import type { State } from '../../state'; | ||
import type { ModelStage } from '../types'; | ||
|
||
export const init: ModelStage<'INIT'> = (state, res, context): State => { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Allowing to define them just like that (parameter types are all properly inferred)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Separating these into individual functions is really great! What do you think of something like this as an iteration on this idea:
// just an example
export const init: ModelTransition<'INIT', 'FATAL' | 'DONE'> = (state, res, context) => {}
In this way we will:
- leverage TS sanity check on our expected return values
- replace
State
return value declarations from these functions
ModelTransition
would be defined as:
/**
* Defines a transition function for the model
*/
export type ModelTransition<T extends AllActionStates, R extends AllControlStates> = (
state: StateFromActionState<T>,
res: StateActionResponse<T>,
context: MigratorContext
) => StateFromControlState<T | R>; // T | R because we assume a state can go back to itself as QOL for user of this type...
Plus utility:
export type StateFromControlState<T extends AllControlStates> = ControlStateMap[T];
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think your improvements are great. Done in f7d6c7f
// nothing implemented yet, just going to 'DONE' | ||
return { | ||
...state, | ||
controlState: 'DONE', | ||
}; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, not the purpose of this PR, implementation will be done in follow-up(s)
describe('Action: init', () => { | ||
let context: MockedMigratorContext; | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Per model stage test file
export const next = (context: MigratorContext) => { | ||
const map = nextActionMap(context); | ||
|
||
return (state: State) => { | ||
const delay = <F extends (...args: any) => any>(fn: F): (() => ReturnType<F>) => { | ||
return () => { | ||
return state.retryDelay > 0 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also encountered an issue trying to factorize this. nextActionMap
cannot be factorized, and next
depends on it (and most importantly, on the shape of the State
).
We could try spending some time to find a proper TS way to make this generic, but for 50 lines, it didn't felt like a priority.
export interface BaseState extends ControlState { | ||
readonly retryCount: number; | ||
readonly retryDelay: number; | ||
readonly logs: MigrationLog[]; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
State types are different between the two algorithm (both the BaseState
and the composition of the State
) so I don't think we can really re-use anything in this specific part of the code.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not totally onboard with having logs
as part of the state
.
Stages won't require checking logs for anything, so it feels weird to have them as part of the state.
That being said, alternatives would probably require having stages as classes or objects that implement some sort of interface, so perhaps that's not ideal either.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah. TBH I tried just having the logger in the context initially, and it brings two issues:
- incompatibilities between helper functions of the two algos
- making it harder to control when logs are outputted
So I went back to putting the logs in the state...
So if we were to change that, I think we would need to do it in both implementations at the same time, which kinda closed the question for me (given it's not the same amount of work at all)
Pinging @elastic/kibana-core (Team:Core) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've added a few comments and nits, none of which block merging.
Exciting times!
LGTM
} | ||
|
||
logger.info( | ||
logMessagePrefix + `${prevState.controlState} -> ${currState.controlState}. took: ${tookMs}ms.` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can't see it here but do we add a migration log identifier? I mean, does the logMessagePrefix
include some indication that the logs come from the zdt
migration algorithm?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, atm we're using the same prefix than the v2 algo:
Line 47 in 445058b
const logMessagePrefix = `[${context.indexPrefix}] `; |
We should probably allow to distinguish between the two, but maybe we should use a dedicated logger child/context instead of this prefix, to make sure all our logs are properly identifiable.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
dedicated logger child/context instead of this prefix, to make sure all our logs are properly identifiable
++
}); | ||
} | ||
|
||
private runMigrationV2(): Promise<MigrationResult[]> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
asside: The prep needed for runResilientMigrator
looks clunky compared to runZeroDowntimeMigration
.
Would it make sense to move that to a 'prepareResilientMigration' setup function? The cleanup's not needed now but possibly some time.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, I agree. Ideally most of the 'architectural'/'code' improvements we do in this second algo should be ported to the existing one. Not sure our current priorities will allow us to take such chore tasks in the following months.
*/ | ||
|
||
export const StageMocks = { | ||
init: jest.fn().mockImplementation((state: unknown) => state), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It should be fine and it even gives us testing options to mock stages individually.
context = createContextMock(); | ||
}); | ||
|
||
test('INIT -> DONE because its not done yet', () => { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit:
test('INIT -> DONE because its not done yet', () => { | |
test("INIT -> DONE because it's not implemented yet", () => { |
"not done yet" could mean that the migration hasn't finished yet, making for a confused dev reading the code 😉
...saved-objects/core-saved-objects-migration-server-internal/src/zdt/model/stages/init.test.ts
Outdated
Show resolved
Hide resolved
test('INIT -> INIT when cluster routing allocation is incompatible', () => { | ||
const state = createState(); | ||
const res: StateActionResponse<'INIT'> = Either.left({ | ||
type: 'incompatible_cluster_routing_allocation', |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are we going to enforce routing allocation given that these will be managed instances?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great question, and I'm not sure tbh. I mostly re-used the existing init
action as a POC that the algorithm got executed. The zdt workflow may start by something different, and we may not need to check for such things on managed environment, yeah.
export type StateActionResponse<T extends AllActionStates> = ExcludeRetryableEsError< | ||
ResponseType<T> | ||
>; | ||
|
||
/** | ||
* Defines a stage delegation function for the model | ||
*/ | ||
export type ModelStage<T extends AllActionStates> = ( | ||
state: StateFromActionState<T>, | ||
res: StateActionResponse<T>, | ||
context: MigratorContext | ||
) => State; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🪄
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Left some initial comments, will finish rest of review soon! Let me know what you think!
algorithm: schema.oneOf([schema.literal('v2'), schema.literal('zdt')], { | ||
defaultValue: 'v2', | ||
}), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
From someone using this API's perspective the values v2
and zdt
seem to not have any relation. This is OK, but we need a good doc comment about what each means and where it came from.
IMO zero-downtime
or operator
(bc I'm guessing this algo is intended to work with the "operator pattern") are also good candidates.
} | ||
|
||
logger.info( | ||
logMessagePrefix + `${prevState.controlState} -> ${currState.controlState}. took: ${tookMs}ms.` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
dedicated logger child/context instead of this prefix, to make sure all our logs are properly identifiable
++
import type { State } from '../../state'; | ||
import type { ModelStage } from '../types'; | ||
|
||
export const init: ModelStage<'INIT'> = (state, res, context): State => { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Separating these into individual functions is really great! What do you think of something like this as an iteration on this idea:
// just an example
export const init: ModelTransition<'INIT', 'FATAL' | 'DONE'> = (state, res, context) => {}
In this way we will:
- leverage TS sanity check on our expected return values
- replace
State
return value declarations from these functions
ModelTransition
would be defined as:
/**
* Defines a transition function for the model
*/
export type ModelTransition<T extends AllActionStates, R extends AllControlStates> = (
state: StateFromActionState<T>,
res: StateActionResponse<T>,
context: MigratorContext
) => StateFromControlState<T | R>; // T | R because we assume a state can go back to itself as QOL for user of this type...
Plus utility:
export type StateFromControlState<T extends AllControlStates> = ControlStateMap[T];
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So exciting to have this 🎉
algorithm: schema.oneOf([schema.literal('v2'), schema.literal('zdt')], { | ||
defaultValue: 'v2', | ||
}), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
my 2cents: I'd use a name that makes it obvious that there is an external agent required (operator
seems like a good candidate).
At the same time, I tend to prefer options that resemble the functionality they enable: Zero-downtime...
Mixing both thoughts... I wonder if we should be more explicit about when we plan to use it (or anyone willing to change the defaults is expected to use this): with the k8s operator? Something like k8s-zdt
?
export const logStateTransition = ( | ||
logger: Logger, | ||
logMessagePrefix: string, | ||
prevState: LogAwareState, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm all good with src/common
... just wondering about the differences between src/common
and src/core
... Should we pick one or are we good with having both?
If we are at risk of having circular references, I'm OK with having 2 common dirs 😇
export function throwBadResponse(state: { controlState: string }, p: never): never; | ||
export function throwBadResponse(state: { controlState: string }, res: any): never { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: does res: unknown
work here?
*/ | ||
|
||
export const StageMocks = { | ||
init: jest.fn().mockImplementation((state: unknown) => state), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Probably something like jest.requireActual('./stages')
and loop over its keys would automate the creation of the mocks... although I don't think it's a big deal
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for integrating my feedback @pgayvallet , overall LGTM!
💚 Build Succeeded
Metrics [docs]Unknown metric groupsESLint disabled line counts
Total ESLint disabled count
History
To update your PR or re-run it, just comment with: |
## Summary Part of elastic#150309 Purpose of the PR is to create the skeleton of the ZDT algorithm, in order to make sure we're all aligned on the way we'll be managing our codebase between the 2 implementation (and to ease with the review of the follow-up PRs by not having the bootstrap of the algo to review at the same time) --------- Co-authored-by: kibanamachine <[email protected]>
Summary
Part of #150309
Purpose of the PR is to create the skeleton of the ZDT algorithm, in order to make sure we're all aligned on the way we'll be managing our codebase between the 2 implementation (and to ease with the review of the follow-up PRs by not having the bootstrap of the algo to review at the same time)