Implement `observe`, `regroup` and `compute` as distinct execution phases #809
@jawache please review the AC to ensure you're in alignment
@jmcook1186 re: "`if-run` should include some validation logic that determines which phases to run. Specifically, if input data is already provided in the manifest file provided to `if-run` then the observe phase should be skipped." I think it's fine (and expected) to run the observe step even if the manifest has inputs; the intention would be to gather observations again and run with the new observations.
@jmcook1186 in the first AC, for observe, the inputs should be populated, not the outputs.
Re: "AND if the same command is run with `if-run --observe -m manifest.yml --suppress-output` then no data is displayed to the console and no data is saved to file." I'm unsure what the rationale is for `--suppress-output`.
@jmcook1186 all good apart from my comments above, very thorough! The only addition is that we should be able to run, say, regroup and compute in one step, so `if-run --regroup --compute` should run both steps in one go. In fact, perhaps the right flag might be `--skip-observe` (`--skip-regroup`, etc.); I think the most common use case here is just to skip the observe step and run the other two.
Oh, and one other point: regroup should be able to run on a manifest file with previously grouped inputs. It should first flatten all inputs into a 1D array and regroup from that baseline.
Thanks @jawache, agree on all points; updating the ticket now. The rationale for
@narekhovhannisyan please deep-dive to confirm the solution is clear and let me know once you have; I will then split it into 5 subtickets:
Waiting to get all the manifests into the same style to fully test the phased execution #812
Why: Sub of #762

What: Refactor IF into three distinct execution phases:

- `observe`
- `regroup`
- `compute`
Each phase can be executed individually using a command (`if-compute`, `if-observe`, `if-regroup`). When run as individual commands, they accept YAML files or YAML data on stdin to operate over, and they can save their outputs to YAML files or dump them to the console via stdout. The three phases can be run end-to-end using a single command, `if-run`. However, each phase can also be run individually by passing the individual phase flag (`--observe`, `--regroup` or `--compute`) to `if-run` on the command line.
Note that `observe` and `regroup` MODIFY INPUTS only. `compute` does not modify inputs; it CREATES OUTPUTS only.

Changes to manifests

The execution pipeline should also be separated into three phases inside the manifest, so that what used to be a single contiguous pipeline now looks something like:
A manifest that has not yet passed through the `observe` phase will have `null` inputs:

--observe
`if-run --observe` covers the import/data creation phase of the IF execution. It should take a manifest with no input data and return a manifest with a populated `inputs` array. This could happen by executing an importer plugin or by generating mock observations.

The observe phase needs to receive a manifest file with the importer(s) configured. It will usually receive a manifest with the `inputs` data missing or empty.

As `--observe` is runnable in isolation, it should capture the state of the manifest after the plugins in the `observe` pipeline have completed and save it to the YAML file path specified via the `--output`/`-o` command. It can only output YAML data. If no output command is provided, it should display the output data in the console. However, we should also support a `--suppress-output` command that prevents the data from being displayed in the console.

The importers will be initialized in precisely the same way as they are today (no separation into phases is required in the `initialize` block), but the plugins to execute in the `observe` phase are those defined in the `tree: children: child: pipeline: observe:` block in the manifest.

The following command should enrich a manifest with no inputs and save a copy with input data:

`if-run --observe -m manifest.yml -o manifest-with-data.yml`
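The observe behaviour described above can be pictured as a tree walk that runs the plugins listed under each node's `pipeline.observe` in order and writes only to `inputs`. This is a minimal TypeScript sketch, not IF's implementation. The manifest shape is simplified, and `ManifestNode`, `ObservePlugin`, `runObserve` and the `mock-observations` plugin are hypothetical names. It also assumes that observations are gathered afresh on every run, so any existing inputs are replaced rather than appended to.

```typescript
// Minimal sketch of the observe phase over a simplified manifest tree.
// ManifestNode, ObservePlugin, runObserve and "mock-observations" are
// hypothetical names for illustration, not the IF codebase's API.

type Observation = Record<string, string | number>;

interface ManifestNode {
  pipeline?: { observe?: string[]; regroup?: string[]; compute?: string[] };
  inputs?: Observation[] | null;
  outputs?: Observation[];
  children?: Record<string, ManifestNode>;
}

// An observe plugin receives the inputs gathered so far and returns them enriched.
type ObservePlugin = (inputs: Observation[]) => Observation[];

function runObserve(
  node: ManifestNode,
  plugins: Record<string, ObservePlugin>
): ManifestNode {
  const result: ManifestNode = { ...node };
  const observe = node.pipeline?.observe ?? [];
  if (observe.length > 0) {
    // Assumption: observations are gathered afresh, so existing inputs are replaced.
    let inputs: Observation[] = [];
    for (const name of observe) {
      const plugin = plugins[name];
      if (!plugin) {
        throw new Error(`observe plugin "${name}" is not initialized`);
      }
      inputs = plugin(inputs);
    }
    result.inputs = inputs; // observe writes inputs only; outputs are never created
  }
  if (node.children) {
    const children: Record<string, ManifestNode> = {};
    for (const key of Object.keys(node.children)) {
      children[key] = runObserve(node.children[key], plugins);
    }
    result.children = children;
  }
  return result;
}

// Usage: a mock importer that emits two observations.
const mockObservations: ObservePlugin = (inputs) =>
  inputs.concat([
    { timestamp: "2024-01-01T00:00:00Z", duration: 60, "cloud/instance-type": "A1" },
    { timestamp: "2024-01-01T00:01:00Z", duration: 60, "cloud/instance-type": "B1" },
  ]);

const unobserved: ManifestNode = {
  children: {
    child: { pipeline: { observe: ["mock-observations"] }, inputs: null },
  },
};

const observed = runObserve(unobserved, { "mock-observations": mockObservations });
console.log(JSON.stringify(observed.children?.child.inputs));
```

Because the walk returns a new tree and never sets `outputs`, the result can be written to the `-o` path or printed to stdout unchanged, which keeps to the "MODIFY INPUTS only" rule.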
--regroup
`if-run --regroup` should apply the logic from the existing `group-by` plugin, but as an IF core feature rather than a plugin. The `group-by` plugin should be deleted as part of this task. The `regroup` feature will get its arguments from the `tree: children: child: pipeline: regroup:` config in the manifest. This block should include an array of keys to regroup by, for example:

`regroup` will see the `cloud/instance-type` value in the manifest and regroup the input values using the `cloud/instance-type` values found in the `inputs` array, equivalent to this in today's `group-by` plugin:

`regroup` needs to operate on YAML data where `input` data exists. This can come from a file where the input data was pre-populated by `if-run --observe` some time in the past, or from a fresh `if-run --observe`, i.e. all of the following should work:

`if-run --regroup -m manifest-with-input-data.yml -o regrouped-manifest.yml`
`if-run --observe -m manifest-with-no-data.yml | if-run --regroup -o regrouped-manifest.yml`
`if-run --observe -m manifest-with-no-data.yml -o manifest-with-input-data.yml && if-run -m manifest-with-input-data.yml --regroup -o regrouped-manifest.yml`
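The flatten-then-regroup behaviour (also required when the inputs are already grouped) can be sketched in two steps. First, collect every input under a node into one flat 1D array. Second, nest those inputs under one level of children per regroup key. This TypeScript sketch assumes `regroup` nests children the same way as today's `group-by` plugin (one level per key, each child named after the key's value). `GroupNode`, `flattenInputs`, `groupInputs` and `regroup` are hypothetical names.

```typescript
// Sketch of regroup: flatten any existing grouping, then regroup by the
// configured keys. GroupNode, flattenInputs, groupInputs and regroup are
// hypothetical names; nesting one child level per key is an assumption
// modelled on how today's group-by plugin builds the tree.

type Input = Record<string, string | number>;

interface GroupNode {
  inputs?: Input[];
  children?: Record<string, GroupNode>;
}

// Collect every input under a node into a single flat 1D array.
function flattenInputs(node: GroupNode): Input[] {
  let flat: Input[] = node.inputs ? node.inputs.slice() : [];
  if (node.children) {
    for (const name of Object.keys(node.children)) {
      flat = flat.concat(flattenInputs(node.children[name]));
    }
  }
  return flat;
}

// Nest inputs under one level of children per key, named after the key's value.
function groupInputs(inputs: Input[], keys: string[]): GroupNode {
  if (keys.length === 0) {
    return { inputs };
  }
  const [key, ...rest] = keys;
  const buckets: Record<string, Input[]> = {};
  for (const input of inputs) {
    if (!(key in input)) {
      throw new Error(`input is missing regroup key "${key}"`);
    }
    const value = String(input[key]);
    if (!buckets[value]) {
      buckets[value] = [];
    }
    buckets[value].push(input);
  }
  const children: Record<string, GroupNode> = {};
  for (const value of Object.keys(buckets)) {
    children[value] = groupInputs(buckets[value], rest);
  }
  return { children };
}

// The original grouping is discarded entirely and rebuilt from the flat array.
function regroup(node: GroupNode, keys: string[]): GroupNode {
  return groupInputs(flattenInputs(node), keys);
}

// Usage: inputs already grouped by cloud/region, regrouped by region and instance type.
const preGrouped: GroupNode = {
  children: {
    "uk-west": {
      inputs: [
        { timestamp: "2024-01-01T00:00:00Z", "cloud/region": "uk-west", "cloud/instance-type": "A1", "cpu/utilization": 10 },
        { timestamp: "2024-01-01T00:00:00Z", "cloud/region": "uk-west", "cloud/instance-type": "B1", "cpu/utilization": 20 },
      ],
    },
    "uk-east": {
      inputs: [
        { timestamp: "2024-01-01T00:00:00Z", "cloud/region": "uk-east", "cloud/instance-type": "A1", "cpu/utilization": 30 },
      ],
    },
  },
};

const regrouped = regroup(preGrouped, ["cloud/region", "cloud/instance-type"]);
console.log(JSON.stringify(regrouped, null, 2));
```

Flattening first means the result depends only on the inputs and the configured keys, never on whatever grouping the manifest arrived with.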
--compute
`if-run --compute` should execute all the plugins in the `compute` section of the `tree: children: child: pipeline: compute` block in a manifest. For example, given the following information:

`--compute` will execute the `sum`, `multiply`, `divide` and `interpolate` plugins over the `input` data in sequence, according to the config provided in the `initialize` block.

`--compute` needs to accept YAML data that has a populated `inputs` array. It will create an `outputs` array.

Both of the following commands should work:

`if-run --compute -m regrouped-manifest.yml -o output.yml`
`if-run --regroup ungrouped-manifest | if-run --compute`
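The compute rules in this issue (fail loudly on missing inputs or duplicate timestamps, run the `compute` plugins in sequence, and create `outputs` without touching `inputs`) can be sketched for a single node as follows. This is a TypeScript sketch under assumptions. `Row`, `ComputePlugin`, `assertComputable` and `runCompute` are hypothetical names, and the two toy plugins only stand in for `sum` and `multiply`. They are not the IF builtins.

```typescript
// Sketch of the compute phase for one node: validate inputs, then run the
// compute plugins in sequence to build outputs. Row, ComputePlugin,
// assertComputable and runCompute are hypothetical names; the toy plugins
// below stand in for sum and multiply and are not the IF builtins.

type Row = Record<string, string | number>;
type ComputePlugin = (rows: Row[]) => Row[];

// Fail loudly on missing/empty inputs or duplicate timestamps (a sign that
// regroup is needed but has not been applied).
function assertComputable(inputs: Row[] | null | undefined): Row[] {
  if (!inputs || inputs.length === 0) {
    throw new Error("compute requires a populated inputs array");
  }
  const seen: Record<string, boolean> = {};
  for (const input of inputs) {
    const timestamp = String(input["timestamp"]);
    if (seen[timestamp]) {
      throw new Error(`duplicate timestamp ${timestamp}: run regroup before compute`);
    }
    seen[timestamp] = true;
  }
  return inputs;
}

function runCompute(
  inputs: Row[] | null | undefined,
  pipeline: string[],
  plugins: Record<string, ComputePlugin>
): Row[] {
  // Shallow-copy each row so plugins never modify the original inputs.
  let rows: Row[] = assertComputable(inputs).map((row) => ({ ...row }));
  for (const name of pipeline) {
    const plugin = plugins[name];
    if (!plugin) {
      throw new Error(`compute plugin "${name}" is not initialized`);
    }
    rows = plugin(rows);
  }
  return rows; // becomes the node's outputs array
}

// Usage with two toy plugins.
const toySum: ComputePlugin = (rows) =>
  rows.map((row) => ({ ...row, "cpu/total": Number(row["cpu/a"]) + Number(row["cpu/b"]) }));
const toyMultiply: ComputePlugin = (rows) =>
  rows.map((row) => ({ ...row, "cpu/doubled": Number(row["cpu/total"]) * 2 }));

const computeInputs: Row[] = [
  { timestamp: "2024-01-01T00:00:00Z", "cpu/a": 1, "cpu/b": 2 },
  { timestamp: "2024-01-01T00:01:00Z", "cpu/a": 3, "cpu/b": 4 },
];

const computeOutputs = runCompute(computeInputs, ["sum", "multiply"], {
  sum: toySum,
  multiply: toyMultiply,
});
console.log(JSON.stringify(computeOutputs));
```

Here the duplicate-timestamp check runs per `inputs` array. Whether it should also look across sibling nodes is left open in the issue.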
if-run
Simply using `if-run` with none of the execution phase flags should run an entire execution pipeline that includes the `observe`, `regroup` and `compute` phases. `if-run` should be able to optionally save the final output to file or display it in the console using the `--output`/`-o` command, like today. In either case, it is the final output from `--compute` that gets output. However, it should also be possible to save the intermediate representations (i.e. the outputs from `--observe` and `--regroup`) by specifying this on the command line:

`if-run -m manifest.yml -o output.yml --save-intermediates outputs`

This will save the outputs from `--observe` and `--regroup` to the `outputs` folder with the filenames `manifest-observe.yml` and `manifest-regroup.yml`. The output folder name is provided to `--save-intermediates`, and the filenames are simply the given manifest name with the `-observe` or `-regroup` suffix appended.

You can also run

`if-run -o output.yml --display-intermediates`

to display the results from `if-observe` and `if-regroup` in the console without saving them to file.

`if-run` should include some validation logic that determines which phases to run. If there is no `regroup` section, or the `regroup` section of the manifest is empty, then the `regroup` phase should be skipped. `compute` should always be run.

Manifest validation
Each execution phase now has its own specific requirements for the content of a manifest, so each phase should perform its own independent validation. The current validation of the `context` is common to all three phases, but `observe` requires there to be no `inputs`, whereas `regroup` and `compute` require `inputs` to be present and correctly formatted.

Scope of work:
- `if-run --observe` runs from the command line #893
- `if-run --regroup` runs from the command line #894
- `if-run --compute` runs from the command line #895
- `if-run` with none or a combination of phase arguments #896

Acceptance criteria
Scenario 1: `if-run --observe` can be run on the command line

`--observe` is a flag that modifies the behaviour of `if-run` so that it only executes the `observe` phase. It expects to operate on a manifest without input data and can either return its result to the console or save it to file (this is just the normal `if-run` behaviour). The result will always be YAML data. The command enriches the `inputs` array but never creates outputs. The command executes plugins initialized in the `initialize` section of the manifest and listed in the `tree: pipeline: observe` section of each node in the tree. It uses the values passed to the `--manifest` and `--output` subcommands.

GIVEN `if-run --observe` exists
WHEN I run the following command:
`if-run --observe -m manifest.yml -o manifest-with-data.yml`
AND `manifest.yml` contains the following:
THEN `if-run --observe` should create `manifest-with-data.yml` containing the following:
AND if the command is updated to `if-run --observe -m manifest.yml` (i.e. no `-o` command) then the same output data is displayed in the console
AND if the same command is run with `if-run --observe -m manifest.yml --suppress-output`
then no data is displayed to the console and no data is saved to file.

Scenario 2: `if-run --regroup` is available on the command line

`if-run --regroup` accepts a manifest file with input data or YAML data arriving via stdin. It applies the `group-by` logic, grouping by the keys provided in `tree: children: child: pipeline: regroup`. It should output data to a YAML file or as YAML data via stdout. It should use the data passed via the `--manifest`, `--output` and `--suppress-output` subcommands.

GIVEN the feature is implemented
WHEN I run the following command:
`if-run --regroup -m manifest.yml -o regrouped-manifest.yml`
AND `manifest.yml` contains the following YAML data:
THEN `if-run --regroup` saves a file called `regrouped-manifest.yml` that contains the following:
AND when the command is updated to `if-run --regroup -m manifest.yml` (i.e. the `-o` subcommand is omitted) the same data is displayed in the console and no file is saved
AND when the command is updated to `if-run --regroup -m manifest.yml --suppress-output` no data is displayed to the console and no file is saved
AND if the command is updated to `if-run --observe -m manifest.yml | if-run --regroup`
then `if-run --regroup` operates over the YAML data piped in from stdin and displays the result to the console.

Scenario 3: `if-run --regroup` can operate on a manifest with inputs that are already grouped

If a manifest already has grouped `inputs`, then `--regroup` should recast them into a single, flat 1D array and then re-execute the grouping according to the config provided in `tree: children: child: pipeline: regroup`. The original grouping should be completely destroyed and replaced with the grouping defined in the manifest.

GIVEN `regroup` can act on pre-grouped manifests
WHEN I run `if-run -m manifest.yml --regroup`
AND `manifest.yml` contains the following (notice the values are already grouped by `cloud/region` and we're configuring it to group by both `cloud/region` and `cloud/instance-type`)
THEN I see the following output:
Scenario 4: `if-run --compute` is available on the command line

`if-run --compute` accepts a manifest file with input data OR YAML data arriving via stdin. It executes the plugins defined in `tree: children: child: pipeline: compute`. It should output data to a YAML file or as YAML data printed to the console. It should use the values passed to the `--manifest`, `--output` and `--suppress-output` subcommands.

`if-run --compute` should throw an exception loudly if a) there is no input data, or b) there are duplicate timestamps in input arrays (this indicates that `regroup` is needed but hasn't been applied).

GIVEN the feature is available
WHEN I run the command
`if-run --compute -m manifest.yml -o output.yml`
AND `manifest.yml` contains the following YAML data:
THEN `if-run --compute` should generate a new file, `output.yml`, that contains the following data:
AND when the command is updated to `if-run --compute -m manifest.yml` (i.e. the `-o` subcommand is omitted) the same data is displayed in the console and no file is saved
AND when the command is updated to `if-run --compute -m manifest.yml --suppress-output` no data is displayed to the console and no file is saved
AND if the command is updated to `if-run --observe -m manifest.yml | if-run --compute`
then `if-run --compute` operates over the YAML data piped in from stdin and displays the result to the console.
AND if there are duplicate timestamps in an `inputs` array, or the `inputs` array is missing/empty, then `if-run --compute` should error out.

Scenario 5: `if-run` with no phase arguments executes the full end-to-end pipeline

`if-run` should execute the full set of execution phases. It should be able to accept a manifest with no `input` data if there is `observe` config available to execute, a manifest with no `regroup` config as long as there are no duplicate timestamps in any `inputs` arrays, or a manifest with no `observe` config if there is input data already available. I should be able to save intermediate representations (i.e. the outputs from `observe` and `regroup`) to file or display them in the console. `if-run` should also expose the `--manifest`, `--output`, `--suppress-output`, `--save-intermediates` and `--show-intermediates` commands.

GIVEN `if-run` is available and refactored according to the issue description
WHEN I run the following command:
`if-run -m manifest.yml`
AND `manifest.yml` contains the following:
THEN I see the following output in the console:
AND if I update the command to `if-run -m manifest.yml -o output.yml` then the same data is saved to file
AND if I update the command to `if-run -m manifest.yml --save-intermediates ./outputs` then I get two additional files saved to the `outputs` folder, `manifest-observe.yml` and `manifest-regroup.yml`, and the final output is printed to the console.
AND if I update the command to `if-run -m manifest.yml --save-intermediates ./outputs -o ./outputs/result.yml`
then I get three files saved to the `outputs` folder: `manifest-observe.yml`, `manifest-regroup.yml` and the final output, `result.yml`.

Scenario 6: `if-run` should accept combinations of phase arguments

`if-run` should accept combinations of phase arguments that allow a subset of [`--observe`, `--regroup`, `--compute`] to execute without having to pipe between `if-run` invocations. For example, we should be able to execute `if-run --regroup --compute` to make `if-run` skip the `observe` step but execute the `regroup` and `compute` steps, without having to do `if-run --regroup -m manifest.yml | if-run --compute -s`. By passing multiple phase flags, we can select which phases are executed without having to pipe outputs between separate `if-run` invocations.

GIVEN multiple commands are supported
WHEN I run the following command
THEN the `regroup` and `compute` phases are executed on whatever inputs are already available in the manifest.
AND if I update the command to
THEN the `observe` and `compute` phases should run and the `regroup` stage is skipped.
AND if I update the command to
THEN the `observe` and `regroup` phases should run and the `compute` stage is skipped.
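The phase-selection rules for `if-run` can be sketched as a pure function. When explicit phase flags are given, exactly those phases run, in pipeline order. With no flags, every configured phase runs, `regroup` is skipped when its config is missing or empty, and `compute` always runs. The file-name helper follows the `-observe`/`-regroup` suffix convention described for `--save-intermediates`. This TypeScript sketch rests on several assumptions. `selectPhases`, `intermediatePath` and the flat `PhaseConfig` shape are hypothetical, and a real manifest would need its `pipeline` blocks checked on every node. Skipping `observe` when it has no config is inferred from Scenario 5.

```typescript
// Sketch of how if-run might choose which phases to execute and where to save
// intermediates. Phase, PhaseFlags, PhaseConfig, selectPhases and
// intermediatePath are hypothetical names; PhaseConfig flattens what would be
// per-node pipeline blocks in a real manifest.

type Phase = "observe" | "regroup" | "compute";

interface PhaseFlags {
  observe?: boolean;
  regroup?: boolean;
  compute?: boolean;
}

interface PhaseConfig {
  observe?: string[];
  regroup?: string[];
  compute?: string[];
}

const PHASE_ORDER: Phase[] = ["observe", "regroup", "compute"];

function selectPhases(flags: PhaseFlags, config: PhaseConfig): Phase[] {
  // Explicit flags (e.g. --regroup --compute) run exactly those phases, in pipeline order.
  const requested = PHASE_ORDER.filter((phase) => flags[phase] === true);
  if (requested.length > 0) {
    return requested;
  }
  // No flags: run the full pipeline, skipping phases that have nothing configured.
  const phases: Phase[] = [];
  if (config.observe && config.observe.length > 0) {
    phases.push("observe"); // assumption: no observe config means inputs already exist
  }
  if (config.regroup && config.regroup.length > 0) {
    phases.push("regroup"); // skipped when the regroup section is missing or empty
  }
  phases.push("compute"); // compute always runs
  return phases;
}

// e.g. ("manifest.yml", "observe", "./outputs") gives "./outputs/manifest-observe.yml"
function intermediatePath(
  manifestPath: string,
  phase: "observe" | "regroup",
  outputDir: string
): string {
  const fileName = manifestPath.split("/").pop() || manifestPath;
  const stem = fileName.replace(/\.ya?ml$/, "");
  return `${outputDir.replace(/\/+$/, "")}/${stem}-${phase}.yml`;
}

// Usage
const fullConfig: PhaseConfig = {
  observe: ["mock-observations"],
  regroup: ["cloud/region", "cloud/instance-type"],
  compute: ["sum", "multiply"],
};
console.log(selectPhases({}, fullConfig));
console.log(selectPhases({ regroup: true, compute: true }, fullConfig));
console.log(intermediatePath("manifest.yml", "regroup", "./outputs"));
```

Keeping selection separate from execution makes the combinations in Scenario 6 easy to test without running any plugins.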