Implement observe, regroup and compute as distinct execution phases #809

Closed
5 of 11 tasks
Tracked by #762
zanete opened this issue Jun 11, 2024 · 9 comments · Fixed by #914 or #930

Labels: core-only (this issue is reserved for the IF core team only)

zanete commented Jun 11, 2024

Why: Sub of #762

What: Refactor IF into three distinct execution phases:

  • observe
  • regroup
  • compute

Each phase can be executed individually using its own command (if-observe, if-regroup, if-compute). When run as individual commands, they accept yaml files, or yaml data on stdin, to operate over. They can save their output to a yaml file or write it to the console via stdout.

The three phases can be run end-to-end using a single command, if-run.

Alternatively, a single phase can be run through if-run by passing the corresponding phase flag on the command line, e.g.

if-run -m manifest.yml -o output.yaml --observe

Note that observe and regroup MODIFY INPUTS only. compute does not modify inputs; it CREATES OUTPUTS only.

Changes to manifests

The execution pipeline should also be separated into three phases inside the manifest so that what used to be a single contiguous pipeline now looks something like:

tree:
  children:
    child:
      pipeline:
        observe:
          - azure-importer
        regroup:
          - cloud/instance-type
        compute:
          - cloud-metadata
          - watt-time
          - sum
          - multiply

A manifest that has not yet passed through the observe phase will have null inputs:

inputs: null

--observe

if-run --observe covers the import/data creation phase of the IF execution. It should take a manifest with no input data and return a manifest with a populated inputs array. This could happen by executing an importer plugin or mock observations.

The observe phase needs to receive a manifest file with the importer(s) configured. It will usually receive a manifest with the inputs data missing or empty.

As --observe is runnable in isolation, it should capture the state of the manifest after the plugins in the observe pipeline have completed and save it to the yaml filepath passed to the --output / -o flag. It can only output yaml data. If no output flag is provided, it should display the output data in the console. However, we should also support a --suppress-output flag that prevents the data being displayed in the console.

The importers will be initialized in precisely the same way as they are today (no separation into phases required in the initialize block) but the plugins to execute in the observe phase are those defined in the tree: children: child: pipeline: observe: block in the manifest.

The following command should enrich a manifest with no inputs and save a copy with input data:

if-run --observe -m manifest.yml -o manifest-with-data.yml
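
The output rules described above for a single phase could be sketched roughly as follows. This is an illustrative Python sketch rather than IF's implementation: the function name `emit_manifest`, its parameters and its return values are hypothetical, and it assumes that when an output path is given the data goes to the file only.

```python
import sys
from pathlib import Path
from typing import Optional, TextIO


def emit_manifest(
    yaml_text: str,
    output_path: Optional[str] = None,
    suppress_output: bool = False,
    console: TextIO = sys.stdout,
) -> str:
    """Route serialized manifest yaml to a file, the console, or nowhere.

    Returns "file", "console" or "none" so callers can see which route was taken.
    """
    if output_path is not None:
        # -o / --output given: save to the requested yaml file.
        Path(output_path).write_text(yaml_text, encoding="utf-8")
        return "file"
    if suppress_output:
        # --suppress-output without -o: nothing is displayed or saved.
        return "none"
    # Default: dump the yaml to the console.
    console.write(yaml_text)
    return "console"
```

Returning the chosen route keeps the sketch easy to check without inspecting the filesystem or the console.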

--regroup

if-run --regroup should apply the logic from the existing group-by plugin, but as an IF core feature rather than a plugin. The group-by plugin should be deleted as part of this task. The regroup feature will get its arguments from the tree: children: child: pipeline: regroup: config in the manifest. This block should include an array of keys to regroup by, for example:

tree:
  children:
    child:
      pipeline:
        observe:
          - mock-observations
        regroup:
          - cloud/instance-type

regroup will see the cloud/instance-type value in the manifest and regroup the input values using the cloud/instance-type values found in the inputs array, equivalent to this in today's group-by plugin:

      config:
        group-by:
          - cloud/instance-type
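
To make the grouping behaviour concrete, here is a minimal Python sketch of nesting a flat inputs array by a list of keys. It is not the group-by plugin's code: the function name `regroup`, the returned dictionary shape and the choice to raise when a key is missing are assumptions made for illustration.

```python
from typing import Any, Dict, List

Observation = Dict[str, Any]


def regroup(inputs: List[Observation], keys: List[str]) -> Dict[str, Any]:
    """Nest a flat inputs array under one level of children per regroup key."""
    if not keys:
        # No keys left: this node is a leaf holding its observations.
        return {"inputs": list(inputs)}
    key, remaining = keys[0], keys[1:]
    buckets: Dict[str, List[Observation]] = {}
    for observation in inputs:
        if key not in observation:
            # Assumption: a missing regroup key is treated as an error.
            raise ValueError(f"regroup key '{key}' is missing from an input")
        # dicts preserve insertion order, so groups appear in first-seen order.
        buckets.setdefault(str(observation[key]), []).append(observation)
    return {
        "children": {
            value: regroup(group, remaining) for value, group in buckets.items()
        }
    }


inputs = [
    {"timestamp": "2023-07-06T00:00", "cloud/instance-type": "A1", "cpu/utilization": 99},
    {"timestamp": "2023-07-06T00:00", "cloud/instance-type": "B1", "cpu/utilization": 11},
    {"timestamp": "2023-07-06T05:00", "cloud/instance-type": "A1", "cpu/utilization": 23},
]
grouped = regroup(inputs, ["cloud/instance-type"])
```

Here `grouped["children"]` holds one child per instance type, and each child carries only its own observations in `inputs`.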

regroup needs to operate on yaml data where input data exists. This can come from a file where the input data was pre-populated by if-run --observe some time in the past, or can come from a fresh if-run --observe, i.e. all the following should work:

if-run --regroup -m manifest-with-input-data.yml -o regrouped-manifest.yml
if-run --observe -m manifest-with-no-data.yml | if-run --regroup -o regrouped-manifest.yml
if-run --observe -m manifest-with-no-data.yml -o manifest-with-input-data.yml && if-run -m manifest-with-input-data.yml --regroup -o regrouped-manifest.yml
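
Supporting both a manifest file and piped yaml comes down to choosing the source of the raw text before parsing. The sketch below shows one way to do that in Python. The name `read_manifest_text` and the error raised on empty input are hypothetical, and parsing the yaml itself is deliberately left out.

```python
import sys
from pathlib import Path
from typing import Optional, TextIO


def read_manifest_text(
    manifest_path: Optional[str] = None, stdin: TextIO = sys.stdin
) -> str:
    """Return raw manifest yaml from the -m file if given, otherwise from stdin."""
    if manifest_path is not None:
        return Path(manifest_path).read_text(encoding="utf-8")
    text = stdin.read()
    if not text.strip():
        # Assumption: an empty pipe is treated as a usage error.
        raise ValueError("no manifest supplied: pass -m or pipe yaml on stdin")
    return text
```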

--compute

if-run --compute should execute all the plugins listed in the tree: children: child: pipeline: compute block of a manifest. For example, given the following information:

tree:
  pipeline:
    observe:
      - mock-observations
    regroup:
      - cloud/instance-metadata
    compute:
      - sum
      - multiply
      - divide
      - interpolate

--compute will execute the sum, multiply, divide and interpolate plugins over the input data in sequence according to the config provided in the initialize block.

--compute needs to accept yaml data that has a populated inputs array. It will create an outputs array.
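
The sequential behaviour of the compute phase can be illustrated with a small Python sketch. The plugin factories `make_sum` and `make_multiply` are hypothetical stand-ins for configured builtins, not IF's plugin interface. The point is only that plugins run in pipeline order and the result is a new outputs array, leaving inputs untouched.

```python
from typing import Any, Callable, Dict, List

Observation = Dict[str, Any]
Plugin = Callable[[List[Observation]], List[Observation]]


def make_sum(input_parameters: List[str], output_parameter: str) -> Plugin:
    """Hypothetical stand-in for a configured Sum plugin."""
    def execute(rows: List[Observation]) -> List[Observation]:
        return [
            {**row, output_parameter: sum(row[name] for name in input_parameters)}
            for row in rows
        ]
    return execute


def make_multiply(input_parameters: List[str], output_parameter: str) -> Plugin:
    """Hypothetical stand-in for a configured Multiply plugin."""
    def execute(rows: List[Observation]) -> List[Observation]:
        results = []
        for row in rows:
            product = 1
            for name in input_parameters:
                product *= row[name]
            results.append({**row, output_parameter: product})
        return results
    return execute


def compute(inputs: List[Observation], pipeline: List[Plugin]) -> List[Observation]:
    """Run each plugin in order and return a new outputs array."""
    if not inputs:
        raise ValueError("compute requires a populated inputs array")
    rows = [dict(row) for row in inputs]  # copy so the inputs are never mutated
    for plugin in pipeline:
        rows = plugin(rows)
    return rows
```

Each plugin receives the rows produced by the previous one, so a later plugin can consume a parameter that an earlier plugin created.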

Both of the following commands should work:

if-run --compute -m regrouped-manifest.yml -o output.yml
if-run --regroup -m ungrouped-manifest.yml | if-run --compute

if-run

Simply using if-run with none of the execution phase flags should run an entire execution pipeline that includes the observe, regroup and compute phases.

if-run should optionally save the final output to a file (using the --output / -o flag, like today) or display it in the console. Either way, it is the final output from --compute that is emitted. However, it should also be possible to save the intermediate representations (i.e. the outputs from --observe and --regroup) by specifying this on the command line.

if-run -m manifest.yml -o output.yml --save-intermediates outputs

will save the outputs from --observe and --regroup to the outputs folder with the filenames manifest-observe.yml and manifest-regroup.yml. The output folder name is provided to --save-intermediates and the filenames are simply the given manifest name with the -observe or -regroup suffix appended.
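
The naming rule for intermediate files can be sketched in a few lines of Python. The helper name `intermediate_paths` is hypothetical, and the sketch assumes the intermediates always use the .yml extension, as in the example filenames above.

```python
from pathlib import Path
from typing import Dict


def intermediate_paths(manifest_path: str, folder: str) -> Dict[str, Path]:
    """Map each intermediate phase to <folder>/<manifest name>-<phase>.yml."""
    stem = Path(manifest_path).stem  # "manifest.yml" -> "manifest"
    return {
        phase: Path(folder) / f"{stem}-{phase}.yml"
        for phase in ("observe", "regroup")
    }


paths = intermediate_paths("manifest.yml", "outputs")
```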

You can also do

if-run -o output.yml --display-intermediates

to display the results from if-observe and if-regroup in the console but not save them to file.

if-run should include some validation logic that determines which phases to run. If there is no regroup section or the regroup section of the manifest is empty, then the regroup phase should be skipped. compute should always be run.
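
One possible shape for this phase-selection logic is sketched below in Python. The names `iter_pipelines` and `select_phases` are hypothetical. Skipping observe when no node has observe config is an assumption based on the requirement that manifests with pre-populated inputs remain valid. Skipping an empty regroup and always running compute come from the text above.

```python
from typing import Any, Dict, Iterable, Iterator, List, Optional

PHASES = ("observe", "regroup", "compute")


def iter_pipelines(node: Dict[str, Any]) -> Iterator[Dict[str, Any]]:
    """Yield every pipeline mapping found in a tree node and its descendants."""
    pipeline = node.get("pipeline")
    if isinstance(pipeline, dict):
        yield pipeline
    for child in (node.get("children") or {}).values():
        if isinstance(child, dict):
            yield from iter_pipelines(child)


def select_phases(
    tree: Dict[str, Any], requested: Optional[Iterable[str]] = None
) -> List[str]:
    """Decide which phases to run, always in observe, regroup, compute order."""
    if requested:
        # Explicit phase flags win; keep them in canonical order.
        wanted = set(requested)
        return [phase for phase in PHASES if phase in wanted]
    pipelines = list(iter_pipelines(tree))
    phases: List[str] = []
    if any(pipeline.get("observe") for pipeline in pipelines):
        phases.append("observe")  # assumption: skipped when no observe config exists
    if any(pipeline.get("regroup") for pipeline in pipelines):
        phases.append("regroup")  # skipped when the regroup section is missing or empty
    phases.append("compute")  # compute always runs
    return phases
```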

manifest validation

Each execution phase now has its own specific requirements for the content of a manifest, so they should do their own independent validation steps. The current validation of the context is common to all three phases, but observe requires there to be no inputs whereas regroup and compute require inputs to be present and correctly formatted.
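
The per-phase input requirements could be expressed as a small check on a single tree node, as in the Python sketch below. The function name `validate_for_phase` is hypothetical, and treating "correctly formatted" as "every input has a timestamp and a duration" is an assumption, since the issue does not define the exact format rules.

```python
from typing import Any, Dict


def validate_for_phase(phase: str, node: Dict[str, Any]) -> None:
    """Raise ValueError if a leaf node's inputs do not suit the given phase."""
    inputs = node.get("inputs")
    if phase == "observe":
        # observe expects inputs to be missing, null or empty.
        if inputs:
            raise ValueError("observe expects a manifest without input data")
        return
    if phase in ("regroup", "compute"):
        if not isinstance(inputs, list) or not inputs:
            raise ValueError(f"{phase} requires a populated inputs array")
        for index, row in enumerate(inputs):
            # Assumption: a well-formed input carries a timestamp and a duration.
            if not isinstance(row, dict) or "timestamp" not in row or "duration" not in row:
                raise ValueError(f"input {index} is missing timestamp or duration")
        return
    raise ValueError(f"unknown phase: {phase}")
```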

Scope of work:

Acceptance criteria

Scenario 1: if-run --observe can be run on the command line

  • if-run --observe can be run on the command line
    --observe is a flag that modifies the behaviour of if-run so that it only executes the observe phase. It expects to operate on a manifest without input data and can either return its result to the console or save to file (this is just the normal if-run behaviour). The result will always be yaml data. The command enriches the inputs array but never creates outputs. The command executes plugins initialized in the initialize section of the manifest and listed in the tree: pipeline: observe section of each node in the tree. It uses the values passed to the --manifest and --output subcommands.

    GIVEN if-run --observe exists
    WHEN I run the following command:
    if-run --observe -m manifest.yml -o manifest-with-inputs.yml
    AND manifest.yml contains the following:

name: demo
description: demo for observe feat
tags:
initialize:
  plugins:
    mock-observations:
      kind: plugin
      method: MockObservations
      path: "builtin"
      global-config:
        timestamp-from: 2023-07-06T00:00
        timestamp-to: 2023-07-06T00:01
        duration: 60
        components:
          - cloud/instance-type: A1
          - cloud/instance-type: B1
        generators:
          common:
            region: uk-west
            common-key: common-val
          randint:
            cpu/utilization:
              min: 1
              max: 99
            memory/utilization:
              min: 1
              max: 99
tree:
  children:
    child:
      pipeline:
        observe:
          - mock-observations
      inputs: null

THEN if-run --observe should create manifest-with-inputs.yml containing the following:

name: demo
description: demo for observe feat
tags: null
initialize:
  plugins:
    mock-observations:
      path: builtin
      method: MockObservations
      global-config:
        timestamp-from: 2023-07-06T00:00
        timestamp-to: 2023-07-06T00:01
        duration: 60
        components:
          - cloud/instance-type: A1
          - cloud/instance-type: B1
        generators:
          common:
            region: uk-west
            common-key: common-val
          randint:
            cpu/utilization:
              min: 1
              max: 99
            memory/utilization:
              min: 1
              max: 99
execution:
  command: >-
    /home/user/.npm/_npx/1bf7c3c15bf47d04/node_modules/.bin/ts-node
    /home/user/Code/if/src/index.ts -m
    manifest.yml -o manifest-with-inputs.yml
  environment:
    if-version: 0.3.4
    os: linux
    os-version: 5.15.0-107-generic
    node-version: 21.4.0
    date-time: 2024-06-12T10:42:11.533Z (UTC)
    dependencies: # package list omitted
  status: success
tree:
  children:
    child:
      pipeline:
        observe:
          - mock-observations
      inputs: 
        - timestamp: '2023-07-06T00:00:00.000Z'
          duration: 60
          cloud/instance-type: A1
          region: uk-west
          common-key: common-val
          cpu/utilization: 46
          memory/utilization: 24
        - timestamp: '2023-07-06T00:00:00.000Z'
          duration: 60
          cloud/instance-type: B1
          region: uk-west
          common-key: common-val
          cpu/utilization: 81
          memory/utilization: 95
      outputs:

AND if the command is updated to if-run --observe -m manifest.yml (i.e. no -o command) then the same output data is displayed in the console

AND if the same command is run with if-run --observe -m manifest.yml --suppress-output then no data is displayed to the console and no data is saved to file.

Scenario 2: if-run --regroup is available on the command line.

  • if-run --regroup is available on the command line. It accepts a manifest file with input data or yaml data arriving via stdin. It applies the groupby logic grouping by the keys provided in tree: children: child: pipeline: regroup. It should output data to yaml file or as yaml data via stdout. It should use data passed via the --manifest, --output and --suppress-output subcommands.

    GIVEN the feature is implemented
    WHEN I run the following command:
    if-run --regroup -m manifest.yml -o regrouped-manifest.yml
    AND manifest.yml contains the following yaml data:

name: regroup demo
description: 
initialize:
  plugins:
tree:
  children:
    child:
      pipeline:
        observe:
        regroup:
          - cloud/region
          - cloud/instance-type
      inputs:
        - timestamp: 2023-07-06T00:00
          duration: 300
          cloud/instance-type: A1
          cloud/region: uk-west
          cpu/utilization: 99
        - timestamp: 2023-07-06T05:00
          duration: 300
          cloud/instance-type: A1
          cloud/region: uk-west
          cpu/utilization: 23
        - timestamp: 2023-07-06T10:00
          duration: 300
          cloud/instance-type: A1
          cloud/region: uk-west
          cpu/utilization: 12
        - timestamp: 2023-07-06T00:00 # note this time restarts at the start timestamp
          duration: 300
          cloud/instance-type: B1
          cloud/region: uk-west
          cpu/utilization: 11
        - timestamp: 2023-07-06T05:00
          duration: 300
          cloud/instance-type: B1
          cloud/region: uk-west
          cpu/utilization: 67
        - timestamp: 2023-07-06T10:00
          duration: 300
          cloud/instance-type: B1
          cloud/region: uk-west
          cpu/utilization: 1

THEN if-run --regroup saves a file called regrouped-manifest.yml that contains the following:

name: regroup demo
description: 
initialize:
  plugins:
execution:
  command: >-
    /home/user/.npm/_npx/1bf7c3c15bf47d04/node_modules/.bin/ts-node
    /home/user/Code/if/src/index.ts -m manifest.yml -o regrouped-manifest.yml
  environment:
    if-version: 0.3.4
    os: linux
    os-version: 5.15.0-107-generic
    node-version: 21.4.0
    date-time: 2024-06-12T11:03:12.123Z (UTC)
    dependencies: # package list omitted
  status: success
tree:
  children:
    child:
      pipeline:
        observe:
        regroup:
          - cloud/region
          - cloud/instance-type
      children:
        uk-west:
          children:
            A1:
              inputs:
                - timestamp: 2023-07-06T00:00
                  duration: 300
                  cloud/instance-type: A1
                  cloud/region: uk-west
                  cpu/utilization: 99
                - timestamp: 2023-07-06T05:00
                  duration: 300
                  cloud/instance-type: A1
                  cloud/region: uk-west
                  cpu/utilization: 23
                - timestamp: 2023-07-06T10:00
                  duration: 300
                  cloud/instance-type: A1
                  cloud/region: uk-west
                  cpu/utilization: 12
            B1:
              inputs:
                - timestamp: 2023-07-06T00:00
                  duration: 300
                  cloud/instance-type: B1
                  cloud/region: uk-west
                  cpu/utilization: 11
                - timestamp: 2023-07-06T05:00
                  duration: 300
                  cloud/instance-type: B1
                  cloud/region: uk-west
                  cpu/utilization: 67
                - timestamp: 2023-07-06T10:00
                  duration: 300
                  cloud/instance-type: B1
                  cloud/region: uk-west
                  cpu/utilization: 1

AND when the command is updated to if-run --regroup -m manifest.yml (i.e. -o subcommand is omitted) the same data is displayed in the console and no file is saved

AND when the command is updated to if-run --regroup -m manifest.yml --suppress-output no data is displayed to the console and no file is saved

AND if the command is updated to if-run --observe -m manifest.yml | if-run --regroup then if-regroup operates over the yaml data piped in from stdin and displays the result to the console.

Scenario 3: if-run --regroup can operate on a manifest with inputs that are already grouped.

  • if-run --regroup can operate on a manifest with inputs that are already grouped.
    If a manifest already has grouped inputs then --regroup should recast them into a single, flat 1D array and then re-execute the grouping according to the config provided in tree: children: child: pipeline: regroup. The original grouping should be completely destroyed and replaced with the grouping defined in the manifest.

    GIVEN regroup can act on pre-grouped manifests
    WHEN I run if-run -m manifest.yml --regroup
    AND manifest.yml contains the following (notice the values are already grouped by cloud/region and we're configuring it to group by both cloud/region and cloud/instance-type)

name: groupby
description: successful path
initialize:
  plugins:
    "sum":
      path: "builtin"
      method: Sum
      global-config:
        input-parameters:
          - cpu/energy
          - network/energy
        output-parameter: energy
tree:
  children:
    my-app:
      pipeline:
        observe:
        regroup:
          - cloud/region
          - cloud/instance-type
        compute:
      children:
        uk-west:
          inputs:
            - timestamp: 2023-07-06T00:00
              duration: 300
              cloud/instance-type: A1
              cloud/region: uk-west
              cpu/utilization: 99
            - timestamp: 2023-07-06T05:00
              duration: 300
              cloud/instance-type: A1
              cloud/region: uk-west
              cpu/utilization: 23
            - timestamp: 2023-07-06T10:00
              duration: 300
              cloud/instance-type: A1
              cloud/region: uk-west
              cpu/utilization: 12
            - timestamp: 2023-07-06T00:00
              duration: 300
              cloud/instance-type: B1
              cloud/region: uk-west
              cpu/utilization: 11
            - timestamp: 2023-07-06T05:00
              duration: 300
              cloud/instance-type: B1
              cloud/region: uk-west
              cpu/utilization: 67
            - timestamp: 2023-07-06T10:00
              duration: 300
              cloud/instance-type: B1
              cloud/region: uk-west
              cpu/utilization: 1
        uk-east:
          inputs:
            - timestamp: 2023-07-06T00:00
              duration: 300
              cloud/instance-type: A1
              cloud/region: uk-east
              cpu/utilization: 9
            - timestamp: 2023-07-06T05:00
              duration: 300
              cloud/instance-type: A1
              cloud/region: uk-east
              cpu/utilization: 23
            - timestamp: 2023-07-06T10:00
              duration: 300
              cloud/instance-type: A1
              cloud/region: uk-east
              cpu/utilization: 12
            - timestamp: 2023-07-06T00:00
              duration: 300
              cloud/instance-type: B1
              cloud/region: uk-east
              cpu/utilization: 11
            - timestamp: 2023-07-06T05:00
              duration: 300
              cloud/instance-type: B1
              cloud/region: uk-east
              cpu/utilization: 67
            - timestamp: 2023-07-06T10:00
              duration: 300
              cloud/instance-type: B1
              cloud/region: uk-east
              cpu/utilization: 1

THEN I see the following output:

name: groupby
description: successful path
initialize:
  plugins:
    "sum":
      path: "builtin"
      method: Sum
      global-config:
        input-parameters:
          - cpu/energy
          - network/energy
        output-parameter: energy
execution:
  command: >-
    /home/user/.npm/_npx/1bf7c3c15bf47d04/node_modules/.bin/ts-node
    /home/user/Code/if/src/index.ts -m manifests/plugins/groupby/success.yml -s
  environment:
    if-version: 0.4.0
    os: linux
    os-version: 5.15.0-107-generic
    node-version: 21.4.0
    date-time: 2024-06-19T09:06:57.342Z (UTC)
    dependencies: # package list omitted
  status: success
tree:
  children:
    my-app:
      pipeline:
        observe:
        regroup:
          - cloud/region
          - cloud/instance-type
        compute:
      children:
        uk-west:
          children:
            A1:
              inputs:
                - timestamp: 2023-07-06T00:00
                  duration: 300
                  cloud/instance-type: A1
                  cloud/region: uk-west
                  cpu/utilization: 99
                - timestamp: 2023-07-06T05:00
                  duration: 300
                  cloud/instance-type: A1
                  cloud/region: uk-west
                  cpu/utilization: 23
                - timestamp: 2023-07-06T10:00
                  duration: 300
                  cloud/instance-type: A1
                  cloud/region: uk-west
                  cpu/utilization: 12
            B1:
              inputs:
                - timestamp: 2023-07-06T00:00
                  duration: 300
                  cloud/instance-type: B1
                  cloud/region: uk-west
                  cpu/utilization: 11
                - timestamp: 2023-07-06T05:00
                  duration: 300
                  cloud/instance-type: B1
                  cloud/region: uk-west
                  cpu/utilization: 67
                - timestamp: 2023-07-06T10:00
                  duration: 300
                  cloud/instance-type: B1
                  cloud/region: uk-west
                  cpu/utilization: 1
        uk-east:
          children:
            A1:
              inputs:
                - timestamp: 2023-07-06T00:00
                  duration: 300
                  cloud/instance-type: A1
                  cloud/region: uk-east
                  cpu/utilization: 9
                - timestamp: 2023-07-06T05:00
                  duration: 300
                  cloud/instance-type: A1
                  cloud/region: uk-east
                  cpu/utilization: 23
                - timestamp: 2023-07-06T10:00
                  duration: 300
                  cloud/instance-type: A1
                  cloud/region: uk-east
                  cpu/utilization: 12
            B1:
              inputs:
                - timestamp: 2023-07-06T00:00
                  duration: 300
                  cloud/instance-type: B1
                  cloud/region: uk-east
                  cpu/utilization: 11
                - timestamp: 2023-07-06T05:00
                  duration: 300
                  cloud/instance-type: B1
                  cloud/region: uk-east
                  cpu/utilization: 67
                - timestamp: 2023-07-06T10:00
                  duration: 300
                  cloud/instance-type: B1
                  cloud/region: uk-east
                  cpu/utilization: 1
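
The flattening step that this scenario relies on, collapsing an already grouped tree back into a single flat array before grouping again, might look like the following Python sketch. The function name `flatten_inputs` is hypothetical, and the sketch assumes observations only live under `inputs` keys of the node and its descendants.

```python
from typing import Any, Dict, List

Observation = Dict[str, Any]


def flatten_inputs(node: Dict[str, Any]) -> List[Observation]:
    """Collect every observation under a node into one flat list, depth first."""
    rows: List[Observation] = list(node.get("inputs") or [])
    for child in (node.get("children") or {}).values():
        if isinstance(child, dict):
            rows.extend(flatten_inputs(child))
    return rows


pre_grouped = {
    "children": {
        "uk-west": {"inputs": [
            {"timestamp": "2023-07-06T00:00", "cloud/instance-type": "A1", "cloud/region": "uk-west"},
            {"timestamp": "2023-07-06T00:00", "cloud/instance-type": "B1", "cloud/region": "uk-west"},
        ]},
        "uk-east": {"inputs": [
            {"timestamp": "2023-07-06T00:00", "cloud/instance-type": "A1", "cloud/region": "uk-east"},
        ]},
    }
}
flat = flatten_inputs(pre_grouped)
```

The flat list can then be handed to whatever grouping routine applies the keys from the regroup config, so the previous grouping has no influence on the result.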

Scenario 4: if-run --compute is available on the command line.

  • if-run --compute is available on the command line. It accepts a manifest file with input data OR yaml data arriving via stdin. It executes the plugins defined in tree: children: child: pipeline: compute. It should output data to yaml file or as yaml data printed to the console. It should use values passed to the --manifest, --output and --suppress-output subcommands. if-run --compute should throw an exception loudly if a) there is no input data, or b) there are duplicate timestamps in input arrays (this indicates regroup is needed but hasn't been applied).

    GIVEN the feature is available
    WHEN I run the command if-run --compute -m manifest.yml -o output.yml
    AND manifest.yml contains the following yaml data:

name: demo
description: 
tags:
initialize:
  plugins:
    "sum":
      path: "builtin"
      method: Sum
      global-config:
        input-parameters:
          - cpu/energy
          - network/energy
        output-parameter: energy-sum
    "coefficient":
      path: "builtin"
      method: Coefficient
      global-config:
        input-parameter: energy
        coefficient: 2
        output-parameter: energy-doubled
    "multiply":
      path: "builtin"
      method: Multiply
      global-config:
        input-parameters: ["cpu/utilization", "duration"]
        output-parameter: "cpu-times-duration"
tree:
  children:
    child-1:
      pipeline:
        observe:
        compute: 
          - sum
          - coefficient
          - multiply
      defaults:
        cpu/thermal-design-power: 100
      inputs:
        - timestamp: "2023-12-12T00:00:00.000Z"
          cloud/instance-type: A1
          cloud/region: uk-west
          duration: 1
          cpu/utilization: 50
          cpu/energy: 20
          network/energy: 10
          energy: 5

THEN if-run --compute should generate a new file, output.yml, that contains the following data:

name: demo
description: null
tags: null
initialize:
  plugins:
    sum:
      path: builtin
      method: Sum
      global-config:
        input-parameters:
          - cpu/energy
          - network/energy
        output-parameter: energy-sum
    coefficient:
      path: builtin
      method: Coefficient
      global-config:
        input-parameter: energy
        coefficient: 2
        output-parameter: energy-doubled
    multiply:
      path: builtin
      method: Multiply
      global-config:
        input-parameters:
          - cpu/utilization
          - duration
        output-parameter: cpu-times-duration
execution:
  status: success
  command: >-
    /home/user/.npm/_npx/1bf7c3c15bf47d04/node_modules/.bin/ts-node
    /home/user/Code/if/src/index.ts -m manifest.yml -o output.yml
  environment:
    if-version: 0.3.4
    os: linux
    os-version: 5.15.0-107-generic
    node-version: 21.4.0
    date-time: 2024-06-12T11:15:18.738Z (UTC)
    dependencies: # package list omitted
tree:
  children:
    child-1:
      pipeline:
        observe:
        compute:
          - sum
          - coefficient
          - multiply
      config: null
      defaults:
        cpu/thermal-design-power: 100
      inputs:
        - timestamp: '2023-12-12T00:00:00.000Z'
          cloud/instance-type: A1
          cloud/region: uk-west
          duration: 1
          cpu/utilization: 50
          cpu/energy: 20
          network/energy: 10
          energy: 5
      outputs:
        - timestamp: '2023-12-12T00:00:00.000Z'
          cloud/instance-type: A1
          cloud/region: uk-west
          duration: 1
          cpu/utilization: 50
          cpu/energy: 20
          network/energy: 10
          energy: 5
          cpu/thermal-design-power: 100
          energy-sum: 30
          energy-doubled: 10
          cpu-times-duration: 50

AND when the command is updated to if-run --compute -m manifest.yml (i.e. -o subcommand is omitted) the same data is displayed in the console and no file is saved

AND when the command is updated to if-run --compute -m manifest.yml --suppress-output no data is displayed to the console and no file is saved

AND if the command is updated to if-run --observe -m manifest.yml | if-run --compute then if-run --compute operates over the yaml data piped in from stdin and displays the result to the console.

AND if there are duplicate timestamps in an inputs array or the inputs array is missing/empty, then if-compute should error out.
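
The two failure conditions for compute can be checked before any plugin runs. The Python sketch below is one way to do it. The names `find_duplicate_timestamps` and `assert_computable` are hypothetical, and comparing timestamps as strings is an assumption (two spellings of the same instant would not be detected as duplicates).

```python
from collections import Counter
from typing import Any, Dict, List, Optional

Observation = Dict[str, Any]


def find_duplicate_timestamps(inputs: List[Observation]) -> List[str]:
    """Return each timestamp that appears more than once, in first-seen order."""
    counts = Counter(str(row["timestamp"]) for row in inputs)
    return [timestamp for timestamp, count in counts.items() if count > 1]


def assert_computable(inputs: Optional[List[Observation]]) -> None:
    """Raise ValueError if compute should refuse to run on this inputs array."""
    if not inputs:
        raise ValueError("compute requires a non-empty inputs array")
    duplicates = find_duplicate_timestamps(inputs)
    if duplicates:
        raise ValueError(
            f"duplicate timestamps {duplicates}: regroup may be needed before compute"
        )
```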

Scenario 5: if-run with no phase arguments executes the full end-to-end pipeline

  • if-run with no phase arguments executes the full end-to-end pipeline

if-run should execute the full set of execution phases. It should be able to accept a manifest with no input data if there is observe config available to execute, manifests with no regroup config as long as there are no duplicate timestamps in any inputs arrays, or manifests with no observe config if there is input data already available. I should be able to save intermediate representations (i.e. the outputs from observe and regroup) to file or display them in the console. if-run should also expose the --manifest, --output, --suppress-output, --save-intermediates and --display-intermediates flags.

GIVEN if-run is available and refactored according to the issue description
WHEN I run the following command:

if-run -m manifest.yml

AND manifest.yml contains the following:

name: demo
description: 
tags:
initialize:
  plugins:
    mock-observations:
      kind: plugin
      method: MockObservations
      path: "builtin"
      global-config:
        timestamp-from: 2023-07-06T00:00
        timestamp-to: 2023-07-06T00:01
        duration: 60
        components:
          - cloud/instance-type: A1
          - cloud/instance-type: B1
        generators:
          common:
            region: uk-west
            common-key: common-val
          randint:
            cpu/utilization:
              min: 1
              max: 99
            memory/utilization:
              min: 1
              max: 99
    sum:
      path: "builtin"
      method: Sum
      global-config:
        input-parameters:
          - cpu/utilization
          - memory/utilization
        output-parameter: util-sum
tree:
  children:
    child:
      pipeline:
        observe:
          - mock-observations
        regroup:
          - cloud/instance-type
        compute:
          - sum
      inputs: null

THEN I see the following output in the console:

name: demo
description: 
tags:
initialize:
  plugins:
    mock-observations:
      kind: plugin
      method: MockObservations
      path: "builtin"
      global-config:
        timestamp-from: 2023-07-06T00:00
        timestamp-to: 2023-07-06T00:01
        duration: 60
        components:
          - cloud/instance-type: A1
          - cloud/instance-type: B1
        generators:
          common:
            region: uk-west
            common-key: common-val
          randint:
            cpu/utilization:
              min: 1
              max: 99
            memory/utilization:
              min: 1
              max: 99
    sum:
      path: "builtin"
      method: Sum
      global-config:
        input-parameters:
          - cpu/utilization
          - memory/utilization
        output-parameter: util-sum
tree:
  children:
    child:
      pipeline:
        observe:
          - mock-observations
        regroup:
          - cloud/instance-type
        compute:
          - sum
tree:
  children:
    my-app:
      pipeline:
        - group-by
      config:
        group-by:
          group:
            - cloud/region
            - cloud/instance-type
      children:
        uk-west:
          children:
            A1:
              inputs:
                - timestamp: 2023-07-06T00:00
                  duration: 300
                  cloud/instance-type: A1
                  cloud/region: uk-west
                  cpu/utilization: 99
                  memory/utilization: 50
                - timestamp: 2023-07-06T05:00
                  duration: 300
                  cloud/instance-type: A1
                  cloud/region: uk-west
                  cpu/utilization: 23
                  memory/utilization: 50
                - timestamp: 2023-07-06T10:00
                  duration: 300
                  cloud/instance-type: A1
                  cloud/region: uk-west
                  cpu/utilization: 12
                  memory/utilization: 50
              outputs:
                - timestamp: 2023-07-06T00:00
                  duration: 300
                  cloud/instance-type: A1
                  cloud/region: uk-west
                  cpu/utilization: 99
                  memory/utilization: 50
                  util-sum: 149
                - timestamp: 2023-07-06T05:00
                  duration: 300
                  cloud/instance-type: A1
                  cloud/region: uk-west
                  cpu/utilization: 23
                  memory/utilization: 50
                  util-sum: 73
                - timestamp: 2023-07-06T10:00
                  duration: 300
                  cloud/instance-type: A1
                  cloud/region: uk-west
                  cpu/utilization: 12
                  memory/utilization: 50
                  util-sum: 62
            B1:
              inputs:
                - timestamp: 2023-07-06T00:00
                  duration: 300
                  cloud/instance-type: B1
                  cloud/region: uk-west
                  cpu/utilization: 11
                  memory/utilization: 50
                  util-sum: 61
                - timestamp: 2023-07-06T05:00
                  duration: 300
                  cloud/instance-type: B1
                  cloud/region: uk-west
                  cpu/utilization: 67
                  memory/utilization: 50
                  util-sum: 117
                - timestamp: 2023-07-06T10:00
                  duration: 300
                  cloud/instance-type: B1
                  cloud/region: uk-west
                  cpu/utilization: 1
                  memory/utilization: 50
                  util-sum: 51
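For reference, the util-sum values in the A1 outputs above are just cpu/utilization plus memory/utilization for each row. A minimal Python sketch of that arithmetic follows; `sum_plugin` is a hypothetical stand-in for the builtin Sum plugin, not its real code, and it copies each row so the inputs stay untouched, in line with the rule that compute creates outputs only.

```python
def sum_plugin(inputs, input_parameters, output_parameter):
    """Return new output rows with the summed parameter added.

    Hypothetical stand-in for the Sum plugin; inputs are not modified.
    """
    return [
        {**row, output_parameter: sum(row[name] for name in input_parameters)}
        for row in inputs
    ]


a1_inputs = [
    {"timestamp": "2023-07-06T00:00", "cpu/utilization": 99, "memory/utilization": 50},
    {"timestamp": "2023-07-06T05:00", "cpu/utilization": 23, "memory/utilization": 50},
    {"timestamp": "2023-07-06T10:00", "cpu/utilization": 12, "memory/utilization": 50},
]
a1_outputs = sum_plugin(
    a1_inputs, ["cpu/utilization", "memory/utilization"], "util-sum"
)
# util-sum per row: 149, 73, 62
```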

AND if I update the command to if-run -m manifest.yml -o output.yml then the same data is saved to file

AND if I update the command to if-run -m manifest.yml --save-intermediates ./outputs then I get two additional files saved to the outputs folder: manifest-observe.yml and manifest-regroup.yml and the final output is printed to the console.

AND if I update the command to if-run -m manifest.yml --save-intermediates ./outputs -o ./outputs/result.yml then I get three files saved to the outputs folder: manifest-observe.yml, manifest-regroup.yml and the final output, result.yml.
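The file-saving behaviour in the three AND clauses above could be summarised like this. It is a sketch under assumptions: `planned_files` is a hypothetical helper, and deriving the intermediate file names from the manifest's file stem is a guess based on the names manifest-observe.yml and manifest-regroup.yml used in this scenario.

```python
from pathlib import Path


def planned_files(manifest, save_intermediates=None, output=None):
    """List the files a run would write (hypothetical helper).

    Assumption: intermediates are named <manifest stem>-<phase>.yml.
    """
    stem = Path(manifest).stem
    files = []
    if save_intermediates is not None:
        folder = Path(save_intermediates)
        files.append(folder / f"{stem}-observe.yml")
        files.append(folder / f"{stem}-regroup.yml")
    if output is not None:
        files.append(Path(output))
    return files
```

When `output` is not given the list holds only the intermediates, which matches the case where the final result goes to the console instead of a file.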

Scenario 6: if-run should accept combinations of phase arguments

  • if-run should accept combinations of phase arguments that allows a subset of [--observe, --regroup, --compute] to execute without having to pipe between if-run invocations

We should be able to execute, for example, if-run --regroup --compute to make if-run skip the observe step but execute the regroup and compute steps, without having to do if-run --regroup -m manifest.yml | if-run --compute -s. By passing multiple phase arguments we can select which phases are executed without having to pipe outputs between separate if-run invocations.

GIVEN multiple commands are supported
WHEN I run the following command

if-run -m manifest.yml --regroup --compute -s

THEN the regroup and compute phases are executed on whatever inputs are already available in the manifest.

AND if I update the command to

if-run -m manifest.yml --observe --compute -s

THEN the observe and compute phases should run and the regroup stage is skipped.

AND if I update the command to

if-run -m manifest.yml --observe --regroup -s

THEN the observe and regroup phases should run and the compute stage is skipped.
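A minimal sketch of the phase selection described in Scenarios 5 and 6, assuming that passing no phase flag means "run everything". It uses Python's argparse purely for illustration; `select_phases` is a hypothetical name, and other options such as -s are deliberately left unparsed because their handling is not the point here.

```python
import argparse

PHASES = ("observe", "regroup", "compute")


def select_phases(argv):
    """Return the phases to execute, in pipeline order (illustrative only)."""
    parser = argparse.ArgumentParser(prog="if-run")
    parser.add_argument("-m", "--manifest")
    for phase in PHASES:
        parser.add_argument(f"--{phase}", action="store_true")
    # parse_known_args ignores options this sketch does not model, e.g. -s.
    args, _unknown = parser.parse_known_args(argv)
    requested = [phase for phase in PHASES if getattr(args, phase)]
    # No phase flags at all: fall back to the full end-to-end pipeline.
    return requested or list(PHASES)
```

Iterating over `PHASES` rather than over the arguments keeps the execution order fixed, so the order of the flags on the command line does not matter.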

@zanete zanete mentioned this issue Jun 11, 2024
6 tasks
@zanete zanete moved this to In Design in IF Jun 11, 2024
@zanete zanete added the draft The issue is still being written, no need to respond or action on anything. label Jun 11, 2024
@zanete zanete added this to the Inputs and Outputs milestone Jun 11, 2024
@zanete zanete added the core-only This issue is reserved for the IF core team only label Jun 11, 2024
@zanete zanete changed the title Refactor IF into three distinct execution phases Implement observe, group and compute as distinct execution phases Jun 11, 2024
@zanete zanete changed the title Implement observe, group and compute as distinct execution phases Implement observe, group and compute as distinct execution phases Jun 11, 2024
@jmcook1186 jmcook1186 changed the title Implement observe, group and compute as distinct execution phases Implement observe, regroup and compute as distinct execution phases Jun 12, 2024
@jmcook1186 jmcook1186 moved this from In Design to In Refinement in IF Jun 12, 2024
@jmcook1186 jmcook1186 removed the draft The issue is still being written, no need to respond or action on anything. label Jun 12, 2024
@jmcook1186 jmcook1186 moved this from In Refinement to Ready in IF Jun 12, 2024
@zanete
Author

zanete commented Jun 13, 2024

@jawache please review the AC to ensure you're in alignment

@jawache
Contributor

jawache commented Jun 18, 2024

@jmcook1186 re: if-run should include some validation logic that determines which phases to run. Specifically, if input data is already provided in the manifest file passed to if-run then the observe phase should be skipped.

I think it's fine (and expected) to run the observe step even if it has inputs, the intention would be to gather observations again and run with new observations.

@jawache
Contributor

jawache commented Jun 18, 2024

@jmcook1186 first AC with observe, the inputs should be populated not outputs Screenshot_20240618-212916.png

@jawache
Contributor

jawache commented Jun 18, 2024

"AND if the same command is run with if-run --observe -m manifest.yml --suppress-output then no data is displayed to the console and no data is saved to file." - I'm unsure what the rationale is for suppress-output?

@jawache
Contributor

jawache commented Jun 18, 2024

@jmcook1186 all good apart from my comments above, very thorough!

The only addition is that we should be able to run, say, regroup and compute in one step. So if-run -regroup -compute should be able to run both steps in one go.

In fact perhaps the right flag might be -skip-observe (-skip-regroup etc...). I think the most common use case here is just to skip the observe step and run the other two.

@jawache
Contributor

jawache commented Jun 18, 2024

Oh and one other point, regroup should be able to run on a manifest file with previously grouped inputs. It should first flatten all inputs into a 1d array and regroup from that baseline
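A rough Python sketch of that flatten-then-regroup idea, for illustration only. It assumes a tree node is a dict with optional `inputs` and `children` keys, as in the manifests above; `flatten_inputs` and `regroup` are hypothetical names, not IF functions.

```python
def flatten_inputs(node):
    """Collect every input row under a node into one flat list."""
    rows = list(node.get("inputs") or [])
    for child in (node.get("children") or {}).values():
        rows.extend(flatten_inputs(child))
    return rows


def regroup(node, keys):
    """Rebuild the tree from the flattened rows, one level per grouping key."""
    tree = {"children": {}}
    for row in flatten_inputs(node):
        cursor = tree
        for key in keys:
            children = cursor.setdefault("children", {})
            cursor = children.setdefault(str(row[key]), {})
        cursor.setdefault("inputs", []).append(row)
    return tree
```

Because the rows are flattened first, the result depends only on the rows and the grouping keys, not on how the manifest was grouped before.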

@jmcook1186
Contributor

Thanks @jawache - agree on all points, updating ticket now.

The rationale for suppress-output is really just for developers to be able to run if-run and see logs and error messages without having to scroll up through lots of output data and to have some finer grained control over what is printed to the console in each run. It's not at all essential to include - you can just redirect stdout to /dev/null manually where needed, but I think it would be convenient for developers to have this option.

@zanete
Author

zanete commented Jul 2, 2024

@narekhovhannisyan please deep-dive to confirm solution is clear and let me know once you have, I will proceed to split it into 5 subtickets:

  • Global/enablement of the phased execution
  • Scenario 1
  • Scenario 2 & 3
  • Scenario 4
  • Scenario 5 & 6

@zanete
Author

zanete commented Jul 24, 2024

Waiting to get all the manifests in the same style to fully test the phased execution #812

@narekhovhannisyan narekhovhannisyan linked a pull request Aug 2, 2024 that will close this issue
9 tasks
@github-project-automation github-project-automation bot moved this from Testing to Done in IF Aug 2, 2024