Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(src): add support for appending to existing outputs #932

Merged
merged 7 commits into from
Sep 10, 2024

Conversation

jamescrowley
Copy link
Contributor

@jamescrowley jamescrowley commented Aug 4, 2024

Types of changes

  • Enhancement (project structure, spelling, grammar, formatting)
  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to change)
  • My code follows the code style of this project.
  • My change requires a change to the documentation.
  • I have updated the documentation accordingly.
  • I have added tests to cover my changes.
  • All new and existing tests passed.

A description of the changes proposed in the Pull Request

Closes #845.

A few areas to discuss/agree:

  1. Behaviour for group by / aggregation - I may not fully understand the implications for these types of transformations yet.
  2. Does the way the argument is passed around make sense
  3. Is it ok this living as part of the core compute logic, as opposed to one of the builtins or other extension points?

Comment on lines 281 to 258
it('computes simple tree with append and execute plugin.', async () => {
const tree = {
children: {
Copy link
Contributor Author

@jamescrowley jamescrowley Aug 4, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will need more tests to cover aggregation, group by behaviour - once we've agreed what the behaviour should be

@jmcook1186
Copy link
Contributor

Hi @jamescrowley - just back from vacation today - looking forward to checking this out tomorrow. Thanks for your efforts!

@jmcook1186
Copy link
Contributor

Hi @jamescrowley great point about aggregation and time-sync. I think the only behaviour that makes sense is to re-run the aggregation / time-sync over the new data and overwrite whatever is already present.

The "right" way to use append would probably be for the user to toggle time-sync and aggregation on some final manifest file after all the append actions have been taken, but in cases where a user is appending to an already-synced-or-aggregated manifest, the time-sync and/or aggregation should be repeated with the new input data, with the results overriding the original.

re groupby - in the latest IF (main branch, will be in the next release today/tomorrow) it's actually called regroup and executed as part of the IF execution flow rather than as a plugin (it's done automatically if you use if-run with no other commands, but you can isolate just that function using if-run --regroup with the necessary config in the manifest). Each time regroup is executed is flattens all the input data and regroups again from that 1D array, so I think that should just work even with --append...

It's ok having this as an amendment to the core execution logic - I don't think it makes sense as a plugin/builtin or other extension.

@jmcook1186
Copy link
Contributor

also tagging in @narekhovhannisyan for visibility

@jamescrowley
Copy link
Contributor Author

👍 I’ll rebase and add some tests to try and capture that behaviour

@jamescrowley jamescrowley reopened this Aug 9, 2024
@jamescrowley
Copy link
Contributor Author

@jmcook1186 I'm working through regroup as there's a few things to suss out there, but in the meantime, some questions re documenting the aggregation behaviour w/ tests:

  • It appears the only place I can capture the combined compute & aggregate behaviour is in the integration tests, is that right?
  • append is currently an argument to if-run, which means it is not loaded from the manifest, and thus I'm unable to write a manifest that tests this behaviour for the integration tests.

As I see it the options are:
(a) don't add any tests to explicitly capture this (or express them only in terms of stand-alone compute or aggregate behaviour)
(b) adding some tests as part of the jest test suite that hooks up the compute & aggregate steps
(c) we figure out a way to pass the append argument through the manifest-based integration tests
(d) something else?

Thanks!

@jmcook1186
Copy link
Contributor

Hi @jamescrowley - looping in @narekhovhannisyan who can give better answers to these questions!

@jamescrowley jamescrowley force-pushed the append branch 2 times, most recently from 54712f1 to 091d583 Compare August 18, 2024 08:50
@jamescrowley
Copy link
Contributor Author

@jmcook1186 @narekhovhannisyan also FYI on the regroup, I think the behaviour that makes most sense is to regroup outputs in addition to inputs when in append mode - otherwise we can't sensibly add the new outputs if the existing outputs aren't grouped. I'll push up some commits once I've ironed out the kinks but let me know if you object on principal and think I should go another way!

@jmcook1186
Copy link
Contributor

I don't object on principle - do you have an opinion @narekhovhannisyan?

appendGroup already returns acc merged anyway

Signed-off-by: James Crowley <[email protected]>
only when appending - otherwise outputs are lost if not already grouped.

Signed-off-by: James Crowley <[email protected]>
@jamescrowley jamescrowley marked this pull request as ready for review August 20, 2024 11:06
@jamescrowley
Copy link
Contributor Author

jamescrowley commented Aug 20, 2024

I've pushed proposed changes.

  1. Regroup is now fed outputs alongside inputs and merges those, when the append flag is set. I did some refactoring prior to the changes to make those easier.
  2. I've kept the 'append' preservation of inputs independent within both the compute and regroup steps, as this felt clearer (here and here).
  3. I've not copied the outputs with a deep clone like inputs - this didn't seem to be necessary but let me know if I've missed something subtle.
  4. I'm not entirely happy with the tests - in that it's not always clear the specific behaviour we're actually looking for in the expected results (open to suggestions there), but for now just tried to make the test name descriptive.
  5. There are currently no integration tests covering this (see issues previously flagged).

@narekhovhannisyan
Copy link
Member

narekhovhannisyan commented Aug 20, 2024

Hey @jmcook1186, @jamescrowley. sorry for late reply. Will check the threaд and will get back with the results.

@jmcook1186
Copy link
Contributor

Hi @jamescrowley sorry it's taking a while to get this reviewed - we've got some somewhat urgent features to ship to support a demo in a couple of weeks, but I'm keen to get into this asap. I'm planning to check out your branch and start looking into it later today.

Cheers

@jmcook1186
Copy link
Contributor

Hi @jamescrowley - i checked out the branch yesterday and am having trouble making the append feature work - could you please provide an example manifest and run command combo that works for you so I can try to replicate locally?

@jamescrowley
Copy link
Contributor Author

jamescrowley commented Sep 4, 2024

@jmcook1186 I was just running the second example in the original ticket (with original inputs, and modified mock observations for a later time), with the compute/group updated for the latest syntax

name: append
description: >-
  a complete pipeline that starts with mocked CPU utilization data and outputs
  operational carbon in gCO2eq
initialize:
  plugins:
    mock-observations:
      path: builtin
      method: MockObservations
      global-config:
        timestamp-from: '2024-03-05T00:00:04.000Z'
        timestamp-to: '2024-03-05T00:00:07.000Z'
        duration: 1
        components:
          - name: server-1
            cloud/instance-type: Standard_E64_v3
            cloud/region: westus3
        generators:
          common:
            cloud/vendor: azure
          randint:
            cpu/energy:
              min: 1
              max: 99
            mem/energy:
              min: 1
              max: 99
    sum:
      path: builtin
      method: Sum
      global-config:
        input-parameters:
          - cpu/energy
          - mem/energy
        output-parameter: energy
execution:
  command: >-
    /home/user/.npm/_npx/1bf7c3c15bf47d04/node_modules/.bin/ts-node
    /home/user/Code/if/src/index.ts -m
    manifests/examples/mock-cpu-util-to-carbon.yml -s
  environment:
    if-version: 0.4.0
    os: linux
    os-version: 5.15.0-107-generic
    node-version: 21.4.0
    date-time: 2024-06-18T14:18:44.864Z (UTC)
    dependencies:
      - '@babel/[email protected]'
      - '@babel/[email protected]'
      - '@commitlint/[email protected]'
      - '@commitlint/[email protected]'
      - '@grnsft/[email protected]'
      - '@jest/[email protected]'
      - '@types/[email protected]'
      - '@types/[email protected]'
      - '@types/[email protected]'
      - '@types/[email protected]'
      - [email protected]
      - [email protected]
      - [email protected]
      - [email protected]
      - [email protected]
      - [email protected]
      - [email protected]
      - [email protected]
      - [email protected]
      - [email protected]
      - [email protected]
      - [email protected]
      - [email protected]
      - [email protected]
      - [email protected]
      - [email protected]
      - [email protected]
      - [email protected]
      - [email protected]
      - [email protected]
  status: success
tree:
  pipeline:
    compute:
      - mock-observations
      - sum
    regroup:
      - cloud/region
      - name
  defaults: null
  inputs:
    - timestamp: '2024-03-05T00:00:00.000Z'
      duration: 1
      name: server-1
      cloud/instance-type: Standard_E64_v3
      cloud/region: westus3
      cloud/vendor: azure
      cpu/energy: 5
      mem/energy: 10
    - timestamp: '2024-03-05T00:00:01.000Z'
      duration: 1
      name: server-1
      cloud/instance-type: Standard_E64_v3
      cloud/region: westus3
      cloud/vendor: azure
      cpu/energy: 71
      mem/energy: 5
    - timestamp: '2024-03-05T00:00:02.000Z'
      duration: 1
      name: server-1
      cloud/instance-type: Standard_E64_v3
      cloud/region: westus3
      cloud/vendor: azure
      cpu/energy: 36
      mem/energy: 74
  outputs:
    - timestamp: '2024-03-05T00:00:00.000Z'
      duration: 1
      name: server-1
      cloud/instance-type: Standard_E64_v3
      cloud/region: westus3
      cloud/vendor: azure
      cpu/energy: 5
      mem/energy: 10
      energy: 15
    - timestamp: '2024-03-05T00:00:01.000Z'
      duration: 1
      name: server-1
      cloud/instance-type: Standard_E64_v3
      cloud/region: westus3
      cloud/vendor: azure
      cpu/energy: 71
      mem/energy: 5
      energy: 76
    - timestamp: '2024-03-05T00:00:02.000Z'
      duration: 1
      name: server-1
      cloud/instance-type: Standard_E64_v3
      cloud/region: westus3
      cloud/vendor: azure
      cpu/energy: 36
      mem/energy: 74
      energy: 110

and running

npm run if-run -- -m manifests/outputs/features/append.yaml -o manifests/outputs/features/re-append --append

you then get output:

name: append
description: >-
  a complete pipeline that starts with mocked CPU utilization data and outputs
  operational carbon in gCO2eq
initialize:
  plugins:
    mock-observations:
      path: builtin
      method: MockObservations
      global-config:
        timestamp-from: '2024-03-05T00:00:04.000Z'
        timestamp-to: '2024-03-05T00:00:07.000Z'
        duration: 1
        components:
          - name: server-1
            cloud/instance-type: Standard_E64_v3
            cloud/region: westus3
        generators:
          common:
            cloud/vendor: azure
          randint:
            cpu/energy:
              min: 1
              max: 99
            mem/energy:
              min: 1
              max: 99
    sum:
      path: builtin
      method: Sum
      global-config:
        input-parameters:
          - cpu/energy
          - mem/energy
        output-parameter: energy
execution:
  command: >-
    /Users/jcrowley/.npm/_npx/1bf7c3c15bf47d04/node_modules/.bin/ts-node
    /Users/jcrowley/Development/gsf/if/src/if-run/index.ts -m
    manifests/outputs/features/append.yaml -o
    manifests/outputs/features/re-append --append
  environment:
    if-version: 0.6.0
    os: macOS
    os-version: 14.6.1
    node-version: 20.16.0
    date-time: 2024-09-04T01:05:58.758Z (UTC)
    dependencies:
      - '@babel/[email protected]'
      - '@babel/[email protected]'
      - '@commitlint/[email protected]'
      - '@commitlint/[email protected]'
      - '@grnsft/[email protected]'
      - '@jest/[email protected]'
      - '@types/[email protected]'
      - '@types/[email protected]'
      - '@types/[email protected]'
      - '@types/[email protected]'
      - [email protected]
      - [email protected]
      - [email protected]
      - [email protected]
      - [email protected]
      - [email protected]
      - [email protected]
      - [email protected]
      - [email protected]
      - [email protected]
      - [email protected]
      - [email protected]
      - [email protected]
      - [email protected]
      - [email protected]
      - [email protected]
      - [email protected]
      - [email protected]
      - [email protected]
      - [email protected]
  status: success
tree:
  pipeline:
    compute:
      - mock-observations
      - sum
    regroup:
      - cloud/region
      - name
  defaults: null
  children:
    westus3:
      children:
        server-1:
          inputs:
            - timestamp: '2024-03-05T00:00:00.000Z'
              duration: 1
              name: server-1
              cloud/instance-type: Standard_E64_v3
              cloud/region: westus3
              cloud/vendor: azure
              cpu/energy: 5
              mem/energy: 10
            - timestamp: '2024-03-05T00:00:01.000Z'
              duration: 1
              name: server-1
              cloud/instance-type: Standard_E64_v3
              cloud/region: westus3
              cloud/vendor: azure
              cpu/energy: 71
              mem/energy: 5
            - timestamp: '2024-03-05T00:00:02.000Z'
              duration: 1
              name: server-1
              cloud/instance-type: Standard_E64_v3
              cloud/region: westus3
              cloud/vendor: azure
              cpu/energy: 36
              mem/energy: 74
          outputs:
            - timestamp: '2024-03-05T00:00:00.000Z'
              duration: 1
              name: server-1
              cloud/instance-type: Standard_E64_v3
              cloud/region: westus3
              cloud/vendor: azure
              cpu/energy: 5
              mem/energy: 10
              energy: 15
            - timestamp: '2024-03-05T00:00:01.000Z'
              duration: 1
              name: server-1
              cloud/instance-type: Standard_E64_v3
              cloud/region: westus3
              cloud/vendor: azure
              cpu/energy: 71
              mem/energy: 5
              energy: 76
            - timestamp: '2024-03-05T00:00:02.000Z'
              duration: 1
              name: server-1
              cloud/instance-type: Standard_E64_v3
              cloud/region: westus3
              cloud/vendor: azure
              cpu/energy: 36
              mem/energy: 74
              energy: 110
            - timestamp: '2024-03-05T00:00:04.000Z'
              duration: 1
              name: server-1
              cloud/instance-type: Standard_E64_v3
              cloud/region: westus3
              cloud/vendor: azure
              cpu/energy: 2
              mem/energy: 26
              energy: 28
            - timestamp: '2024-03-05T00:00:05.000Z'
              duration: 1
              name: server-1
              cloud/instance-type: Standard_E64_v3
              cloud/region: westus3
              cloud/vendor: azure
              cpu/energy: 67
              mem/energy: 27
              energy: 94
            - timestamp: '2024-03-05T00:00:06.000Z'
              duration: 1
              name: server-1
              cloud/instance-type: Standard_E64_v3
              cloud/region: westus3
              cloud/vendor: azure
              cpu/energy: 88
              mem/energy: 6
              energy: 94

which includes the additional outputs. Ideally would include this as an integration test if we can codify how we want to pass 'append' into the tests.

Let me know if you get the above, or if I've misunderstood somewhere along the way.

@jmcook1186
Copy link
Contributor

got it - thanks, yes I have consistent results to you now. Will dig in and try to understand your changes a bit more today - all seems to behave as expected so far. Then will pass to @manushak to approve.

@manushak
Copy link
Contributor

manushak commented Sep 9, 2024

Hi @jamescrowley, I've reviewed the PR, and it seems the append works only once. I've recorded a short video. In the video, it is shown that the last output data from 04-07 is missing

@jamescrowley
Copy link
Contributor Author

jamescrowley commented Sep 9, 2024

@manushak Thanks for taking a look. The current implementation appends to the outputs that already exist in the input manifest, but it does not append to any outputs that exist in a separate output manifest if supplied - that is always overwritten.

So in the example given, if you then supplied append-output.yaml as the input manifest for the second run (having modified the mock parameters) then it would behave as I believe you desired.

If the intent is to append to both outputs on the input manifest and outputs to an existing output manifest (if it exists) I’d need to do some more digging - and probably clarify with you all on other scenarios needing support here.

Hope that makes sense!

@manushak
Copy link
Contributor

manushak commented Sep 9, 2024

Yeah, you're right, @jamescrowley. I'm sorry for the misunderstanding. It works as expected.

Copy link
Contributor

@jmcook1186 jmcook1186 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks a lot for this, James!

@jmcook1186 jmcook1186 merged commit 22ed978 into Green-Software-Foundation:main Sep 10, 2024
1 check passed
This was referenced Oct 4, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Add --append feature to IF
4 participants