Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Impact Framework Project Updates2024-06-05 #780

Open
jmcook1186 opened this issue Jun 5, 2024 · 10 comments
Open

Impact Framework Project Updates2024-06-05 #780

jmcook1186 opened this issue Jun 5, 2024 · 10 comments
Assignees
Labels
project-update Used to flag project progress updates. Core team only

Comments

@jmcook1186
Copy link
Contributor

jmcook1186 commented Jun 5, 2024

Who

Sponsor: @jawache (GSF)
Product owner: @jmcook1186 (GSF)
Leads: @navveenb (Accenture), @srini1978 (Microsoft)

Overview

We had a good fortnight shipping some substantial updates to IF. We're especially happy to have if-diff merged with its advanced matching criteria fully implemented, a new csv-lookup generic builtin, major improvements to our logging that make it easier to diagnose and fix errors, and a big refactor of the SCI plugin code that makes it much easier to use.

@jawache and @jmcook1186 have also been scoping out and defining the next epic - you can join in the conversation on the discussion forum:

Updates

Here's the details for our most significant updates this fortnight:

Issues

We didn't really encounter any major issues, we were overall quite happy chugging through our latest epic!

Outlook

@jmcook1186 jmcook1186 added the project-update Used to flag project progress updates. Core team only label Jun 5, 2024
@jmcook1186 jmcook1186 self-assigned this Jun 5, 2024
@jmcook1186 jmcook1186 added this to IF Jun 5, 2024
@zanete zanete moved this to In Progress in IF Jun 5, 2024
@mgriffin-scottlogic
Copy link

I've been looking into the csv-lookup plugin this afternoon, I can see how that would be useful. Have you considered whether you could make query parameters optional? I was thinking I could re-implement one of my carbon hack plugins using this method but I wouldn't be able to omit cloud/usage-type from some inputs. If this didn't throw an error, I could arrange the .csv entries so that the less specific entry comes first, followed by the ones where usage-type is present. Maybe an optional ignore-missing parameter on the global config?

I also initially thought that you might be able to use this plugin to create more inputs than were initially present, which doesn't seem to be the case. Has a plugin like that been considered at all? I could see it being useful to take a .csv file and create an input for each row, with headers mapped to different parameter names etc. Could almost be an extension to the mock-observations plugin but I'm not sure I'd want to have the association that it was 'mock' data.

@jmcook1186
Copy link
Contributor Author

jmcook1186 commented Jun 10, 2024

Ah, so in your proposal, in the absence of any query parameters you would just parse all the columns from a csv into key value pairs with the column names as keys and arrays of column data as values? I guess it would be relatively straightforward to add this behaviour to the existing plugin.

Totally agree with your idea for a generic "csv-importer` that maps rows in a csv file to inputs. I've had that same idea on the mental backlog for a while but never found time to design it and get it on the board. Also agree it needs to be separate from "mock-observations".

These are ideal candidates for community contributions - we'd love to have them and would be keen to support/review but are unlikely to dedicate dev time to them at the moment.

@mgriffin-scottlogic

@mgriffin-scottlogic
Copy link

I suppose in the absence of any query parameters at all, it would just take the first row of the .csv? Which I think would be what would happen in the current code if you could end up with an empty list of queryData (ifMatchesCriteria.every() is true for an empty list).

My concrete example would be a .csv like:

vendor,service,type,replication
aws,s3,,3
aws,s3,RRS,2

With a manifest snippet:

...
  plugins:
    cloud-metadata:
      method: CSVLookup
      path: 'builtin'
      global-config:
        query:
          vendor: "cloud/vendor"
          service: "cloud/service"
          type: "cloud/usage-type"
        output: ['replication', 'storage/replication-factor']
        ignore-missing: true
...
  inputs:
    - cloud/vendor: aws
      cloud/service: s3
    - cloud/vendor: aws
      cloud/service: s3
      cloud/usage-type: RRS

So the first input would end up with storage/replication-factor: 3 and the second would have storage/replication-factor: 2.

If these are good candidates, I could definitely look into making a PR for this change, possibly the 'csv-importer' idea too at some point.

@jmcook1186
Copy link
Contributor Author

Pinging @narekhovhannisyan for opinion on the csv-lookup behaviour. I think it's generally better to error out loudly if the query parameters are specified but not found but if none are specified then returning the entire first row of the file seems reasonable.

For the csv importer I had envisaged that you would have your input data organized as a csv and the importer would parse it into yaml and add it to the manifest, with each row being an element in the inputs array, e.g.

a csv file containing this:

timestamp duration cpu-util energy
2023-07-06T00:00 1 20 5
2023-07-06T00:01 1 30 10
2023-07-06T00:02 1 40 15

becomes the following in the manifest:

      inputs:
        - timestamp: 2023-07-06T00:00
          duration: 1
          cpu-util: 20
          energy: 5
        - timestamp: 2023-07-06T00:00
          duration: 1
          cpu-util: 30
          energy: 10
        - timestamp: 2023-07-06T00:00
          duration: 1
          cpu-util: 40
          energy: 15

@mgriffin-scottlogic
Copy link

Sorry, my previous comment was still discussing the csv-lookup plugin. If it wasn't clear, I was intending to allow for optional parameters like cloud/usage-type, where there is a fallback default if only some are specified.

For a csv-importer, what you've posted is roughly what I was thinking too. It could have similar config options to the lookup plugin to allow you to take all columns, specific ones or rename them. I would also like to be able to specify constant properties that will be added to all inputs - say the .csv doesn't contain a duration but you want to specify up front that every entry represents an hour.

@jawache
Copy link
Contributor

jawache commented Jun 12, 2024

@jmcook1186 and @mgriffin-scottlogic I suspect this can just be more explicitly handled with the built-in default feature?

So in the manifest tree (perhaps even the root node) you'd add a defaults node and anything under there we automatically add to every input underneath.

I'm not sure how it works if you default to "null", maybe you default to an empty string ""?

@mgriffin-scottlogic
Copy link

Thanks Asim, I'd somehow managed to miss that feature until now, that should definitely help for both ideas.

@mgriffin-scottlogic
Copy link

Can confirm that an empty string will match against empty data and find the first entry in the list. So I can replace my idea above with:

...
  plugins:
    cloud-metadata:
      method: CSVLookup
      path: 'builtin'
      global-config:
        query:
          vendor: "cloud/vendor"
          service: "cloud/service"
          type: "cloud/usage-type"
        output: ['replication', 'storage/replication-factor']
...
  defaults:
    cloud/usage-type: ''
  inputs:
    - cloud/vendor: aws
      cloud/service: s3
    - cloud/vendor: aws
      cloud/service: s3
      cloud/usage-type: RRS

@jawache
Copy link
Contributor

jawache commented Jun 18, 2024

Whoop that's great @mgriffin-scottlogic !

@zanete zanete moved this from In Progress to Done in IF Aug 12, 2024
@bvickers7
Copy link

I came across this while searching for CSV import functionality. I'd also like to see this added so I made a new issue to cover it: #1051

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
project-update Used to flag project progress updates. Core team only
Projects
Status: Done
Development

No branches or pull requests

4 participants