Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Agree solution to units.yaml problem #403

Closed
jmcook1186 opened this issue Jan 19, 2024 · 6 comments
Closed

Agree solution to units.yaml problem #403

jmcook1186 opened this issue Jan 19, 2024 · 6 comments
Assignees

Comments

@jmcook1186
Copy link
Contributor

Story

As a developer I want to be able to create new model plugins independently without having to update the IF core. Today this is not possible for plugins that create new parameters.

Rationale

We currently use a file, units.yml to define expected units and parameters names (and we also use this to define the method we should use to aggregate the values in our aggregation feature). We also use this file to do some internal validation. In order to improve the DX we have to find a way to remove our reliance on units.yml. however, units.yml provides improtant protection against unit errors propagating through model pipelines, and also takes away some burden from plugin developers having to handle all kinds of unit corrections.

Implementation details

This issue is only intended to discuss and agree on a fix for this problem. Once a solution is properly specified we will create a new issue for the actual implementation.

Priority

5/5

Scope

this issue is not for any actual code changes

Size

3 days

What does "done" look like?

Team agree on sensible, maximally future-proof solution

Does this require updates to documentation or other materials??

not at this stage

What testing is required?

none

Is this a known/expected update?

expected since start of year, discussed at several meetings including IF weekly call and internal meetings

@jmcook1186
Copy link
Contributor Author

Initial proposal:

  1. Continue to maintain units.yml, but move it into its own repository where the community can maintain consensus over the standard set of units. Load this as a dependency in IF.

  2. Plugin developers can add new (as in, not yet known to IF, not included in units.yml) parameters by providing them in a new field in the manifest file. This means units.yaml is writeable but append-only. These new params are added to the in-memory js object representation of units.yml at runtime. This protects against accidentally providing unexpected inputs into existing models which could lead to csubtle errors propagating through the pipeline (i.e the consensus units are fixed and cannot be changed).

  3. Add an --unsafe-mode CLI flag that allows units.yaml overwriting. This clearly signals to plugin developers that they are not conforming to accepted unit norms, but expert users might choose to do this if they want to create outside of the expected boundaries.

These changes would also require us to rethink some of our validation logic, and we would need to audit for additional side-effects.

@narekhovhannisyan @jawache @manushak

@jmcook1186 jmcook1186 changed the title Finalize solution to units.yaml problem Agree solution to units.yaml problem Jan 19, 2024
@mads6879
Copy link

mads6879 commented Jan 21, 2024

One option can be to provide a command line tool which provides ability to the developers to add a new unit to IF.

Proposed folder structure

config/units/<unit_000x.yml files> where x is an incremental integer between 1 to 9999.

Rationale behind the folder structure

It is very likely that there may not be that many units to be added for IF but this approach keeps room for future growth.
The single units.yml has potential to end up growing as a monolith.
Also in future, there may be a need to keep units of one type in one file e.g. need to load all memory related units only

CLI utility

units-parser

Arguments

--name name --description "description" --unit unit --aggregation allowed_aggregates

Process

To begin with, the existing units file config/units.yml would be stored as config/units/unit_0001.yml

The developer who wants to add new unit executes a CLI command
e.g.
units-parser --name water --description "amount of water consumed" --unit L --aggregation sum

The units-parser validates the new proposed unit against the unit_0001.yml (which is copy of the current units.yml file).
Water is a new unit and not duplicate (not currently known to IF) hence it can append 'Water' to unit_0001.yml.
If the size of the existing unit_0001.yml has crossed certain threshold (x MB) then a new file unit_0002.yml is created and 'Water' is added to it.
All the unit_000x.yml files under the config/units folder would be loaded in memory so they are available to the plugins at runtime.

The changes to the existing unit_000x.yml files would follow the same rules that are applicable for approving commits/merges to the IF repo (which can also include consensus on the new units added).

If a developer tries to add already existing unit, e.g.
units-parser --name cpu-util --description "refers to CPU utilization" --unit percentage --aggregation sum
In this case, the unit is a duplicate and an error would be thrown when the CLI command is executed.
Even though the aggregation specified (sum) is different to the existing one (avg), the entry would be treated as a duplicate and will generate error.

To provide flexibility to the developers, the CLI tool may contain an update command which can be used to update description and aggregation, however name and unit would not be editable as they are immutable.
If changing aggregation breaks existing plugins then perhaps aggregation can be immutable as well.

Updated

Following up on the above example, if the request is to add same unit cpu-unit but with another name say cpuunit, the tool can apply some intelligent logic and throw a warning and suggest similar units that can be used.
e.g. In the above case, it may say
'the new unit name looks similar to other unit already in IF. Do you want to use existing 'cpu-unit' instead?'
Similar validation can be applied to --unit, when trying to add mL when L is already present for Water, it can show warning and suggest
'There is already L unit in IF for water, do you want to use that instead of adding mL'
However this may be fairly complicated very quickly so need not be part of MVP.

@jmcook1186
Copy link
Contributor Author

Hi @mads6879 - very interesting proposal - thank you!

We're going to be looking at this issue in fine detail in this sprint (starting today) so you can expect the discussion to continue here shortly.

@jmcook1186
Copy link
Contributor Author

Noting that there is a related discussion started by a contributor in the hackathon ideas board
Green-Software-Foundation/hack#52

@jmcook1186
Copy link
Contributor Author

jmcook1186 commented Jan 24, 2024

Here's our preferred approach after a dev sync:

  • move the canonical units file to a separate repository that can be governed separately from IF
  • add it to npm and automate releases on PR merges so we can configure IF to always depend on the latest version
  • to make units file work with npm, make it a js or json file rather than a yaml
  • allow users to append new entries to the canonical units object at runtime (likely configure this in the manifest file, but could also be handled in the CLI) - i.e. all users get append-only access to the units file
  • allow users to provide their own units file, loading in place of the consensus file using npm link - this overrides our suggested canonical file
  • a new units file can only be used if --advanced-mode (or --unsafe-mode or similar) is toggled. This should be logged in the output file.

The benefits of this approach are:

  • we still provide a canonical set of units that most users will use as defaults
  • the canonical units have their own governance outside of IF but still integrate seamlessly into IF without any user action (because we always load the latest version)
  • users can develop plugins without raising PRs to the canonical units file, just by defining new units in the manifest file
  • users can contribute to the canonical set of units later if it is important to them
  • advanced users can override the entire units file and use their own, but they have to explicitly raise a CLI flag so that they tacitly acknowledge they are rejecting the canonical set ("removing the guardrails")
  • the implementation is relatively simple and can be shipped easily

@narekhovhannisyan @jawache

@jawache
Copy link
Contributor

jawache commented Jan 24, 2024

@mads6879 great thinking there; I think as we start exploring moving the units.yml (or whatever we end up calling it) to another repo, it would make sense to have a tool that enforces the rules you specify above, even if it was run as a linter to check that a proposed change doesn't break any of the existing parameters that all by itself would be useful in the new-repo version of the units.yml.

@jmcook1186 after some thinking over this, I believe, as well as what you proposed above, we need a naming convention. I've created an issue to discuss just the naming convention also: #410

@jmcook1186 jmcook1186 moved this from In Progress to Sprint Done in IF Feb 2, 2024
@jmcook1186 jmcook1186 moved this from Sprint Done to Done in IF Feb 5, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
Status: Done
Development

No branches or pull requests

4 participants