Framework improvements #52
Replies: 6 comments 18 replies
-
Hi @joachimvermeir - thanks for sharing this idea! There is a lot of valuable insight here that could make a great hackathon project - I'll get into this properly in a couple of hours and aim to give some more detailed feedback later today.
-
Some more detailed responses below: your points are the list items, with my reply indented beneath each. Overall I'm on board with the way you're thinking about this and excited to see what comes out of these ideas!

The way the framework currently works indeed lacks a proper handling of units:
- the term 'unit' is not really clear from the discussion in the Q&A and code base
  The unit describes how a given
- (units of) input and output parameters are described nowhere
  They are/should be described in the docs for each model plugin, but it is true they are not defined in a manifest file. There is a file,
- the resulting output data file of running the impact framework only contains the values of metrics, but not their units
  Yes, this is true - right now the output parameter is expected to exist in
- when passing metrics between steps in the pipeline there's no way to validate or detect that the values are supplied in the right units
  Yes, this is true - the onus is on the user to ensure the right units are passed into each plugin. A plugin that receives some value will naively assume the value to be expressed in the correct unit.
- Define 'units' properly:
  Sure, but
- Using the SI units where relevant would improve the readability and consistency (e.g. standard unit for time is 's', not 'seconds').
  Yes, fair point. We have tried to avoid any shorthand or abbreviations across the repository, preferring to be fully expressive wherever possible. Agree that the default unit should be the SI unit for a given parameter wherever possible.
- Create documentation for the basic units that are part of the base impact framework and perhaps some guidelines on how to define unit names
  This would be helpful. Right now we are discussing a new approach to defining units that we intend to implement next sprint, most likely providing a way to append to or override the default
- I also liked the idea of embedding the description and properties of the input/output parameters in a metadata file in each plugin. This would indeed allow to
  I'd certainly be interested to explore this idea. Again, I think it will depend on the solution we implement for
  - perform unit conversions
    Agree, mostly covered in my response to the previous point. It would be cool if there was a way to auto-correct units based on the findings from the static analysis - e.g. if the unit for parameter
- Include the metric's units in the output file so that it is clear which units are used for the different values
  No particular problem with this, but it is already captured in
- Extend the model interface with all the input metrics and their units that are used up till that step and the output metrics and units, so it can be used by the plugins (e.g. a conversion plugin).
  We will probably be reluctant to update the plugin interface, as it is supposed to be as minimal as possible. That said, we are planning an update to the interface shortly, with the aim of reducing the amount of redundant data being copied into the output file and clarifying the difference between observations and configuration, which is currently somewhat ambiguous. We're open to discussing any convincing arguments about how else to change the signature, of course, but there really has to be an extremely strong reason to mess with it, as it will require rewriting all existing models. The particular functionality you suggest may well end up being rolled into our planned updates, so I'd say watch the discussion here and here to see if we cover it sufficiently.
- Given the known metadata one could also imagine 'cleaning' intermediate data. For example, step 2 in the pipeline expects cpu-util as input to convert into something else. If none of the subsequent steps have cpu-util in their input, it is safe to remove that field from the data (memory/file) for any steps after step 2.
  Yes, I like this idea.
- If we would implement the 'units' as mentioned above and use plugins to perform conversions we might end up with a situation where a single model/plugin is used multiple times in a pipeline. If this plugin requires configuration that is different each time it appears in the pipeline, how can we configure it correctly (it would need some kind of step ID to allow it to be referenced)
  Interesting... I haven't really considered multiple invocations in a single pipeline. I suppose this could be especially relevant for plugins that do some manipulation or correction, more so than plugins that do calculations... I'll think on this a bit more, but my gut reaction is that it might be worth exploring.
- implementing a validation that a step (model/plugin) does not alter any output except what is described in its metadata, we won't be able to create generic plugins (like unit conversion) that operate on a (not explicitly defined, but at runtime calculated) subset of input fields. An option is to add a property to the plugin's metadata that it is generic (has no predefined inputs/outputs but does a selection at runtime based on the context/metadata)
  OK - agree in principle!
- For some of the above points, they can be either implemented in the framework and/or in separate plugins or a combination.
  Yes!
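To make the 'cleaning' idea concrete, here is a rough sketch of how per-plugin metadata could be used to work out which fields are safe to drop after each step. Everything in it is an assumption for illustration: the `PluginMetadata` shape, the plugin names, and the `grid-intensity` field do not exist in the framework today. It walks the pipeline backwards and marks a field as removable after the last step that reads it, unless the caller asks to keep it.

```typescript
// Hypothetical metadata shape: the framework does not define this today.
interface PluginMetadata {
  name: string;
  inputs: string[];
  outputs: string[];
}

// For each step, list the input fields that no later step reads, i.e. the
// fields that could be dropped from the data once that step has run.
// `keep` names fields that must survive to the final output regardless.
function removableAfterEachStep(
  pipeline: PluginMetadata[],
  keep: string[] = []
): string[][] {
  const neededLater: string[] = keep.slice();
  const removable: string[][] = [];
  // Walk backwards so that we always know what the later steps still need.
  for (const step of pipeline.slice().reverse()) {
    const lastUse: string[] = [];
    for (const field of step.inputs) {
      if (neededLater.indexOf(field) === -1 && lastUse.indexOf(field) === -1) {
        lastUse.push(field);
      }
    }
    removable.unshift(lastUse);
    for (const field of step.inputs) {
      if (neededLater.indexOf(field) === -1) {
        neededLater.push(field);
      }
    }
  }
  return removable;
}

// Illustrative two-step pipeline with made-up plugin names.
const examplePipeline: PluginMetadata[] = [
  { name: "cpu-to-energy", inputs: ["cpu-util", "duration"], outputs: ["energy"] },
  { name: "energy-to-carbon", inputs: ["energy", "grid-intensity"], outputs: ["carbon"] },
];

// cpu-util can go after step 1; duration stays because the caller keeps it.
const prunePlan = removableAfterEachStep(examplePipeline, ["duration"]);
```

Outputs are never pruned in this sketch, and a generic plugin that selects its inputs at runtime would need to be treated as reading everything.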
-
RE: Naming
I do agree that units.yml is confusing. Perhaps it should be parameters.yml? We called it units.yml, but I suppose units weren't really the intention; we just named it poorly. The intention is to have a clear standard for parameter names, so that plugin authors can trust that when there is "CPU-util", for instance, it is going to be expressed as a "percentage" rather than "0-1", which is sometimes how it's expressed. These inconsistencies are where we had problems in the past and why we started creating the units.yml standard. E.g., carbon is expressed in gCO2e; one thing that's caught many people by surprise is that some tools and services express carbon as lbsCO2e instead! As you can imagine, this has caused many problems, so we forced everything to use gCO2e.
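As a rough illustration of what such a parameter standard buys plugin authors, the sketch below models it as a lookup from parameter name to a single fixed unit. The two entries come from the examples above; the `ParameterDefinition` shape, the function names, and the lowercase `cpu-util` key are assumptions, not the actual contents or format of units.yml.

```typescript
// Hypothetical shape for one entry in the parameter standard.
interface ParameterDefinition {
  unit: string;
  description: string;
}

// Each parameter name maps to exactly one unit. The entries mirror the
// examples in this thread; the real file may look quite different.
const parameterRegistry: Record<string, ParameterDefinition> = {
  "cpu-util": {
    unit: "percentage",
    description: "CPU utilization as a percentage, not a 0-1 fraction",
  },
  carbon: {
    unit: "gCO2e",
    description: "Carbon emissions in grams of CO2 equivalent",
  },
};

function isRegistered(name: string): boolean {
  return Object.prototype.hasOwnProperty.call(parameterRegistry, name);
}

// Look up the one unit a parameter is always expressed in.
function unitOf(name: string): string {
  const definition = parameterRegistry[name];
  if (!isRegistered(name) || definition === undefined) {
    throw new Error(`Unknown parameter: ${name}`);
  }
  return definition.unit;
}

// Report the fields of an observation that are not part of the standard.
function unregisteredParameters(observation: Record<string, number>): string[] {
  return Object.keys(observation).filter((name) => !isRegistered(name));
}
```

With something like this, a plugin that reads `carbon` never has to ask which unit it arrives in, and a field such as `carbon-lbs` is flagged until someone registers it as its own parameter.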
-
RE: Unit Conversion
Our thinking is that, for there to be a healthy ecosystem where writing plugins is simple, each parameter needs to be unique and its units cannot change. Plugin authors should not have to worry about what units a parameter is expressed in. If you want to express the same parameter in a different unit, it needs a different parameter name: to express carbon in lbs, you would have to export a parameter called carbon-lbs rather than overload the carbon parameter name, which would force every other plugin author to be wary of what carbon means. Given that, there can be a plugin that performs conversions but, so as not to cause confusion downstream, those conversions would need to produce a different parameter. So there can be a conversion plugin that, say, converts
To speak to your point on multiple instances of the same plugin in the same pipeline: I've thought of this before, and the way you would do that right now is to configure the same plugin multiple times in the initialization section. The name of the plugin that you use in a pipeline is just an id that refers to an instantiation of a plugin. I think it's nice for the plugin names to be descriptive of the task they are performing.
Then in your pipeline, you can perform conversions like so:
(FYI, as I wrote the above, I realized that sci-e, m, and o don't meet my statement that plugin names need to be descriptive!)
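The example that originally accompanied this comment is not reproduced here. As a stand-in, the following is a minimal sketch of the two ideas in TypeScript rather than manifest syntax: a conversion writes to a new parameter name instead of overwriting the source, and the same plugin is initialized twice under different, descriptive ids. The `createConverter` factory, the `Plugin` type, the ids, and the `duration-hours` parameter are all assumptions and not the framework's actual plugin interface; only `carbon` and `carbon-lbs` come from the discussion above, and `duration` is assumed to be in seconds.

```typescript
type Observation = Record<string, number>;
type Plugin = (inputs: Observation[]) => Observation[];

// Hypothetical per-instance configuration for a generic conversion plugin.
interface ConversionConfig {
  from: string; // parameter to read
  to: string; // new parameter to write; must differ from `from`
  factor: number; // multiplier applied to the source value
}

// One plugin implementation, instantiated once per conversion.
function createConverter(config: ConversionConfig): Plugin {
  return (inputs) =>
    inputs.map((input) => {
      const value = input[config.from];
      if (value === undefined) {
        throw new Error(`Missing parameter: ${config.from}`);
      }
      // The source parameter is left untouched; the result gets its own name.
      return { ...input, [config.to]: value * config.factor };
    });
}

const GRAMS_PER_POUND = 453.59237;

// "Initialization section": the same plugin configured twice under two ids.
const initializedPlugins: Record<string, Plugin> = {
  "carbon-grams-to-pounds": createConverter({
    from: "carbon",
    to: "carbon-lbs",
    factor: 1 / GRAMS_PER_POUND,
  }),
  "duration-seconds-to-hours": createConverter({
    from: "duration",
    to: "duration-hours",
    factor: 1 / 3600,
  }),
};

// The pipeline refers to plugin instances purely by id.
function runPipeline(stepIds: string[], inputs: Observation[]): Observation[] {
  let current = inputs;
  for (const id of stepIds) {
    const plugin = initializedPlugins[id];
    if (plugin === undefined) {
      throw new Error(`Unknown plugin id: ${id}`);
    }
    current = plugin(current);
  }
  return current;
}

const converted = runPipeline(
  ["carbon-grams-to-pounds", "duration-seconds-to-hours"],
  [{ carbon: 453.59237, duration: 7200 }]
);
```

After the run, each observation still carries `carbon` in gCO2e and gains `carbon-lbs` (about 1) and `duration-hours` (about 2), so downstream plugins that read `carbon` are unaffected.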
-
RE: Debugging
That's an excellent point; something I've thought of before is a CLI param that exports each step in the pipeline as a separate file so we can see how the pipeline evolves the data. More generally, we need a better developer experience, and this would be a great step towards it. What other improvements could we make to the developer experience?
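A rough sketch of what that per-step export could look like, kept in memory here so it stays self-contained; writing each snapshot to its own file would be the obvious next step. The `Step` and `Snapshot` types, the step ids, and the toy fields are assumptions for illustration, and no such CLI param exists yet.

```typescript
type Observation = Record<string, number>;

// Hypothetical step shape: an id plus a function over the observations.
interface Step {
  id: string;
  run: (inputs: Observation[]) => Observation[];
}

interface Snapshot {
  step: string;
  data: Observation[];
}

// Run the pipeline and record an independent copy of the data after each
// step, so the evolution of the data can be inspected afterwards.
function runWithSnapshots(steps: Step[], inputs: Observation[]): Snapshot[] {
  const snapshots: Snapshot[] = [];
  let current = inputs;
  for (const step of steps) {
    current = step.run(current);
    // A JSON round-trip copies the data so later steps cannot mutate it.
    const copy: Observation[] = JSON.parse(JSON.stringify(current));
    snapshots.push({ step: step.id, data: copy });
  }
  return snapshots;
}

// Toy steps that each add one field; the values carry no physical meaning.
const debugSteps: Step[] = [
  { id: "step-a", run: (inputs) => inputs.map((i) => ({ ...i, a: 1 })) },
  { id: "step-b", run: (inputs) => inputs.map((i) => ({ ...i, b: 2 })) },
];

const snapshots = runWithSnapshots(debugSteps, [{ x: 1 }]);
```

Each snapshot could then be serialized to a file named after its step id, which would give one file per pipeline step.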
-
Had a lot of similar ideas come up while ideating for the carbon hack. Posted the relevant ones in the appropriate thread. I would be interested in circling back on some of these post-hack, especially on the dimensional consistency of conversions.
-
Based on one of the Q&A prep sessions where 'units' were discussed, I have some ideas for improving the framework. They probably need to be split into separate items, but I'll share them here to start the discussion.
About 'units'
The way the framework currently works indeed lacks a proper handling of units:
Possible improvements:
Attention points
Other improvements