Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve IF logging protocols #600

Closed
5 tasks done
Tracked by #655 ...
jmcook1186 opened this issue Apr 10, 2024 · 21 comments · Fixed by #753 or #793
Closed
5 tasks done
Tracked by #655 ...

Improve IF logging protocols #600

jmcook1186 opened this issue Apr 10, 2024 · 21 comments · Fixed by #753 or #793
Assignees
Milestone

Comments

@jmcook1186
Copy link
Contributor

jmcook1186 commented Apr 10, 2024

Sub of #655

What
A set of updates to our logging and error reporting that will make it easier to diagnose and fix errors in IF runs.

Why

Diagnosing errors is one of the main elements of developer experience that we think needs to be improved to encourage IF adoption.

As a developer I want to be able to access rich debugging logs to help me to build with IF.
As a developer, I want to know exactly where int he stack my IF run failed so i can go in and debug.
As a user I want to have fine grained control over the type of logs I see in the console, and to be able to interpret them easily.
As a user, if my IF run fails, I want to know why and I want to be able to fix it quickly.

Context

To help users and developers identify where an error occurred, we should optionally print logs to the console during normal execution. Then, users can see how their IF run is progressing and use the logs to track when an error occurred. This should be configured using a --debug flag in the CLI.

If the debug flag is provided on the CLI, then logs should be emitted with the following structure:

[LOG LEVEL] [DATE] [MESSAGE]

e.g.

INFO: 2024-03-05T00:00:00.000Z: Starting Impact Framework

Supported logs

The following log messages should be added to IF execution:

file action log message Level
index.ts enter IF "Starting Impact framework" INFO
exit IF "Exiting IF" INFO
load.ts enter load function "Loading manifest" DEBUG
enter validatemanifest "Validating manifest" DEBUG
environment.ts enter injectEnvironment "Capturing runtime environment data" DEBUG
parameterize.ts enter Parameterize "Syncing parameters" DEBUG
enter getAggregationMethod "Checking aggregation method for ${unitName}" DEBUG
initialize.ts enter initialize() "Initializing plugins" DEBUG
enter initPlugin "Initializing ${plugin-name}" DEBUG
enter handModule "Loading ${plugin-name} from ${path}" DEBUG
compute.ts enter computeNode "Computing pipeline for ${node-name}" DEBUG
enter mergeDefaults "Merging defaults with input data" DEBUG
aggregate.ts enter aggregate "Aggregating outputs" DEBUG
enter aggregateNode "Aggregating node ${node-name}" DEBUG
exhaust.ts enter exhaust "Preparing output data" DEBUG
enter ExportYaml "Exporting to yaml file: ${savepath}" DEBUG
enter ExportCSV "Exporting to csv file: ${savepath}" DEBUG
enter ExporCSVRaw "Exporting raw csv file: ${savepath}" DEBUG

For plugins, we want to be able to integrate logs from plugins into the IF logger, but we also want to be able to develop and execute plugins independently of IF. It's important for the developer experience to be able to develop plugins quickly with minimal environment setup and dependencies. To balance these concerns, IF can easily be tweaked to reroute console.log() (or console.warn, console.error, console.info, console.debug) calls to the IF logger. This means we can prepend the name of the plugin emitting the message to the log and run everything through the framework's logger, but plugin developers can also just use console.X methods as usual in their development environment with no changes required to integrate into IF.

So, to improve plujgin logging we can:

  • reroute console.x to IF logger
  • add docs suggesting users log normal execution logs via console.debug
  • prepend log messages with the plugin name in our logger

A plugin (demo-plugin) that includes console.debug("my custom message") run through IF will be emitted as

DEBUG: 2024-03-05T00:00:00.000Z: demo-plugin: "my custom message"

SoW

  • Add --debug to CLI
  • Add log statements to IF and builtins that report execution logs when --debug flag is up
  • Add documentation about execution logs to if.greensoftware.foundation and plugin builders guide so that plugin builders know how to use our logger.

Acceptance Criteria

  • ie run with the --debug flag emits logs showing the execution details
    GIVEN the --debug flag is added to the CLI and logs are added to IF
    WHEN the following command is run: ie -m manifest.yml --stdout --debug
    THEN the following logs should be seen in the console:
INFO:  2024-03-05T00:00:00.000Z: Starting Impact Framework
INFO:  2024-03-05T00:00:00.050Z: Loading manifest
INFO:  2024-03-05T00:00:00.100Z: Validating manifest
INFO:  2024-03-05T00:00:00.200Z: Capturing runtime environment data
INFO:  2024-03-05T00:00:00.300Z: Syncing parameters
INFO:  2024-03-05T00:00:00.400Z: Checking aggregation method for carbon
INFO:  2024-03-05T00:00:00.500Z: Initializing plugins
INFO:  2024-03-05T00:00:00.600Z: Initializing sum
INFO:  2024-03-05T00:00:00.700Z: Loading sum from builtins
INFO:  2024-03-05T00:00:00.800Z: Computing pipeline for tree.children.child-1
INFO:  2024-03-05T00:00:00.090Z: Merging defaults with input data
INFO:  2024-03-05T00:00:00.100Z: <message emitted from plugin>
INFO:  2024-03-05T00:00:00.110Z: <message emitted from plugin>
INFO:  2024-03-05T00:00:00.120Z: Aggregating outputs
INFO:  2024-03-05T00:00:00.130Z: Aggregating node tree.children.child-1
INFO:  2024-03-05T00:00:00.140Z:: Preparing output data
INFO:  2024-03-05T00:00:00.150Z: Exporting to yaml file
INFO:  2024-03-05T00:00:00.160Z: Exiting IF
  • console.x calls are rerouted to the IF logger
    GIVEN the console.x calls from plugins are rerouted to the IF logger
    WHEN a plugin includes a call to console.debug("message") and IF is run with the --debug flag and a manifest that executes the plugin, e.g. `ie -m example.yml --stdout --debug
    THEN IF includes the plugin logs in its execution logs, reformatting it as follows:
DEBUG: 2024-03-05T00:00:00.000Z: {plugin name}: message

For example, a plugin called "my-plugin` that includes the following calls:

console.debug("executing my-plugin.execute")
console.debug("executing inputs[0]")
console.debug("executing inputs[1]")
console.log("exiting my-plugin")

full set of IF logs could looks as follows:

INFO:  2024-03-05T00:00:00.000Z: Starting Impact Framework
INFO:  2024-03-05T00:00:00.050Z: Loading manifest
INFO:  2024-03-05T00:00:00.100Z: Validating manifest
INFO:  2024-03-05T00:00:00.200Z: Capturing runtime environment data
INFO:  2024-03-05T00:00:00.300Z: Syncing parameters
INFO:  2024-03-05T00:00:00.400Z: Checking aggregation method for carbon
INFO:  2024-03-05T00:00:00.500Z: Initializing plugins
INFO:  2024-03-05T00:00:00.600Z: Initializing my-plugin
INFO:  2024-03-05T00:00:00.700Z: Loading my-plugin from ./my-plugin
INFO:  2024-03-05T00:00:00.800Z: Computing pipeline for tree.children.child-1
INFO:  2024-03-05T00:00:00.090Z: Merging defaults with input data
INFO:  2024-03-05T00:00:00.100Z: my-plugin: executing my-plugin.execute
INFO:  2024-03-05T00:00:00.101Z: my-plugin: executing inputs[0]
INFO:  2024-03-05T00:00:00.110Z: my-plugin: executing inputs[1]
INFO:  2024-03-05T00:00:00.111Z: my-plugin: exiting my-plugin
INFO:  2024-03-05T00:00:00.120Z: Aggregating outputs
INFO:  2024-03-05T00:00:00.130Z: Aggregating node tree.children.child-1
INFO:  2024-03-05T00:00:00.140Z:: Preparing output data
INFO:  2024-03-05T00:00:00.150Z: Exporting to yaml file
INFO:  2024-03-05T00:00:00.160Z: Exiting IF
@jmcook1186 jmcook1186 added this to IF Apr 10, 2024
@zanete zanete moved this to Backlog in IF Apr 22, 2024
@zanete zanete moved this from Backlog to In Design in IF Apr 22, 2024
@jawache jawache changed the title Refine logging protocol Improve logging to help with debugging and diagnosing issues with running manifest files Apr 22, 2024
@jawache
Copy link
Contributor

jawache commented Apr 22, 2024

@jmcook1186 and @narekhovhannisyan, so my overall feedback as a user of IF is that there are several issues we have with not just logging but generally diagnosing issues.

Unless your manifest is trivial, it's impossible to determine which part of the manifest file is causing the error.

Typically, the flow is that you see a seemingly random error appear in the console, start commenting out bits of the manifest file, and re-run until you narrow down the part of the manifest file that is causing the error.

  • It usually starts with commenting out pipeline entries until you narrow down which plugin causes the error.
  • Then you start with commenting out/changing the config/inputs etc.. until you figure out the nature of the error.
  • The errors are often very cryptic, it's hard to know where they really come from.
  • At this point, I'm usually lost, and I just post in the Slack room and get one of you to help figure this out.

I have the advantage of the last point, and I can rely on you, but clearly, most people do not.

So we need a few solutions.

Research and document the current state of affairs

  • I think we need to have a rich documented library of exceptions with clear meanings.
  • We discussed this last year, and I can see Keethan documented the exceptions here Models return useful exceptions #180, and some work was done here Errors layer if-unofficial-plugins#6. But I don't see any documentation, so it's hard to see what's even available.
  • If we do have a library of exception types, then for new plugin developers, if we are not documenting them somewhere, then how can we expect them to throw the things they should be throwing?

After we've done that analysis, what is missing?

A first step to all of this, I would say, is documenting what exceptions our plugins and platform throw right now under what conditions and figuring out what exceptions are missing (we've casually discussed several during weekly IF calls but it's unclear to me which we've implemented).

This issue, I believe, covers both of the above (#337).

Giving context to the end user

The name of the exception and the plugin that raised it is not enough information to be able to diagnose the problem. As an end user, I have to do much more work to be able to figure out the root cause. A little bit more context would save a huge amount of effort and make the UX much more pleasant.

  • The name of the pipeline entry (if the exception came from a plugin)
    • We need to know that the error was caused by, say, the "normalised-weights" pipeline entry instead of the Multiply plugin. I can sometimes guess if the pipeline uses Multiply once, but what if you use it twice?
    • It's also a lot easier for the end user if you tell them the name of the pipeline entry instead of the function name of the plugin.
    • Without this context I will have to figure it out by commenting out pipeline entries and re-running.
  • Which node in the tree triggered this error?
    • We've discussed this a few times over the last year. I believe this is a hard problem to solve, but it's something we can't ignore. This is a must for this ticket.
    • It's easy to figure out trivial manifests. However, for more complex manifests, with many repeated plugins and pipelines, it's close to impossible without commenting out parts of the tree and re-running.
    • I think one reasonable approach is to give the user a path like tree.children.vm1.children.XXX or something like that so they know where in the tree the error was caused so they can easily find it in VS code.
    • If there is an official way to quickly search for a node in VS code or another tool using some format to define a YAML path, we should explore that also.
  • Component state
    • What was being passed to the plugin?
    • It would be useful to see the component, and it's current set of inputs/outputs (i.e. the state of the component after all the previous plugins have been run, defaults applied etc...)
    • Something like that printed to the console would have saved me many hours of debugging. It's easy to figure out if your defaults are getting overridden or if an earlier plugin is not outputting what you expect.

NOTE: Ideally I'd like the context to be printed alongside the critical exception, but if that is very challenging one approach might be to just provide context with DEBUG level logging. So the above context is always printed to the DEBUG level. If you get an exception, you re-run with DEBUG level on, it prints an enormous amount into the console and you can scroll up from the exception to see the context like where in the tree or the name of the plugin. We have that as a fallback and it seems reasonable.

@zanete
Copy link

zanete commented Apr 23, 2024

Realised this is a blocker for #592 and #638, and consequently #615 -

@jmcook1186
Copy link
Contributor Author

Hi @narekhovhannisyan - here are some of the key items we need to think, experiment and have async chats about in relation to logging, in order to get this ticket moving, in my opinion.

  • we need to distinguish INFO, WARN and ERROR, ideally colour-coded in the console (this is done in your poc, iiuc)
  • we need to know where the issue originates - which entry in a pipeline emitted an exception/log? This has to be the specific instance of the plugin, not just the exported function name as there can be many instances of a given plugin in a pipeline.
  • which node in the tree surfaced an error?
  • can we surface component state information including input values passed into a plugin?
  • do we want to enable the user to control the granularity of the log messages they see?
  • should we implement some user control on where the logs output to - console, file?

@jawache
Copy link
Contributor

jawache commented Apr 23, 2024

@jmcook1186 another item I realised I forgot to mention is giving the plugin developers the ability to log also, they can provide context about what's going on internally that is really useful for diagnosis.

The simplest solution as I see it doesn't require us to build any complex functionality, we just intentionally print certain things to a debug log.

So at the point you know what node your working on, print to the debug log

"Processing node x.y.z"

At the point you know what the state of the component is (i.e. what you are passing to the plugin) print to the debug log

"Component state: ....."

If it's all printed to the log then, on an error , to get more context you can simply rerun your manifest with a -debug flag and it prints everything, you simply scroll up from the error to find the additional context like current state, node path etc...

The pressure from experience is that developers want to print everything to "info" and the console becomes very busy, so some guidance for what gets printed to critical, warn, info, debug is also required (this guidance should be public and what we recommend for plugin developers also).

A more advanced solution might be to store some state internally so when a critical exception is thrown, logging at critical the exception plus the path and the component state. If we can do this then the user might not need to rerun the manifest file with the debug flag, but that really is a nice to have as it just saves the user from rerunning to see context which isn't such a big issue.

@jmcook1186
Copy link
Contributor Author

Noting that I added this page to the documentation where all the current error classes are listed: https://if.greensoftware.foundation/reference/errors

@jmcook1186
Copy link
Contributor Author

Sketching out some thoughts on this topic before working on the ticket description (@jawache for info).

Here's three aims for this task:

Add context to error reports:

Extend our error builder so that it also surfaces:

  • the name of the plugin or IF feature where the error originated
  • the plugin/framework state when the error occurred (possibly behind a --DEBUG flag)
  • the specific pipeline in the tree that was being executed when the error occurred

The new error format could look similar to:

 WriteFileError: file dummy.csv: permission denied
 Error raised by: csv-export
 Occurred at:
	 tree.children.child1
 Global config received: none
 Node config received: 
     savepath: dummy.csv
 Inputs received:
	 [{- timestamp: 2023-08-06T00:00
	     duration: 3600
	     carbon: 30},
	 {- timestamp: 2023-08-06T00:00
	     duration: 3600
	     carbon: 30},
	 ]

Standardize and document error classes

We want to define a finite list of error classes that can be emitted and definitions for when these errors should be raised, to reduce the ambiguity of error messages from plugins.

Here's the list of error classes we currently use across our IF features and plugins

We should check this list and identify any missing classes or classes we can remove.

Then we need to determine a strategy for how this list of error classes should be distributed to plugin builders (is documenting on IF site enough?)

Add detailed logs (behind --verbose or --debug flag

We should optionally print log statements to the console during normal execution so users can see how their IF run is progressing and use the logs to track when an error occurred.

If the debug flag is provided on the CLI, then logs should be emitted that look something like the following:

INFO: [time]: IF: starting IF
INFO: [time]: IF: loading manifest
INFO: [time]: IF: initializing plugins
INFO: [time]: IF: start tree.children.child pipeline execution
INFO: [time]: IF: executing sum plugin
INFO: [time]: Sum plugin: loading global config
INFO: [time]: Sum plugin: loading node level config
INFO: [time]: Sum plugin: loading input data
INFO: [time]: Sum plugin: processing inputs element[0]
INFO: [time]: Sum plugin: processing inputs element[2]
INFO: [time]: Sum plugin: processing inputs element[3]
INFO: [time]: Sum plugin: returning output data
INFO: [time]: IF: executing multiply plugin

...

INFO: [time]: IF: writing output file
INFO: [time]: IF run complete

@jmcook1186 jmcook1186 changed the title Improve logging to help with debugging and diagnosing issues with running manifest files Improve IF error reporting and logging protocols May 13, 2024
@narekhovhannisyan
Copy link
Member

@jmcook1186 We can extend the current implementation of the plugin interface to support flags, however, it won't force users to rely on that flag. Maybe we can do some trick to tune that behavior from IF. F.ex. temporary override console with another stub function, and put the original one back after plugin execution. I need to dig in to see what workarounds can be there.

@narekhovhannisyan
Copy link
Member

@jmcook1186 it won't help in case if users will create custom plugins without our template. We can implement condition in our IF error handler, to support certain errors, and If error is not from the supported list, then error out or just log that error without full support.

@manushak manushak moved this from Ready to In Progress in IF May 29, 2024
@zanete
Copy link

zanete commented May 30, 2024

expecting to raise a PR tomorrow morning 🙏

@manushak manushak moved this from In Progress to Pending Review in IF May 31, 2024
@manushak manushak linked a pull request May 31, 2024 that will close this issue
9 tasks
@zanete
Copy link

zanete commented Jun 3, 2024

@narekhovhannisyan please review the PR changes 🙏

@zanete zanete moved this from Pending Review to Testing in IF Jun 4, 2024
@zanete zanete moved this from Testing to In Progress in IF Jun 4, 2024
@manushak manushak moved this from In Progress to Ready in IF Jun 5, 2024
@manushak manushak moved this from Ready to Pending Review in IF Jun 5, 2024
@zanete zanete added this to the Improve Trust milestone Jun 5, 2024
@zanete zanete moved this from Pending Review to Testing in IF Jun 5, 2024
@zanete zanete moved this from Testing to In Progress in IF Jun 10, 2024
@narekhovhannisyan narekhovhannisyan linked a pull request Jun 10, 2024 that will close this issue
9 tasks
@manushak manushak moved this from In Progress to Pending Review in IF Jun 11, 2024
@zanete zanete moved this from Pending Review to Testing in IF Jun 11, 2024
@github-project-automation github-project-automation bot moved this from Testing to Done in IF Jun 11, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
Status: Done
6 participants