Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

default logging configuration doesn't log other packages than kedro #1732

Closed
Tracked by #1735
Debbby57 opened this issue Jul 28, 2022 · 12 comments
Closed
Tracked by #1735

default logging configuration doesn't log other packages than kedro #1732

Debbby57 opened this issue Jul 28, 2022 · 12 comments

Comments

@Debbby57
Copy link

Description

since the 0.18.2 release, I lost the logs for the other packages than kedro in my projects, if I don't specify them in the logging.yaml
I tested with multiple packages like custom kedro plugins like kedro-mlflow (@Galileo-Galilei) and other packages like paramiko.
If I want to have logs for thoses packages I need to update logging.yml

loggers:
    kedro:
        level: INFO
    <my_package>:
        level: INFO
    kedro-mlflow:
        level: INFO
    paramiko:
        level: INFO

I think it's a bug because for me it's difficult to ask users to add all their packages in the logging.yaml. I think the default behavior we want is to have all logs by default and use logging.yaml to remove logs we don't want.

Context

I need all my logs by default to control the correct behavior of my projects. And if I don't want some of them I remove them

Steps to Reproduce

  1. install kedro 0.18.2
  2. install kedro plugin like kedro-mlflow or another package like paramiko
  3. run your project. You will see that you don't have kedro-mlflow logs or paramiko's logs
  4. update your logging.yaml like above and run your project. You will have logs

Expected Result

By default, we should have all logs.

Actual Result

We just have kedro logs.

Your Environment

Include as many relevant details about the environment in which you experienced the bug:

  • Kedro version used (pip show kedro or kedro -V): 0.18.2
  • Python version used (python -V): 3.9.13
  • Operating system and version: linux

Thank you for your help :)

@Galileo-Galilei
Copy link
Member

Galileo-Galilei commented Jul 28, 2022

This is a quite annoying issue I am facing too : All tests of this PR on kedro-mlflow were working before the release and are failing since kedro==0.18.2. The logs shows that the test fail because kedro-mlflow logging is no longer displayed in the console.

The fix in logging.yml works, but like @Debbby57 I don't feel it is the right default behaviour (and it is a breaking change compared to 0.18.1, so I think this is not intentional).

@antonymilne
Copy link
Contributor

antonymilne commented Jul 28, 2022

Hello @Debbby57 and @Galileo-Galilei, and thanks as ever for raising the issue here! I'm responsible for this change so let me try to explain a bit.

This was indeed a change made in 0.18.1:

The root logger is now set to the Python default level of WARNING rather than INFO. Kedro's logger is still set to emit INFO level messages.

So this change was indeed intentional, and the reason this was designated a bug fix/minor change rather than a breaking change was because (a) the previous behaviour on Kedro was arguably incorrect and (b) I didn't anticipate this change affecting anyone (indeed you're the first people to even notice it). In hindsight probably this should have been noted under "Minor breaking changes to the API" instead though. So sorry about this and the awkwardness it has caused you!

First note that the default logging level in Python is set to WARNING. Prior to 0.18.1, the logging config went like this:

loggers:
    anyconfig:
        level: WARNING
        handlers: [console]
        propagate: no

root:
    level: INFO
    handlers: [console, info_file_handler, error_file_handler]

i.e. we changed the root logging level to INFO. This results in a lot of logs being emitted by anyconfig, hence the manual setting of that level to WARNING. There's been other places where people have needed to manually alter the logging config to reduce the verbosity of logs from other packages, e.g.py4j.java_gateway on databricks.

Why did I not like this? Because it's heavy-handed and disruptive since it alters the behaviour of the root logger from the expected Python default of WARNING. Hence any other libraries outside kedro that perform logging will be affected unexpectedly (which means putting in place special cases to handle anyconfig etc.).

So in 0.18.1 this was changed to:

loggers:
  kedro:
    level: INFO

root:
  handlers: [rich]

i.e. we leave the root logger level alone and just make kedro the special case. This feels much cleaner because it's only altering the logging configuration for the package that we're responsible for and not interfering with any others. Hence there's no need for the anyconfig etc. special cases any more either.

Note this actually immediately causes a new problem for us, since any INFO-level logs emitted by a project would no longer be shown. I regarded this as unexpected/undesired behaviour and a breaking change, which is why the project logging.yml config also sets {{ cookiecutter.python_package }}: level: INFO to ensure those messages are still shown as they were before 0.18.1. As above, I considered the behaviour of any packages outside kedro and the current project to be "outside" our responsibility.

Unfortunately packages outside our responsibility also include plugins such as kedro-mlflow. I hadn't really considered this specifically before and do see why it would be a problem, sorry! So what can be done about this?

  1. you add kedro-mlflow: level: INFO to your project logging.yml file as @Debbby57 suggested above. IMO this is the "correct" solution in a pure sense, but I do appreciate it's a bit annoying
  2. you alter the level of the root logger in your project logging.yml file to INFO. This is a bit easier since you don't need to change the config for every additional new package but might mean you need to silence verbose packages like anyconfig
  3. use the kedro namespace for your logging. Through logging propagation this would then output at level INFO. This would mean, for example, kedro-mlflow emitting logs using the namespace kedro.mlflow. This is maybe a little bit ugly/confusing (and wouldn't work for paramiko etc.) but should mean that users don't need to edit their logging config
  4. packages that want to emit logs at level INFO set that in their logging config, e.g. logging.setLevel(). Since we leave disable_existing_loggers: False then I think this should work but might take a bit of fiddling round (in my experience the propagation/handling of Python loggers does not work quite as you'd expect. Possibly we need to set root logger level explicitly to NOTSET or something). If we can get this working then there would be no need for a user to alter their logging configuration though
  5. we add special cases inside kedro to output messages at INFO level for kedro-mlflow, paramiko, etc. This doesn't feel quite right to me though. It seems better than the special case we used to have for anyconfig (since it doesn't alter the Python default root logging level) but still not ideal
  6. we revert the behaviour to make the root logger level INFO like it used to be. But, as explained above, this change was made for some good reasons, and so I would be reluctant to do this unless there's a compelling argument to do so

Let me know what you think! My feeling is that the right solution here would be 4 for plugins like kedro-mlflow, and 1 or 2 for other packages like paramiko. I am still willing to be convinced that option 5 or 6 is the right solution here, but currently they don't feel right to me. Maybe there's another solution I didn't think of, so if you have any ideas then please do say. I hope this helps to explain the change and the balance of factors that need to be considered here anyway.

@Debbby57
Copy link
Author

thank you for your reply.
I need to think about this but for now :

  • I still don't like option 1, because as a user I don't want to search and list packages for which I want infos logs. I could be very time-consuming.
  • short term, I'm going to use option 2
  • I don't like option 3 but I'm just a user of kedro-plugins so I think my opinion is really facultative ;)
  • Option 4 seems to be unreachable for packages which are not kedro-plugins
  • I agree that option 5 is not a good option.
  • I understand why you don't want the option 6.

@antonymilne
Copy link
Contributor

antonymilne commented Jul 28, 2022

Just to clarify, if option 4 works then it should be possible for all packages, not just kedro-plugins. Indeed I think that any packages that work outside kedro and wish to emit INFO level log messages must already be doing something like this, otherwise, independently of kedro, their log messages would not appear (since it would be filtered out by the Python default WARNING level).

Let me check this actually, since the more I think about it the more this feels like it would be the right approach, and if kedro doesn't currently work with external packages like this then it's a problem. We shouldn't be losing log messages for packages which do explicitly setting their levels to INFO.

@Debbby57 just to understand a bit better, please can you explain exactly what are the packages for which you want INFO logs and why? I think it's probably not very common to have a long list of these. Given that the Python default level is WARNING, most Python packages expect you to just leave unaltered and don't expect users would want to see the INFO level messages at all.

@antonymilne
Copy link
Contributor

antonymilne commented Jul 30, 2022

I've just done some testing of this and believe that option 4 should indeed work. i.e. if you have a package that uses a logger set to level INFO log messages then those messages will still be shown during kedro run without any changes to logging configuration required.

This seems like the correct solution because each package then has responsibility to choose the appropriate log level for itself, and such packages will work as expected both inside and outside kedro.

In practice, what does this mean?

For package authors (package could be kedro plugin, e.g. kedro-mlflow, or a general one, e.g. paramiko)

If you want the level of logging to be different from the Python default of WARNING then you should should set it as such in your package or expect users to manually set the level for your package by editing their logging.yml.

To set the logging level to INFO in your package, you would do logger.setLevel(logging.INFO). Note that due to logging propagation, you don't need to set this separately for every single module that does logger.info. You can just set, e.g. logger.getLogger("kedro-mlflow").setLevel(logging.INFO) and it will work for any loggers below that in the hierarchy (e.g. kedro-mlflow.something.blah.module).

The only difference between a kedro plugin and a general package here is that a general package could be used outside kedro. This means that, unlike a kedro plugin, the logging level set by the package wouldn't be chosen specifically to work nicely with kedro.

For kedro users

If you wish to change the logging level of packages from the ones set by the packages themselves then you need to edit your logging.yml to do so according to option 1 or 2 above. If you think the package itself has set the wrong logging level then you should contact its authors to ask if they would change it, but as in the previous section, probably this would only make sense in the case of kedro plugins which are never used outside kedro.

Overall

What I've written here is actually how I'd always envisaged this working but I don't think I'd written it out explicitly anywhere, so it's good to have written down now. IMO this means the current kedro behaviour is the correct one, since it places responsibility for altering logging levels in the hands of the package authors and users. All that kedro should be responsible for is setting the logging level for kedro and your project package.

Let me know if this works for you. @Galileo-Galilei hopefully just setting the log level in mlflow will fix your unit tests.

@antonymilne
Copy link
Contributor

I'm going to close this issue because I think there's nothing that needs to change on the Kedro side, but please do tag me in a comment and say if you disagree and I'll be happy to reopen!

@Galileo-Galilei
Copy link
Member

Galileo-Galilei commented Aug 24, 2022

Hi @AntonyMilneQB, thank you for the reply. What you say makes sense, but it would be nice if kedro plugins were set to "the same logging level" as kedro (e.g. INFO by default), but there are a few edge cases to deal with (e.g. if a user specifically specifies two different logging level in the logging.yml, it should have thre priority).

Hardcoding the logging INFO level inside the package is not user friendly, but I guess I can use after_context_created hook to fetch the logging.yml and eventually change the default if nothing is provided for kedro-mlflow. This seems a but overkill, but easy to implement and consistent with the rest of kedro.

@antonymilne
Copy link
Contributor

antonymilne commented Aug 24, 2022

HI @Galileo-Galilei, it would indeed make sense to set kedro plugins to the same logging level as kedro (so INFO by default) but unfortunately I don't see a good way to do that without us adding every single one of kedro-mlflow, kedro-neptune, kedro-ge, etc. to logging.yml explicitly. AFAIK there's no way to do some wildcard match like kedro-*. In theory we could add them as and when plugin authors request it I guess (so just kedro-mlflow for now) but it still doesn't feel right to me.

Alternatively we could encourage plugins to log to kedro.plugins.plugin_name, so it would be kedro.plugins.mlflow for example or even just kedro.mlflow rather than kedro-mlflow. This means that the logging level would be set to the same as that of kedro without needing to specify anything special in logging.yml, since the logging would then be under the kedro namespace. There's nothing stopping you from doing this now in fact - you just need to name your logger appropriately.

With that said, I don't quite understand the problem with specifying the logging level on the package side:

Hardcoding the package info level is not user friendly, but I guess I can use after_context_created hook to fetch the logging.yml and eventually change the default if nothing is provided for kedro-mlflow. This seems a but overkill, but easy to implement and consistent with the rest of kedro.

This feels like overkill indeed, but maybe I'm missing something. What is wrong with setting logger.getLogger("kedro-mlflow").setLevel(logging.INFO) within kedro-mlflow? This doesn't need to use a hook or look in logging.yml at all.

@antonymilne antonymilne reopened this Aug 24, 2022
@Galileo-Galilei
Copy link
Member

AFAIK there's no way to do some wildcard match like kedro-*

I don't know either, but since they are loaded through an entrypoint, there is likely a way to get the names of plugins dynamically (e.g. "any package with given entrypoints"). It does not seems trivial, and likely not a high priority.

Alternatively we could encourage plugins to log to kedro.plugins.plugin_name, so it would be kedro.plugins.mlflow for example or even just kedro.mlflow rather than kedro-mlflow

I really like the idea, but does it require having kedro_plugins installed or logging will just inherit from kedro logger, no matter if it is not an actual package?

What is wrong with setting logger.getLogger("kedro-mlflow").setLevel(logging.INFO) within kedro-mlflow?

Not sure I am not very familiar with logging but:

  • what will happen if the user set a non default level in its logging.yml? Will it take priority or not (this is desirable) ?
# logging.yml
loggers:
  kedro_mlflow:
    level: WARNING # <- anything other than INFO?
  • where do you think I should set this logging level definition ? In the package root __init__.py so it is triggered at import time?

@antonymilne
Copy link
Contributor

Hi @Galileo-Galilei, I finally got a chance to try out some code here, did a pip install kedro-mlflow and played around with my logging.yml file. I was going to test out my proposed solution and was very, very confused by the behaviour until I checked your repo and saw that you already implemented this 😀

I think your solution setting the logging level in the package root __init__.py is absolutely the right place to put it. I've tested and this does still allow me to customise the kedro_mlflow logging level in my project logging.yml if I want to, which I think is correct.

Are you happy with this solution? It feels correct to me 🙂

@Galileo-Galilei
Copy link
Member

Galileo-Galilei commented Sep 9, 2022

Thank you very much for taking time to test it!
I am completely fine with the solution, and I think the current logging behaviour is indeed the correct one. We can likely close the issue.

@antonymilne
Copy link
Contributor

Perfect, thank you for confirming! I think we're all good here then and I will close the issue.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants