Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make rich markup compatible with file-based logs #2207

Closed
Tracked by #2205
antonymilne opened this issue Jan 16, 2023 · 9 comments
Closed
Tracked by #2205

Make rich markup compatible with file-based logs #2207

antonymilne opened this issue Jan 16, 2023 · 9 comments
Labels
Stage: Technical Design 🎨 Ticket needs to undergo technical design before implementation

Comments

@antonymilne
Copy link
Contributor

antonymilne commented Jan 16, 2023

#1663 added to develop some rich markup tags like [dark_orange] to some key kedro log messages. This works great in all terminals (rich strips out the tags for dumb terminals that can't render the styles) but unfortunately not with kedro's file logging which will save logs like this:

Loading data from [dark_orange]y_pred[/dark_orange] (MemoryDataSet)...

There's also a much less important inconsistency is that the format of messages logged to files are no longer consistent with the console output. e.g. in console you get

[02/06/23 12:04:38] INFO     Kedro project iris                     session.py:340

vs. in the log file:

2023-02-02 15:53:57,097 - kedro.framework.session.session - INFO - Kedro project iris

There's various options here:

  1. Just leave it as it is now because it's not such a big deal
  2. Remove file-based logging from the default template (see Drastically simplify the default project template #2149 (comment) for more discussion on this) - but doesn't actually fix the issue: if someone wants to add in a file logging handler, what do they do? Just use tee rather than Python logging?
  3. Revert Further richify logs #1663 (but colours were carefully selected before by @AhdraMeraliQB be accessible to various forms of colour blindness, unlike the default behaviour of ')
  4. Write our own custom RotatingFileHandler that strips out rich markup before emitting logs (see below for possible ways to do this)

Note. We will already have a new logging handler in kedro from #2277, so writing another one here is not so outlandish. But it will be trickier and less important than in #2277, so I'm not so sure it's worth it here.

@antonymilne antonymilne changed the title Make rich tags compatible with file-based logs Make rich markup compatible with file-based logs Jan 16, 2023
@antonymilne antonymilne added the Stage: Technical Design 🎨 Ticket needs to undergo technical design before implementation label Jan 16, 2023
@antonymilne
Copy link
Contributor Author

Custom RotatingFileHandler

There's a few ways to do this depending on what we want to prioritise. We need to think about exactly what input arguments) such a class would take because they expect different things 😬 Probably option 2 is simplest and easiest but not very elegant.

1. Inherit from RichHandler

This supplies rich's Console method with the appropriate file supplied by the RotatingFileHandler. It will resolve the small inconsistency but I'm not sure how robust it is.

class RotatingFileHandler(RichHandler):
    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        # How to take in rotating file handler arguments here?
        self.f = logging.RotatingFileHandler(filename="a.txt")

    def emit(self, record: logging.LogRecord) -> None:
        self.console = Console(file=self.f.stream)
        super().emit(record)

2. Inherit from RotatingFileHandler

This strips out the markup tags from the log message record. We could do some sort of a regex but probably it's easier/safer to just match exactly which tags we're using. This means that we need to keep MARKUP_TAGS updated if we add any new markup though. This method will leave the small inconsistency.

class RotatingFileHandler(logging.RotatingFileHandler):
    MARKUP_TAGS = {"dark_orange", ...}

    def emit(self, record: logging.LogRecord) -> None:
        for tag in self.MARKUP_TAGS:
            record.msg = record.msg.replace(f"[{tag}]", "")
            record.msg = record.msg.replace(f"[/{tag}]", "")
        # Alternative could put record.msg through a Console output 
        # to strip markup (is there a simpler way to do this? Ask on rich repo)
        # Should save the console instance in init rather than recreating every time
        # as emitting log message should be fast.
        # console = Console(file=StringIO())
        # console.print(record.msg)
        # record.msg = console.file.getvalue()
        super().emit(record)

@antonymilne
Copy link
Contributor Author

antonymilne commented Feb 22, 2023

Following discussion in tech design on 22 February, we decided that it would be fine to go for option 1 (just leave it as it is and see if anyone complains about it). But given the discussion there, I think there's actually a better solution now:

Revert #1663 and raise an issue/PR on rich to make colours that are used by rich for e.g. ' and callables more accessible. Look at #1514 for a reminder of how the current styling works (e.g. repr.str is an important one for us).

I think it's quite likely that such a change in rich itself wouldn't be feasible because we're only really interested in changing a few colours that we use in logging, so would we want to change just those few like repr.str that we personally care about or everywhere in the rich styling from green to dark_orange? The former would probably be a bit inconsistent, while the latter might be a more drastic change than rich wants to make.

So, assuming we can't change this on rich itself what we should do is just change the colours in our custom handler. This will be possible after #2277. We'd need some code like this:

from rich.theme import Theme
from rich import get_console
console = get_console()(theme=Theme({"repr.str": "dark_orange"}))

and then pass console into the RichHandler.

This way we get the benefit of the accessibility but also file logging won't have weird tags in it. The only downside is that anything in ' will look the same (so we can't style dataset names vs. node names differently for example) but overall I think this is a good compromise.

@antonymilne
Copy link
Contributor Author

@AhdraMeraliQB what do you think of the above? 🙂

@AhdraMeraliQB
Copy link
Contributor

The accessibility choices in #1663 primarily considered being able to differentiating Kedro's core elements/types from other quoted strings in log messages - specifically datasets and paths (nodes were handled by removing the quote marks entirely). This means the nodes, datasets, paths, numbers and other quoted strings should all be easy to tell apart at a glance. The default rich colour choices are still generally discernable from each other (string vs number etc). I do not see much added value in specifying dark orange over green for all strings.

(This is also why raising a PR on Rich would likely be unhelpful - the elements where we consider accessibility are Kedro-specific)

If we are to revert #1663, I don't think the second step would be necessary.

@antonymilne
Copy link
Contributor Author

This makes sense, thank you for the explanation @AhdraMeraliQB. In this case I propose that we leave it as it is now and see if anyone complains about it. I think it will be increasingly an edge case anyway following outcomes of #2149 which is likely to diminish importance of logging to files. And then if people do get upset about it we can revisit.

@Zangetsu-GPO
Copy link

Common guys, please remove those tags from the console logger... not a big deal ok but still quite annoying. Especially for customers. I'm working on migrating our models to kedro and I feel a bit sad to tell Data Scientists that there is bug in logs.. for a product we try to promote...

@noklam noklam reopened this Jul 2, 2024
@github-project-automation github-project-automation bot moved this from Done to In Progress in Kedro Framework Jul 2, 2024
@noklam
Copy link
Contributor

noklam commented Jul 2, 2024

@Zangetsu-GPO Thanks for the feedback, actually this is in the pipeline already. #3682, this will be fixed soon.

@datajoely
Copy link
Contributor

@Zangetsu-GPO do any of these environment variables work for you today?
https://rich.readthedocs.io/en/stable/console.html#environment-variables

@astrojuanlu
Copy link
Member

#3682 was merged, closing.

@github-project-automation github-project-automation bot moved this from In Progress to Done in Kedro Framework Jul 21, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Stage: Technical Design 🎨 Ticket needs to undergo technical design before implementation
Projects
Archived in project
Development

No branches or pull requests

6 participants