
Less noisy console output #20154

Closed
davidgilbertson opened this issue Nov 10, 2022 · 20 comments
@davidgilbertson

Feature request

I have just started using HF transformers and am struck by the amount of text it dumps into the console.

Just following the steps in the course, a simple script that loads a model and trains it spews out this:
[screenshot: console flooded with output from a simple load-and-train script]

...my monitor isn't big enough to screenshot it all :)

Note that I'm using PyCharm, and specifically the built in "Python Console".

Motivation

This is a problem because now a user has two options:

  • every single time they run their code, scan the wall of text to see if there is any new information that they haven't seen 300 times before
  • not re-read the wall of text every time, and potentially miss out on useful information that they need to attend to.

Also, it's just not very pretty, and pretty is nice.

Your contribution

Only suggestions/questions:

  • Do any HF developers use PyCharm's Python Console? It might be worth testing there; it's flawed, but quite popular.
  • You can check whether the environment is a TTY with sys.stdout.isatty(). tqdm simply doesn't work well when not in a terminal (beyond a simple indicator, as long as you don't print anything while the indicator is active). So a good solution is to simply print the results at the end for these environments.
  • I don't think writing cache files is something to notify the user about. Perhaps it's worth thinking in terms of 'user personas': to the first-time user, this is useful information. For every other run, once the user knows that HF writes checkpoints in a certain place, it's no longer information that needs to be logged, so it goes from helpful to detrimental, since it makes important info harder to spot. Maybe the problem is just that HF is setting the log level to "INFO" when the default Python level is "WARNING"; if so, picking up the correct log level from the user's environment would make most of the junk disappear.
  • I don't know why it's all red, this also adds to the difficulty in seeing real errors (and also adds to the ugliness).
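The TTY suggestion above can be sketched with the standard library alone (a minimal illustration of the idea, not transformers' actual progress handling; the function name is made up):

```python
import sys

def progress_style() -> str:
    """Choose how to report progress based on the output stream.

    A real terminal can handle tqdm-style live bars; anything else
    (e.g. PyCharm's Python Console or a redirected log file) is better
    served by a single summary printed at the end of the run.
    """
    return "live" if sys.stdout.isatty() else "summary"
```

A library could call this once at startup and fall back to end-of-run summaries whenever it is not attached to a real terminal.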

I hope this is useful and not just me complaining...

@sgugger (Collaborator) commented Nov 10, 2022

You can adjust the logging level to your preferred value with transformers.utils.logging.set_verbosity(log_level).

Also cc @LysandreJik

@davidgilbertson (Author) commented

Thanks @sgugger, I just tried that.

There's still plenty of noise from code in HF (datasets) like this:

logger.warning(f"Loading cached processed dataset at {cache_file_name}")

Also, Trainer will call args.get_process_log_level() and overwrite whatever I've set with logging.set_verbosity(). I think there might be something wrong in get_process_log_level (after a quick glance). The docstring says that the default log level is passive and that this won't change anything, but this line in the code explicitly sets the log level to INFO. Should that not default to logger.getEffectiveLevel()?
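For what it's worth, stdlib logging illustrates the distinction being made here: a logger with no explicit level inherits the root default of WARNING, and only an explicit setLevel call (like the INFO default being questioned) overrides that. A minimal demonstration, using a made-up logger name:

```python
import logging

# A library-style logger with no level of its own falls back to the
# root logger's default, which is WARNING (30).
lib_logger = logging.getLogger("example.trainer")
assert lib_logger.getEffectiveLevel() == logging.WARNING

# Explicitly forcing INFO (20) overrides that inherited default, which
# appears to be what get_process_log_level is doing.
lib_logger.setLevel(logging.INFO)
assert lib_logger.getEffectiveLevel() == logging.INFO
```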

@sgugger (Collaborator) commented Nov 10, 2022

You'll need to do the same thing for datasets (same API, I believe 🤞). As for the Trainer, it's very possible that there is a bug. Do you want to suggest a fix in a PR?

@davidgilbertson (Author) commented

I've got a full schedule with study at the moment, sorry.

So in summary this looks like 2.5 issues:

  • messages like "loaded from cache" should be log level INFO, not WARNING
  • the default log level should read from the user's environment, or at least use the same default as Python, which is 30/WARNING, not 20/INFO
  • and, as a nice-to-have, all HF packages could share a single log-level setting, though with the right defaults this is not a big deal.
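The third point can be approximated today with stdlib logging, relying on the convention that each package logs under its own top-level logger name (a sketch; the exact logger names are an assumption, and the libraries' own verbosity APIs may layer extra behaviour on top):

```python
import logging

# One shared setting for all HF packages: set the level on each
# package's top-level logger, which its child loggers inherit.
SHARED_LEVEL = logging.WARNING  # 30, Python's own default

for package in ("transformers", "datasets"):
    logging.getLogger(package).setLevel(SHARED_LEVEL)
```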

@sgugger (Collaborator) commented Nov 14, 2022

I disagree with the first two, and even if I agreed, it's too late to change them without surprising the whole user community. The only issue I see is the bug in Trainer you reported :-)

@davidgilbertson (Author) commented

Ah, interesting, I thought the first one was quite clear-cut. From the Python logging how-to:

  • INFO: Confirmation that things are working as expected.
  • WARNING: An indication that something unexpected happened, or indicative of some problem in the near future (e.g. ‘disk space low’). The software is still working as expected.

But I do agree that too much change for minor things is not great, so fair enough if you want to stick with things the way they are. But for me, this makes Huggingface harder to work with than it needs to be, swamping out my own logging. I'm quite surprised it's intentional!

@sgugger (Collaborator) commented Nov 15, 2022

Ah sorry, I misread; I do agree with you on the first point, and that might be something we can change, as a warning is indeed too strong for this message.

@sgugger (Collaborator) commented Nov 15, 2022

Just dug into the codebase and didn't find any obvious logger.warning that tells the user about something loaded from cache. Could you tell me which one you saw (or was it from the Datasets library)?

@davidgilbertson (Author) commented

Oh good! :) Yes I just checked and this is actually coming from the datasets package.

@sgugger (Collaborator) commented Nov 15, 2022

Ok, so you should open an issue there. Agreed that those should be INFO (which is why you will see most of our examples set the log level of datasets to ERROR, to avoid getting those warnings).

@davidgilbertson (Author) commented

@sgugger something else I've just noticed is that sometimes transformers will set the log level to info. I can't pin down exactly when, but I see that there's lots of code that calls logging.set_verbosity_info() at the top level of the module.

Is that intentional? I don't understand the logic of a module globally changing the log level to INFO.

@sgugger (Collaborator) commented Jan 20, 2023

Only scripts do this (mostly conversion scripts of models from their original repos to Transformers), not the module itself.

@davidgilbertson (Author) commented

Hmm, are these scripts ever called from the application code? There's definitely something that sets the log level to INFO, and I think it's related to loading a model for the first time (which is why it's hard to replicate).

@davidgilbertson (Author) commented Jan 23, 2023

Here's an example. I have this code that references a model not in my cache:

print(f"Verbosity: {transformers.logging.get_verbosity()}")
conf = transformers.AutoModel.from_pretrained("distilgpt2")
print(f"Verbosity: {transformers.logging.get_verbosity()}")

Interestingly, it ran and had the same log levels immediately before and after, but a second later, when I queried the log level in the console, it had changed.
[screenshot: the same verbosity before and after the call, but a different value when queried in the console shortly afterwards]

I had a breakpoint on transformers.logging.set_verbosity that wasn't triggered, not sure why.

So do you have some post-download steps running a script that changes the log level?

Perhaps a good idea would be to move all those logging.set_verbosity_info() calls inside the if __name__ == "__main__": guards.
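The suggested guard would look something like this (an illustrative conversion script, not actual transformers code; the function and checkpoint names are made up):

```python
import logging

logger = logging.getLogger(__name__)

def convert_checkpoint(path: str) -> None:
    """Pretend conversion step; reports progress at INFO level."""
    logger.info("Converting %s", path)

if __name__ == "__main__":
    # Verbosity is raised only when the script is executed directly;
    # merely importing this module leaves the caller's log level alone.
    logging.basicConfig(level=logging.INFO)
    convert_checkpoint("original_checkpoint.bin")
```

Importing such a module from application code would then no longer have the side effect of raising the log level.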

@davidgilbertson (Author) commented

Actually I'm going to re-open this, since there's still the bug of TrainingArguments defaulting to log level INFO so that just the act of creating a Trainer changes the log level.

Please do let me know if I'm wasting my time reporting these issues, maybe there's bigger fish to fry and no interest in fiddling with logging.

@sgugger (Collaborator) commented Jan 23, 2023

I'm not sure I understand the bug here. Creating the Trainer with TrainingArguments at log level INFO will change the log level, yes. If you want another log level you should select it.

@davidgilbertson (Author) commented

No, if I have log level set to WARNING (the default) and create a Trainer, this changes the log level to INFO.

This code:

print(f"Verbosity: {transformers.logging.get_verbosity()}")
trainer = transformers.Trainer(
    model=model,
    args=TrainingArguments(
        output_dir=dg.get_root_dir("logs/hf"),
        evaluation_strategy="epoch",
        report_to=None,
        fp16=True,
    ),
    train_dataset=dataset["train"],
    eval_dataset=dataset["validation"],
    data_collator=DataCollatorWithPadding(tokenizer=tokenizer),
)
print(f"Verbosity: {transformers.logging.get_verbosity()}")

Results in this:
[screenshot: verbosity reported as WARNING before the Trainer is created and INFO after]

@sgugger (Collaborator) commented Jan 24, 2023

Yes, because you have left the logging value of the TrainingArguments at its default value of INFO.

Just so I understand better: you would like the TrainingArguments to default to None and only change the logging level if it is explicitly set to some value? I can get behind that if you want to make a PR.

@davidgilbertson (Author) commented

TrainingArguments defaults to passive, doesn't it? See here

@github-actions commented

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

Please note that issues that do not follow the contributing guidelines are likely to be ignored.
