
Less noisy console output #20154

Closed
davidgilbertson opened this issue Nov 10, 2022 · 20 comments
@davidgilbertson

Feature request

I have just started using HF transformers and am struck by the amount of text it dumps into the console.

Just following the steps in the course, a simple script that loads a model and trains it spews out this:
[screenshot: console flooded with output from a simple load-and-train script]

...my monitor isn't big enough to screenshot it all :)

Note that I'm using PyCharm, and specifically the built in "Python Console".

Motivation

This is a problem because now a user has two options:

  • every single time they run their code, scan the wall of text to see if there is any new information that they haven't seen 300 times before
  • not re-read the wall of text every time, and potentially miss out on useful information that they need to attend to.

Also, it's just not very pretty, and pretty is nice.

Your contribution

Only suggestions/questions:

  • Do any HF developers use PyCharm's Python Console? It might be worth testing there; it's flawed, but quite popular.
  • You can check whether the environment is a TTY with sys.stdout.isatty(). tqdm simply doesn't work well when not in a terminal (beyond a simple indicator, as long as you don't print anything while the indicator is active). So a good solution is to simply print the results at the end for these environments.
  • I don't think writing cache files is something to notify the user about. Perhaps it's worth thinking in terms of 'user personas': to the first-time user, this is useful information. For every other run, once the user knows that HF writes checkpoints in a certain place, it's no longer information that needs to be logged, so it goes from helpful to detrimental, since it makes important info harder to spot. Maybe the problem is just that HF is setting the log level to "INFO" when the default Python level is "WARNING"; if so, picking up the correct log level from the user's environment would make most of the junk disappear.
  • I don't know why it's all red, this also adds to the difficulty in seeing real errors (and also adds to the ugliness).
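The TTY suggestion above can be sketched with the standard library alone (a minimal illustration of the idea, not transformers' actual progress handling; the function name is made up):

```python
import sys

def progress_style() -> str:
    """Choose how to report progress based on the output stream.

    A real terminal can handle tqdm-style live bars; anything else
    (e.g. PyCharm's Python Console or a redirected log file) is better
    served by a single summary printed at the end of the run.
    """
    return "live" if sys.stdout.isatty() else "summary"
```

A library could call this once at startup and fall back to end-of-run summaries whenever it is not attached to a real terminal.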

I hope this is useful and not just me complaining...

@sgugger (Collaborator) commented Nov 10, 2022

You can adjust the logging level to your preferred value with transformers.utils.logging.set_verbosity(log_level).

Also cc @LysandreJik

@davidgilbertson (Author) commented

Thanks @sgugger, I just tried that.

There's still plenty of noise from code in HF (datasets) like this:

logger.warning(f"Loading cached processed dataset at {cache_file_name}")

Also, Trainer will call args.get_process_log_level() and overwrite whatever I've set with logging.set_verbosity(). I think there might be something wrong in get_process_log_level (after a quick glance). The docstring says that the default log level is passive and that this won't change anything, but this line in the code explicitly sets the log level to INFO. Should that not default to logger.getEffectiveLevel()?
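For what it's worth, stdlib logging illustrates the distinction being made here: a logger with no explicit level inherits the root default of WARNING, and only an explicit setLevel call (like the INFO default being questioned) overrides that. A minimal demonstration, using a made-up logger name:

```python
import logging

# A library-style logger with no level of its own falls back to the
# root logger's default, which is WARNING (30).
lib_logger = logging.getLogger("example.trainer")
assert lib_logger.getEffectiveLevel() == logging.WARNING

# Explicitly forcing INFO (20) overrides that inherited default, which
# appears to be what get_process_log_level is doing.
lib_logger.setLevel(logging.INFO)
assert lib_logger.getEffectiveLevel() == logging.INFO
```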

@sgugger (Collaborator) commented Nov 10, 2022

You'll need to do the same thing for datasets (same API, I believe 🤞). As for the Trainer, it's very possible that there is a bug. Do you want to suggest a fix in a PR?

@davidgilbertson (Author) commented

I've got a full schedule with study at the moment, sorry.

So in summary this looks like 2.5 issues:

  • messages like "loaded from cache" should be log level INFO, not WARNING
  • the default log level should read from the user's environment, or at least use the same default as Python, which is 30/WARNING, not 20/INFO
  • and, as a nice-to-have, all HF packages could share a single log-level setting, though with the right defaults this is not a big deal.
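The third point can be approximated today with stdlib logging, relying on the convention that each package logs under its own top-level logger name (a sketch; the exact logger names are an assumption, and the libraries' own verbosity APIs may layer extra behaviour on top):

```python
import logging

# One shared setting for all HF packages: set the level on each
# package's top-level logger, which its child loggers inherit.
SHARED_LEVEL = logging.WARNING  # 30, Python's own default

for package in ("transformers", "datasets"):
    logging.getLogger(package).setLevel(SHARED_LEVEL)
```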

@sgugger (Collaborator) commented Nov 14, 2022

I disagree with the first two, and even if I agreed, it's too late to change them without surprising the whole user community. The only issue I see is the bug in Trainer you reported :-)

@davidgilbertson (Author) commented

Ah, interesting, I thought the first one was quite clear-cut. From the Python logging how-to:

  • INFO: Confirmation that things are working as expected.
  • WARNING: An indication that something unexpected happened, or indicative of some problem in the near future (e.g. ‘disk space low’). The software is still working as expected.

But I do agree that too much change for minor things is not great, so fair enough if you want to stick with things the way they are. But for me, this makes Huggingface harder to work with than it needs to be, swamping out my own logging. I'm quite surprised it's intentional!

@sgugger (Collaborator) commented Nov 15, 2022

Ah sorry, I misread; I do agree with you on the first point, and that might be something we can change, as a warning is indeed too strong for this message.

@sgugger (Collaborator) commented Nov 15, 2022

Just dug into the codebase and didn't find any obvious logger.warning that tells the user about something loaded from cache. Could you tell me which one you saw (or was it from the Datasets library)?

@davidgilbertson (Author) commented

Oh good! :) Yes I just checked and this is actually coming from the datasets package.

@sgugger (Collaborator) commented Nov 15, 2022

Ok, so you should open an issue there. Agreed that those should be INFO (which is why you will see most of our examples set the log level of datasets to ERROR, to avoid getting those warnings).

@davidgilbertson (Author) commented

@sgugger something else I've just noticed is that sometimes transformers will set the log level to info. I can't pin down exactly when, but I see that there's lots of code that calls logging.set_verbosity_info() at the top level of the module.

Is that intentional? I don't understand the logic of a module globally changing the log level to INFO.

@sgugger (Collaborator) commented Jan 20, 2023

Only scripts do this (mostly conversion scripts of models from their original repos to Transformers), not the module itself.

@davidgilbertson (Author) commented

Hmm, are these scripts ever called from the application code? There's definitely something that sets the log level to INFO, and I think it's related to loading a model for the first time (which is why it's hard to replicate).

@davidgilbertson (Author) commented Jan 23, 2023

Here's an example. I have this code that references a model not in my cache:

print(f"Verbosity: {transformers.logging.get_verbosity()}")
conf = transformers.AutoModel.from_pretrained("distilgpt2")
print(f"Verbosity: {transformers.logging.get_verbosity()}")

Interestingly, it ran and had the same log levels immediately before and after, but a second later, when I queried the log level in the console, it had changed.
[screenshot: the same verbosity before and after the call, but a different value when queried in the console shortly afterwards]

I had a breakpoint on transformers.logging.set_verbosity that wasn't triggered, not sure why.

So do you have some post-download steps running a script that changes the log level?

Perhaps a good idea would be to move all those logging.set_verbosity_info() calls inside the if __name__ == "__main__": guards.
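The suggested guard would look something like this (an illustrative conversion script, not actual transformers code; the function and checkpoint names are made up):

```python
import logging

logger = logging.getLogger(__name__)

def convert_checkpoint(path: str) -> None:
    """Pretend conversion step; reports progress at INFO level."""
    logger.info("Converting %s", path)

if __name__ == "__main__":
    # Verbosity is raised only when the script is executed directly;
    # merely importing this module leaves the caller's log level alone.
    logging.basicConfig(level=logging.INFO)
    convert_checkpoint("original_checkpoint.bin")
```

Importing such a module from application code would then no longer have the side effect of raising the log level.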

@davidgilbertson (Author) commented

Actually I'm going to re-open this, since there's still the bug of TrainingArguments defaulting to log level INFO so that just the act of creating a Trainer changes the log level.

Please do let me know if I'm wasting my time reporting these issues, maybe there's bigger fish to fry and no interest in fiddling with logging.

@sgugger (Collaborator) commented Jan 23, 2023

I'm not sure I understand the bug here. Creating the Trainer with TrainingArguments at log level INFO will change the log level, yes. If you want another log level you should select it.

@davidgilbertson (Author) commented

No, if I have log level set to WARNING (the default) and create a Trainer, this changes the log level to INFO.

This code:

print(f"Verbosity: {transformers.logging.get_verbosity()}")
trainer = transformers.Trainer(
    model=model,
    args=TrainingArguments(
        output_dir=dg.get_root_dir("logs/hf"),
        evaluation_strategy="epoch",
        report_to=None,
        fp16=True,
    ),
    train_dataset=dataset["train"],
    eval_dataset=dataset["validation"],
    data_collator=DataCollatorWithPadding(tokenizer=tokenizer),
)
print(f"Verbosity: {transformers.logging.get_verbosity()}")

Results in this:
[screenshot: verbosity reported as WARNING before the Trainer is created and INFO after]

@sgugger (Collaborator) commented Jan 24, 2023

Yes, because you have left the logging value of the TrainingArguments at its default value of INFO.

Just so I understand better: you would like the TrainingArguments to default to None and only change the logging level if it is explicitly set to some value? I can get behind that if you want to make a PR.

@davidgilbertson (Author) commented

TrainingArguments defaults to passive, doesn't it? See here

@github-actions commented

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

Please note that issues that do not follow the contributing guidelines are likely to be ignored.
