🔍 review logging for production use #34

jadudm · 2024-11-23T13:05:53Z

At a glance

In order to understand the app in production
as a system owner
I want to have good logging

Acceptance Criteria

We use DRY behavior-driven development wherever possible.

then...


[a thing happens]

Shepherd

UX shepherd:
Design shepherd:
Engineering shepherd:

Background

The app already has extensive logging at the debug level, and some at the info level. However, when running in production, with many workers, even the info level can get noisy.

Review the code for critical-path info notices. This means looking for info-level statements that show us what is going on without flooding the logs. In fetch, I have a periodic log statement that says pages fetched: ###. Something like this might be worth adding to all of the services, so that once a minute (or once every 10, or...) we get some kind of ping from each service, without logging every URL that goes through fetch (for example).
Check warn notices. There are not many. Make sure they are where they should be, and that they tell us the right thing.
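Check error notices. Same as for warn. These include stack traces. I think I don't need to include zap.String("err", err.Error()) in these, because I think zap.L().Error(...) already attaches a stack trace. (Worth confirming before removing anything: whether a stack trace is attached depends on how the logger is configured, and a stack trace records where the log call was made, while the err field carries the error message itself.)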
Check fatal notices. These should be rare, because they restart the service. If that is what we want, that's OK, but they should be the exception. The benefit is that a fatal puts the job back on the queue; the risk is that it puts the service in a boot loop. We should log at the error level first, so that the logs leave a trace to follow.
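
The periodic "pages fetched" ping described above could look roughly like the following. This is a minimal sketch, not the code in fetch: the `Heartbeat` type, its methods, and the `report` callback are hypothetical names, and the sketch uses only the standard library. In the app, the callback would presumably wrap an info-level zap call.

```go
package main

import (
	"fmt"
	"sync/atomic"
	"time"
)

// Heartbeat counts events and reports the running total, so a service can
// emit one summary line per interval instead of one line per event.
// (Hypothetical type, for illustration only.)
type Heartbeat struct {
	count  atomic.Int64
	report func(total int64)
}

// NewHeartbeat returns a Heartbeat that hands the current total to report.
// In the app, report would presumably call the info-level logger.
func NewHeartbeat(report func(total int64)) *Heartbeat {
	return &Heartbeat{report: report}
}

// Inc records one event, for example one fetched page.
func (h *Heartbeat) Inc() { h.count.Add(1) }

// Flush reports the current total once.
func (h *Heartbeat) Flush() { h.report(h.count.Load()) }

// Run calls Flush every interval until stop is closed.
func (h *Heartbeat) Run(interval time.Duration, stop <-chan struct{}) {
	ticker := time.NewTicker(interval)
	defer ticker.Stop()
	for {
		select {
		case <-ticker.C:
			h.Flush()
		case <-stop:
			return
		}
	}
}

func main() {
	hb := NewHeartbeat(func(total int64) {
		fmt.Printf("pages fetched: %d\n", total)
	})
	stop := make(chan struct{})
	go hb.Run(time.Minute, stop) // once a minute; the interval is a tunable

	for i := 0; i < 3; i++ {
		hb.Inc() // called per page instead of logging each URL
	}
	hb.Flush() // prints "pages fetched: 3"
	close(stop)
}
```

Keeping the counter separate from the logger means the same type could be dropped into each service with a different message, and the per-URL statements can stay at the debug level.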

We can always re-deploy at a lower logging level if we have to (e.g. downgrade from warn to info), but that is not where we want to be.
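
One way to make that kind of re-deploy cheap is to read the threshold from the environment at startup. The sketch below is an assumption-laden illustration: the `LOG_LEVEL` variable name, the `Level` type, and the `ParseLevel` and `Enabled` helpers are all hypothetical, and it avoids zap entirely so that it stands alone. zap has its own level type that the real code would use instead.

```go
package main

import (
	"fmt"
	"os"
	"strings"
)

// Level is a log severity; higher values are more severe.
type Level int

const (
	Debug Level = iota
	Info
	Warn
	Error
)

// ParseLevel maps a level name to a Level. Empty or unrecognized input
// falls back to Warn, the quieter production default assumed here.
func ParseLevel(name string) Level {
	switch strings.ToLower(strings.TrimSpace(name)) {
	case "debug":
		return Debug
	case "info":
		return Info
	case "error":
		return Error
	default:
		return Warn
	}
}

// Enabled reports whether a message at severity msg should be written
// when the configured threshold is threshold.
func Enabled(threshold, msg Level) bool { return msg >= threshold }

func main() {
	// LOG_LEVEL is a hypothetical variable name for this sketch.
	threshold := ParseLevel(os.Getenv("LOG_LEVEL"))
	fmt.Println("info enabled:", Enabled(threshold, Info))
	fmt.Println("warn enabled:", Enabled(threshold, Warn))
}
```

With this shape, moving from warn to info is a configuration change plus a restart, with no code change.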

Security Considerations

Required per CM-4.

We already have a lot of logging, and that is good. We cover all the critical paths. The question is whether we can reduce the amount of logging; otherwise we'll end up generating millions of lines per day.
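
To put rough numbers on "millions of lines per day", here is a back-of-the-envelope sketch. The worker count and per-minute rates are made-up inputs chosen only to show the arithmetic, not measurements from the app, and `linesPerDay` is a hypothetical helper.

```go
package main

import "fmt"

const minutesPerDay = 24 * 60

// linesPerDay estimates daily log volume for a number of workers that
// each write linesPerMinute lines.
func linesPerDay(workers, linesPerMinute int) int {
	return workers * linesPerMinute * minutesPerDay
}

func main() {
	// Hypothetical load: 8 workers, each logging 600 URLs per minute.
	fmt.Println(linesPerDay(8, 600)) // 6912000

	// Same 8 workers, each writing one periodic summary line per minute.
	fmt.Println(linesPerDay(8, 1)) // 11520
}
```

Under those assumed inputs, per-URL logging at the info level lands in the millions of lines per day, while a once-a-minute summary stays in the low tens of thousands.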

Process checklist

If there's UI...

Screen reader - Listen to the experience with a screen reader extension, ensure the information is presented in order.
Keyboard navigation - Run through acceptance criteria with keyboard tabs, ensure it works.
Text scaling - Adjust viewport to 1280 pixels wide and zoom to 200%, ensure everything renders as expected. Document 400% zoom issues with USWDS if appropriate.


jadudm changed the title ~~review logging for production use~~ 🔍 review logging for production use Nov 23, 2024

jadudm added this to jemison Nov 23, 2024

github-project-automation bot moved this to triage in jemison Nov 23, 2024

jadudm moved this from triage to backlog in jemison Nov 23, 2024

jadudm added this to the deploy to prototyping org in cloud.gov milestone Nov 23, 2024
