The app already has extensive logging at the debug level, and some at the info level. However, when running in production, with many workers, even the info level can get noisy.
Review the code for critical-path info notices. This means finding the info-level statements that help us see what is going on without flooding the zone. In fetch, I have a periodic log statement that says "pages fetched: ###". Something like this might be worth adding to all of the services, so that once a minute (or once every 10, or...) we get some kind of ping from each service without logging every URL that goes through fetch (for example).
Check warn notices. There are not many. Make sure they're where they should be, and they are telling us the right thing.
Check error notices. Same as for warn. These include stack traces. I don't think I need to include zap.String("err", err.Error()) in these, because I believe zap.L().Error(...) already attaches a stack trace.
Check fatal notices. These should be rare, because they will reboot the service. If that is what we want, that's OK, but they should be the exception. The benefit is that a fatal puts the job back on the queue; the risk is that it puts the service in a boot loop. We should log at an error level first, so that the logs leave some trace to follow.
We can always re-deploy at a lower logging level if we have to (e.g. downgrade from warn to info), but that is not where we want to be.
Process checklist
Has a clear story statement
Can reasonably be done in a few days (otherwise, split this up!)
Screen reader - Listen to the experience with a screen reader extension, ensure the information is presented in order.
Keyboard navigation - Run through acceptance criteria with keyboard tabs, ensure it works.
Text scaling - Adjust viewport to 1280 pixels wide and zoom to 200%, ensure everything renders as expected. Document 400% zoom issues with USWDS if appropriate.
jadudm changed the title from "review logging for production use" to "🔍 review logging for production use" on Nov 23, 2024
At a glance
In order to understand the app in production
as a system owner
I want to have good logging
Acceptance Criteria
We use DRY behavior-driven development wherever possible.
then...
Shepherd
Security Considerations
Required per CM-4.
We already have a lot of logging, and that is good. We cover all the critical paths. The question is whether or not we can reduce the amount of logging, or we'll end up generating millions of lines per day.
Process checklist
If there's UI...