🔍 review logging for production use #34

Open
1 task
jadudm opened this issue Nov 23, 2024 · 0 comments

jadudm commented Nov 23, 2024

At a glance

In order to understand the app in production
as a system owner
I want to have good logging

Acceptance Criteria

We use DRY behavior-driven development wherever possible.

then...

Shepherd

  • UX shepherd:
  • Design shepherd:
  • Engineering shepherd:

Background

The app already has extensive logging at the debug level, and some at the info level. However, when running in production, with many workers, even the info level can get noisy.

  1. Review the code for critical-path info notices. This means looking for info-level statements that help us see what is going on without flooding the zone. In fetch, I have a periodic log statement that says pages fetched: ###... Something like this might be worth adding to all of the services, so that once a minute (or once every 10, or...) we get some kind of ping from each service, but we aren't logging every URL that goes through fetch (for example).
  2. Check warn notices. There are not many. Make sure they are where they should be, and that they are telling us the right thing.
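  3. Check error notices. Same as for warn: make sure they are where they should be, and that they are telling us the right thing. These include stack traces. I think I don't need to include zap.String("err", err.Error()) in these, because I think zap.L().Error(...) already attaches a stack trace. This is worth confirming before removing anything, since a stack trace shows where the log call was made, not the message carried by err.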
  4. Check fatal notices. These should be rare, because they reboot the service. If that is what we want, that's OK, but it should be the exception. The benefit is that the job goes back on the queue; the risk is that the service ends up in a boot loop. We should log at the error level first, so that the logs leave some trace to follow.

We can always re-deploy at a lower logging level if we have to (e.g. downgrade from warn to info), but that is not where we want to be.

Security Considerations

Required per CM-4.

We already have a lot of logging, and that is good. We cover all the critical paths. The question is whether we can reduce the amount of logging; if we can't, we'll end up generating millions of lines per day.


Process checklist
  • Has a clear story statement
  • Can reasonably be done in a few days (otherwise, split this up!)
  • Shepherds have been identified
  • UX youexes all the things
  • Design designs all the things
  • Engineering engineers all the things
  • Meets acceptance criteria
  • Meets QASP conditions
  • Presented in a review
  • Includes screenshots or references to artifacts
  • Tagged with the sprint where it was finished
  • Archived

If there's UI...

  • Screen reader - Listen to the experience with a screen reader extension, ensure the information is presented in order.
  • Keyboard navigation - Run through acceptance criteria with keyboard tabs, ensure it works.
  • Text scaling - Adjust viewport to 1280 pixels wide and zoom to 200%, ensure everything renders as expected. Document 400% zoom issues with USWDS if appropriate.
@jadudm jadudm changed the title from "review logging for production use" to "🔍 review logging for production use" Nov 23, 2024
@jadudm jadudm added this to jemison Nov 23, 2024
@github-project-automation github-project-automation bot moved this to triage in jemison Nov 23, 2024
@jadudm jadudm moved this from triage to backlog in jemison Nov 23, 2024