Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add Cloud Logging structured logs for Java and Python services #498

Closed
askmeegs opened this issue Jun 3, 2021 · 4 comments · Fixed by #516
Closed

Add Cloud Logging structured logs for Java and Python services #498

askmeegs opened this issue Jun 3, 2021 · 4 comments · Fixed by #516
Assignees
Labels
lang: java Issues specific to Java. lang: python Issues specific to Python. observability instrument logging, monitoring, tracing, etc priority: p2 Moderately-important priority. Fix may not be included in next release. type: bug Error or flaw in code with unintended results or allowing sub-optimal usage patterns.

Comments

@askmeegs
Copy link
Collaborator

askmeegs commented Jun 3, 2021

Only seeing this mislabeled severity with INFO logs --> mistakenly logged as ERROR.

Services where I see this: userservice, contacts, and frontend (all the python services), the two in-cluster databases (accounts and ledger). Java services are unaffected.

Example:

Screen Shot 2021-06-03 at 1 41 38 PM

Strangely, the line of code there is app.logger.info but it's getting mislabeled in severity, somewhere.

@askmeegs askmeegs added observability instrument logging, monitoring, tracing, etc type: bug Error or flaw in code with unintended results or allowing sub-optimal usage patterns. priority: p2 Moderately-important priority. Fix may not be included in next release. lang: python Issues specific to Python. labels Jun 3, 2021
@askmeegs askmeegs self-assigned this Jun 3, 2021
@askmeegs
Copy link
Collaborator Author

askmeegs commented Jun 3, 2021

Tested with an older, pre-UWSGI (gunicorn instead) version of the Python services -- release v0.2.0 https://github.com/GoogleCloudPlatform/bank-of-anthos/releases/tag/v0.2.0

Reason for testing that older version is Gunicorn had a separate logging.conf for the 3 python services, which we lost when we moved to UWSGI https://github.com/GoogleCloudPlatform/bank-of-anthos/blob/3ea9ebc1382ebc98133e6b17e532a99369404807/src/frontend/logging.conf

But the "info as error" bug seems to date back to when we had Gunicorn - seeing the same behavior on 0.2.0.

@askmeegs
Copy link
Collaborator Author

askmeegs commented Jun 4, 2021

After diving into the code + history + recommended ways of doing Python + Google Cloud Logging, I don't know if we ever set this up correctly, I think the log level has always rendered wrong in Cloud Logging.

Looking at the Cloud Ops Sandbox's python services -- example -- they have set up structured JSON logging using python-json-logger, overriding the severity field to correctly populate it before sending it to Cloud Logging. I've replicated this in the Python services and now the level is showing up correctly. Note that only the "message" shows up in the cloud logging message field, not

For consistency, I updated the Java service logs to use structured JSON as well, so that only the message shows up in the cloud logging message, not level / method / message / | [INFO] | login | Login Successful. - Users can still see these fields (like func name) in the JSON stdout --

Python stdout -

frontend-8577ff7fbf-czczh front {"asctime": "2021-06-04 21:56:19,445", "levelname": "INFO", "funcName": "payment", "message": "Payment initiated successfully.", "timestamp": 1622843779.445502, "severity": "INFO"}

Java stdout:

ledgerwriter-5494dcbcd5-9l8cl ledgerwriter {"thread":"main","level":"INFO","loggerName":"org.springframework.cloud.context.scope.GenericScope","message":"BeanFactory id=b2d1a364-a716-3809-a52f-510193dbe410","endOfBatch":false,"loggerFqcn":"org.apache.commons.logging.LogAdapter$Log4jLog","instant":{"epochSecond":1622844203,"nanoOfSecond":266000000},"contextMap":[],"threadId":1,"threadPriority":5,"source":{"method":"setSerializationId","file":"GenericScope.java","line":295,"class":"org.springframework.cloud.context.scope.GenericScope"},"time":"2021-06-04T22:03:23.266Z"}

And they can see those fields in the cloud logging UI --

Screen Shot 2021-06-04 at 5 03 56 PM

The mislabeled in-cluster Postgres logs are still all labeled as err - need to do more research to figure out how to proceed there.

@askmeegs
Copy link
Collaborator Author

askmeegs commented Jun 5, 2021

Also it seems like the Java severity is being mislogged as well -- (everything defaulting to Info)
java bad sverity

The scope of this work is bigger than I thought / more than just a bug fix. Unassigning myself so we can triage this as a full Cloud Logging overhaul for the app.

@askmeegs askmeegs changed the title Info logs are showing up as Error Add Cloud Logging structured logs for Java and Python services Jun 5, 2021
@askmeegs askmeegs added the lang: java Issues specific to Java. label Jun 5, 2021
@bourgeoisor
Copy link
Member

bourgeoisor commented Jul 26, 2021

I have tested this with the latest commit and do not see any false positive with Python services, likely due to #501.

I do still see the PostgresDB instances mis-logging everything as error (due to Postgres' default log destination being set to stderr). At first glance, it seems like we could redirect all to a syslog facility instead but I'm not sure if this implies the ability to separate info from errors.

Additionally, I'm seeing false negatives coming from the frontend (and potentially all Python services):
image

I'm also not seeing any JSON payload for the false negatives on the Java side. Rather, the errors appear as plain text:
image

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
lang: java Issues specific to Java. lang: python Issues specific to Python. observability instrument logging, monitoring, tracing, etc priority: p2 Moderately-important priority. Fix may not be included in next release. type: bug Error or flaw in code with unintended results or allowing sub-optimal usage patterns.
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants