-
Notifications
You must be signed in to change notification settings - Fork 219
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
structured logging: support log messages with line breaks #268
Conversation
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: pohly The full list of commands accepted by this bot can be found here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
I'm open for suggestions regarding the markup for string start and end... |
Another alternative is with the key name repeated inside the start/end tags instead of the normal printing as variable name:
"multiLine" here is the key for the key/value pair, not some special hard-coded delimiter as "string" is in the current PR. |
The message string for |
1f31c16
to
709fe0c
Compare
709fe0c
to
480c8a4
Compare
I ended up revamping the entire formatting. See updated description and the new example for a comparison. |
480c8a4
to
74672eb
Compare
This is ready for review of the concept. I might still do some performance improvements. /cc @shivanshu1333 @serathius |
@pohly: The label(s) In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
b.WriteString(v) | ||
return | ||
} | ||
// Complex multi-line string, show as-is with indention. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't see any benefit of adding built in markers for logs. They
- don't make logs more readable as users can klog header to already distinguish start vs end of multi line log
- adding standard markers makes parsing logs harder, existing kubernetes log parsers already should handle multi line log.
- Increases log volume
K8s logs already have multi line logs that provide their own markers, thus I think we should leave it up user to provide
them themselves.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think there's a misunderstanding. The markers are not delimiters for messages. They are delimiters for values.
With the current InfoS format, the quotation marks provides those delimiters. But they result in output that is basically unreadable for a human, which defeats the purpose of the text output.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Increases log volume
That's a valid point. We could make them shorter. For example:
I1112 15:01:48.696677 413871 structured_logging.go:52] using InfoS with
the message across multiple lines: int=1 stringData>>>
long: Multiple
lines
with quite a bit
of text.
<<< str="another value"
That's four bytes (>>>
plus newline) instead of two (="
) and three bytes (<<<
) instead of one ("
) at the end.
Line breaks are the same with new line plus space instead of \n
.
If the text contains many quotation marks, this format might actually be shorter because those don't need to be escaped.
} | ||
|
||
func writeString(b *bytes.Buffer, s string) { | ||
for i, line := range strings.Split(s, "\n") { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does strings.Split here needlessly copies the string into slices?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes. I'll add benchmarking and make the code more efficient before merging. Right now I am only looking for feedback on the format.
/hold
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
#273 has detailed benchmark data.
b.WriteString(fmt.Sprintf("%q", msg)) | ||
// The message is never quoted. New lines get indented. | ||
writeString(b, msg) | ||
// Enhance readability by inserting : between message and key/value |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't agree with this change, we decided on quoting the string as this gives us predictable and somewhat parsable output (not everyone will imminently switch to JSON). Idea of quoting was related to message being treated as structured metadata. This is based on idea that JSON uses "msg" key to store message, so we though that message should be written as other string values with one difference of msg=
being skipped.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't agree with this change,
This PR or the line you commented on, the :
separator? I can drop that separator, it's not important.
But quoting is a major problem. I get what was meant to be achieved with it, but IMHO the effect on readability was not considered. it now prevents conversion to structured logging because developers will complain or even go as far as filing bugs and reverting the conversion.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree with Marek, JSON should not become a dependency for log parsing, the raw log output should be parsable
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The current text format is not machine-readable. If that was the goal, then we have bugs in its implementation around handling of structs.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So as you are both against this proposal, how do you suggest we should deal with cases where developers want multi-line output in a readable format (kubernetes/kubernetes#104868)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we should separate how we consider multi line for different part of logs. In my opinion multiline:
- doesn't make sense for log messages. They should be a short description of what event is happening in system. Their format should not differ much or at all between different log formats. For example as I user that switched between log formats I want to be able to find log I have seen in old format in the new format just by grepping the message.
- doesn't make sense for log keys. They should strive to uniquely identify types of metadata attached to log. Strive comes from idea that over time a schema will develop that will have a predictable way to find values under specific keys. Their names should follow a strict convention to avoid diverging.
- can make sense for complex values when debugging. I assumed handling this case is needed Scheduler logged config prints without line breaks kubernetes#104868. There are multiple cases in k8s logs that multiline logs are used to improve readability. We should allow users to customize that some values (for example dumping whole k8s object yaml) are written in multiline format.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can make sense for complex values when debugging.
Agreed.
We should allow users to customize that some values (for example dumping whole k8s object yaml) are written in multiline format.
And how to you propose to do that?
We need an API for it and a serialization format that log parsers then will understand.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In my opinion multiline doesn't make sense for log messages.
I can buy that argument. So let me propose a solution just for multi-line values in #273
I'm going to close this PR here in favor of that one.
/close
klog.go
Outdated
// pairs, but only when the message does not already end with | ||
// punctation. | ||
if len(msg) > 0 && !unicode.IsPunct(rune(msg[len(msg)-1])) { | ||
b.WriteString(":") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we think about using some other separator here? I've seen many logs use something like klog.InfoS("Warning: some text", key, value)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Perhaps |
, with a space before and after it?
74672eb
to
9ca2cd1
Compare
Structured logging with InfoS is formatted differently than traditional output, so it's worthwhile to have an example that demostrates that. The new example didn't compile initially because an older version of klog was used. This gets avoids by always using the source from the current directory.
The initial structured logging output (almost) always produced a single line per log message, with quoting of strings to represent line breaks as \n. This made the output hard to read (see kubernetes/kubernetes#104868). It was still possible to get line breaks when formatting a value with `%+v` and that ended up emitting line breaks; this was probably not intended. Now the message is never quoted, as in non-structured output. String values are quoted if they contain no line break. If they do, start/end markers delimit the text which appears on its own lines. All additional lines of a structure log message get indented by one space. This makes it obvious where a new log message starts, which is an improvement compared to the traditional format. Traditional output: I1112 14:06:35.783354 328441 structured_logging.go:42] someData printed using InfoF: {hello world 0} I1112 14:06:35.783472 328441 structured_logging.go:43] longData printed using InfoF: {long Multiple lines with quite a bit of text. 0} I1112 14:06:35.783483 328441 structured_logging.go:44] stringData printed using InfoF, with the message across multiple lines: long: Multiple lines with quite a bit of text. I1112 14:06:35.898176 142908 structured_logging.go:54] logData printed using InfoF: {log output from some program I0000 12:00:00.000000 123456 main.go:42] Starting E0000 12:00:01.000000 123456 main.go:43] Failed for some reason 0} Old InfoS output before this commit: I1112 14:06:35.783512 328441 structured_logging.go:50] "using InfoS" someData={Name:hello Data:world internal:0} I1112 14:06:35.783529 328441 structured_logging.go:51] "using InfoS" longData={Name:long Data:Multiple lines with quite a bit of text. internal:0} I1112 14:06:35.783549 328441 structured_logging.go:52] "using InfoS with\nthe message across multiple lines" int=1 stringData="long: Multiple\nlines\nwith quite a bit\nof text." str="another value" I1112 14:06:35.783565 328441 structured_logging.go:61] "Did something" item="foobar" I1112 14:06:35.783576 328441 structured_logging.go:63] "This is a full sentence." item="foobar" I1112 14:06:35.898278 142908 structured_logging.go:65] "using InfoS" logData={Name:log output from some program Data:I0000 12:00:00.000000 123456 main.go:42] Starting E0000 12:00:01.000000 123456 main.go:43] Failed for some reason internal:0} New InfoS output: I1112 15:01:48.696617 413871 structured_logging.go:50] using InfoS: someData={Name:hello Data:world internal:0} I1112 15:01:48.696646 413871 structured_logging.go:51] using InfoS: ===start of longData=== {Name:long Data:Multiple lines with quite a bit of text. internal:0} ===end of longData=== I1112 15:01:48.696677 413871 structured_logging.go:52] using InfoS with the message across multiple lines: int=1 ===start of stringData=== long: Multiple lines with quite a bit of text. ===end of stringData=== str="another value" I1112 15:01:48.788207 143146 structured_logging.go:65] using InfoS: ===start of logData=== {Name:log output from some program Data:I0000 12:00:00.000000 123456 main.go:42] Starting E0000 12:00:01.000000 123456 main.go:43] Failed for some reason internal:0} ===end of logData=== I1112 15:01:48.696704 413871 structured_logging.go:61] Did something: item="foobar" I1112 15:01:48.696719 413871 structured_logging.go:63] This is a full sentence. item="foobar"
Multi-line strings are now shown as fieldName>>> <<< and "| " is used as separator between message and values. This is shorter: I1119 13:18:00.210970 78836 structured_logging.go:58] using InfoS | someData={Name:hello Data:world internal:0} I1119 13:18:00.210988 78836 structured_logging.go:59] using InfoS | longData>>> {Name:long Data:Multiple lines with quite a bit of text. internal:0} <<< I1119 13:18:00.211009 78836 structured_logging.go:60] using InfoS with the message across multiple lines | int=1 stringData>>> long: Multiple lines with quite a bit of text. <<< str="another value" I1119 13:18:00.211032 78836 structured_logging.go:65] using InfoS | logData>>> {Name:log output from some program Data:I0000 12:00:00.000000 123456 main.go:42] Starting E0000 12:00:01.000000 123456 main.go:43] Failed for some reason internal:0} <<< I1119 13:18:00.211055 78836 structured_logging.go:66] using InfoS | boolean=true int=1 float=0.1 I1119 13:18:00.211068 78836 structured_logging.go:71] Did something | item="foobar" I1119 13:18:00.211079 78836 structured_logging.go:73] This is a full sentence. item="foobar"
9ca2cd1
to
954847b
Compare
When comparing this change with kubernetes/kubernetes#106594 (comment) shows that the new format is slightly smaller: 61450443 bytes vs. 60583121 bytes for a kubelet log with 268551 messages and 310746 key/value pairs in those messages. |
@pohly: Closed this PR. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
What this PR does / why we need it:
The initial structured logging output (almost) always produced a single line
per log message, with quoting of strings to represent line breaks as \n. This
made the output hard to read (see
kubernetes/kubernetes#104868).
It was still possible to get line breaks when formatting a value with
%+v
andthat ended up emitting line breaks; this was probably not intended.
Now the message is never quoted, as in non-structured output. String values are
quoted if they contain no line break. If they do, start/end markers delimit the
text which appears on its own lines.
All additional lines of a structure log message get indented by one space. This
makes it obvious where a new log message starts, which is an improvement
compared to the traditional format.
Which issue(s) this PR fixes:
Fixes kubernetes/kubernetes#106262, kubernetes/kubernetes#106428
Special notes for your reviewer:
Traditional output:
Old InfoS output before this commit:
New InfoS output:
Release note: