structured logging: support log messages with line breaks #268

pohly · 2021-11-09T13:45:57Z

What this PR does / why we need it:

The initial structured logging output (almost) always produced a single line
per log message, with quoting of strings to represent line breaks as \n. This
made the output hard to read (see
kubernetes/kubernetes#104868).

It was still possible to get line breaks when formatting a value with %+v and
that ended up emitting line breaks; this was probably not intended.

Now the message is never quoted, as in non-structured output. String values are
quoted if they contain no line break. If they do, start/end markers delimit the
text which appears on its own lines.

All additional lines of a structure log message get indented by one space. This
makes it obvious where a new log message starts, which is an improvement
compared to the traditional format.

Which issue(s) this PR fixes:

Fixes kubernetes/kubernetes#106262, kubernetes/kubernetes#106428

Special notes for your reviewer:

Traditional output:

I1112 14:06:35.783354  328441 structured_logging.go:42] someData printed using InfoF: {hello world 0}
I1112 14:06:35.783472  328441 structured_logging.go:43] longData printed using InfoF: {long Multiple
lines
with quite a bit
of text. 0}
I1112 14:06:35.783483  328441 structured_logging.go:44] stringData printed using InfoF,
with the message across multiple lines:
long: Multiple
lines
with quite a bit
of text.
I1112 14:06:35.898176  142908 structured_logging.go:54] logData printed using InfoF:
{log output from some program I0000 12:00:00.000000  123456 main.go:42] Starting
E0000 12:00:01.000000  123456 main.go:43] Failed for some reason
 0}

Old InfoS output before this commit:

I1112 14:06:35.783512  328441 structured_logging.go:50] "using InfoS" someData={Name:hello Data:world internal:0}
I1112 14:06:35.783529  328441 structured_logging.go:51] "using InfoS" longData={Name:long Data:Multiple
lines
with quite a bit
of text. internal:0}
I1112 14:06:35.783549  328441 structured_logging.go:52] "using InfoS with\nthe message across multiple lines" nt=1 stringData="long: Multiple\nlines\nwith quite a bit\nof text." str="another value"
I1112 14:06:35.898278  142908 structured_logging.go:65] "using InfoS" logData={Name:log output from some program Data:I0000 12:00:00.000000  123456 main.go:42] Starting
E0000 12:00:01.000000  123456 main.go:43] Failed for some reason
 internal:0}
I1112 14:06:35.783565  328441 structured_logging.go:61] "Did something" item="foobar"
I1112 14:06:35.783576  328441 structured_logging.go:63] "This is a full sentence." item="foobar"

New InfoS output:

I1119 13:18:00.210970   78836 structured_logging.go:58] using InfoS | someData={Name:hello Data:world internal:0}
I1119 13:18:00.210988   78836 structured_logging.go:59] using InfoS | longData>>>
 {Name:long Data:Multiple
 lines
 with quite a bit
 of text. internal:0}
 <<<
I1119 13:18:00.211009   78836 structured_logging.go:60] using InfoS with
 the message across multiple lines | int=1 stringData>>>
 long: Multiple
 lines
 with quite a bit
 of text.
 <<< str="another value"
I1119 13:18:00.211032   78836 structured_logging.go:65] using InfoS | logData>>>
 {Name:log output from some program Data:I0000 12:00:00.000000  123456 main.go:42] Starting
 E0000 12:00:01.000000  123456 main.go:43] Failed for some reason
  internal:0}
 <<<
I1119 13:18:00.211068   78836 structured_logging.go:71] Did something | item="foobar"
I1119 13:18:00.211079   78836 structured_logging.go:73] This is a full sentence. item="foobar"

Release note:

In structured output (= InfoS, ErrorS), values with line breaks will be printed across multiple lines to make the output more readable. The message is also not quoted anymore and may stretch across multiple lines. Every new line gets indented by one space.

k8s-ci-robot · 2021-11-09T13:46:20Z

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: pohly
To complete the pull request process, please assign justinsb after the PR has been reviewed.
You can assign the PR to them by writing /assign @justinsb in a comment when ready.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

OWNERS

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

pohly · 2021-11-09T13:47:39Z

I'm open for suggestions regarding the markup for string start and end...

pohly · 2021-11-09T13:49:19Z

Another alternative is with the key name repeated inside the start/end tags instead of the normal printing as variable name:

 <previous key/value> vvvvv multiLine vvvvv
 ....
 ^^^^^ multiLine ^^^^^
 <next key/value>

"multiLine" here is the key for the key/value pair, not some special hard-coded delimiter as "string" is in the current PR.

pohly · 2021-11-09T19:26:01Z

The message string for InfoS has the same problem. I also find the quotation marks around the text distracting. On the other hand, we may want to provide some hint on where the message text ends and key/value pairs start.

pohly · 2021-11-12T14:17:02Z

I ended up revamping the entire formatting. See updated description and the new example for a comparison.

pohly · 2021-11-12T14:22:44Z

This is ready for review of the concept. I might still do some performance improvements.

/cc @shivanshu1333 @serathius
/wg structured-logging
/sig instrumentation
/priority important-soon
/milestone 1.23
/area logging
/kind feature

k8s-ci-robot · 2021-11-12T14:22:47Z

@pohly: The label(s) area/logging cannot be applied, because the repository doesn't have them.

In response to this:

This is ready for review of the concept. I might still do some performance improvements.

/cc @shivanshu1333 @serathius
/wg structured-logging
/sig instrumentation
/priority important-soon
/milestone 1.23
/area logging
/kind feature

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

serathius · 2021-11-15T12:50:59Z

klog.go

+		b.WriteString(v)
+		return
+	}
+	// Complex multi-line string, show as-is with indention.


I don't see any benefit of adding built in markers for logs. They

don't make logs more readable as users can klog header to already distinguish start vs end of multi line log

adding standard markers makes parsing logs harder, existing kubernetes log parsers already should handle multi line log.

Increases log volume

K8s logs already have multi line logs that provide their own markers, thus I think we should leave it up user to provide
them themselves.

I think there's a misunderstanding. The markers are not delimiters for messages. They are delimiters for values.

With the current InfoS format, the quotation marks provides those delimiters. But they result in output that is basically unreadable for a human, which defeats the purpose of the text output.

Increases log volume

That's a valid point. We could make them shorter. For example:

I1112 15:01:48.696677 413871 structured_logging.go:52] using InfoS with the message across multiple lines: int=1 stringData>>> long: Multiple lines with quite a bit of text. <<< str="another value"

That's four bytes (>>> plus newline) instead of two (=") and three bytes (<<<) instead of one (") at the end.

Line breaks are the same with new line plus space instead of \n.

If the text contains many quotation marks, this format might actually be shorter because those don't need to be escaped.

serathius · 2021-11-15T12:53:18Z

klog.go

+}
+
+func writeString(b *bytes.Buffer, s string) {
+	for i, line := range strings.Split(s, "\n") {


Does strings.Split here needlessly copies the string into slices?

Yes. I'll add benchmarking and make the code more efficient before merging. Right now I am only looking for feedback on the format.

/hold

#273 has detailed benchmark data.

serathius · 2021-11-15T13:03:25Z

klog.go

-	b.WriteString(fmt.Sprintf("%q", msg))
+	// The message is never quoted. New lines get indented.
+	writeString(b, msg)
+	// Enhance readability by inserting : between message and key/value


I don't agree with this change, we decided on quoting the string as this gives us predictable and somewhat parsable output (not everyone will imminently switch to JSON). Idea of quoting was related to message being treated as structured metadata. This is based on idea that JSON uses "msg" key to store message, so we though that message should be written as other string values with one difference of msg= being skipped.

I don't agree with this change,

This PR or the line you commented on, the : separator? I can drop that separator, it's not important.

But quoting is a major problem. I get what was meant to be achieved with it, but IMHO the effect on readability was not considered. it now prevents conversion to structured logging because developers will complain or even go as far as filing bugs and reverting the conversion.

I agree with Marek, JSON should not become a dependency for log parsing, the raw log output should be parsable

The current text format is not machine-readable. If that was the goal, then we have bugs in its implementation around handling of structs.

So as you are both against this proposal, how do you suggest we should deal with cases where developers want multi-line output in a readable format (kubernetes/kubernetes#104868)?

I think we should separate how we consider multi line for different part of logs. In my opinion multiline:

doesn't make sense for log messages. They should be a short description of what event is happening in system. Their format should not differ much or at all between different log formats. For example as I user that switched between log formats I want to be able to find log I have seen in old format in the new format just by grepping the message.

doesn't make sense for log keys. They should strive to uniquely identify types of metadata attached to log. Strive comes from idea that over time a schema will develop that will have a predictable way to find values under specific keys. Their names should follow a strict convention to avoid diverging.

can make sense for complex values when debugging. I assumed handling this case is needed Scheduler logged config prints without line breaks kubernetes#104868. There are multiple cases in k8s logs that multiline logs are used to improve readability. We should allow users to customize that some values (for example dumping whole k8s object yaml) are written in multiline format.

can make sense for complex values when debugging.

Agreed.

We should allow users to customize that some values (for example dumping whole k8s object yaml) are written in multiline format.

And how to you propose to do that?

We need an API for it and a serialization format that log parsers then will understand.

In my opinion multiline doesn't make sense for log messages.

I can buy that argument. So let me propose a solution just for multi-line values in #273

I'm going to close this PR here in favor of that one.

/close

shivanshuraj1333 · 2021-11-15T13:14:04Z

klog.go

+	// pairs, but only when the message does not already end with
+	// punctation.
+	if len(msg) > 0 && !unicode.IsPunct(rune(msg[len(msg)-1])) {
+		b.WriteString(":")


Can we think about using some other separator here? I've seen many logs use something like klog.InfoS("Warning: some text", key, value)

Perhaps |, with a space before and after it?

Structured logging with InfoS is formatted differently than traditional output, so it's worthwhile to have an example that demostrates that. The new example didn't compile initially because an older version of klog was used. This gets avoids by always using the source from the current directory.

The initial structured logging output (almost) always produced a single line per log message, with quoting of strings to represent line breaks as \n. This made the output hard to read (see kubernetes/kubernetes#104868). It was still possible to get line breaks when formatting a value with `%+v` and that ended up emitting line breaks; this was probably not intended. Now the message is never quoted, as in non-structured output. String values are quoted if they contain no line break. If they do, start/end markers delimit the text which appears on its own lines. All additional lines of a structure log message get indented by one space. This makes it obvious where a new log message starts, which is an improvement compared to the traditional format. Traditional output: I1112 14:06:35.783354 328441 structured_logging.go:42] someData printed using InfoF: {hello world 0} I1112 14:06:35.783472 328441 structured_logging.go:43] longData printed using InfoF: {long Multiple lines with quite a bit of text. 0} I1112 14:06:35.783483 328441 structured_logging.go:44] stringData printed using InfoF, with the message across multiple lines: long: Multiple lines with quite a bit of text. I1112 14:06:35.898176 142908 structured_logging.go:54] logData printed using InfoF: {log output from some program I0000 12:00:00.000000 123456 main.go:42] Starting E0000 12:00:01.000000 123456 main.go:43] Failed for some reason 0} Old InfoS output before this commit: I1112 14:06:35.783512 328441 structured_logging.go:50] "using InfoS" someData={Name:hello Data:world internal:0} I1112 14:06:35.783529 328441 structured_logging.go:51] "using InfoS" longData={Name:long Data:Multiple lines with quite a bit of text. internal:0} I1112 14:06:35.783549 328441 structured_logging.go:52] "using InfoS with\nthe message across multiple lines" int=1 stringData="long: Multiple\nlines\nwith quite a bit\nof text." str="another value" I1112 14:06:35.783565 328441 structured_logging.go:61] "Did something" item="foobar" I1112 14:06:35.783576 328441 structured_logging.go:63] "This is a full sentence." item="foobar" I1112 14:06:35.898278 142908 structured_logging.go:65] "using InfoS" logData={Name:log output from some program Data:I0000 12:00:00.000000 123456 main.go:42] Starting E0000 12:00:01.000000 123456 main.go:43] Failed for some reason internal:0} New InfoS output: I1112 15:01:48.696617 413871 structured_logging.go:50] using InfoS: someData={Name:hello Data:world internal:0} I1112 15:01:48.696646 413871 structured_logging.go:51] using InfoS: ===start of longData=== {Name:long Data:Multiple lines with quite a bit of text. internal:0} ===end of longData=== I1112 15:01:48.696677 413871 structured_logging.go:52] using InfoS with the message across multiple lines: int=1 ===start of stringData=== long: Multiple lines with quite a bit of text. ===end of stringData=== str="another value" I1112 15:01:48.788207 143146 structured_logging.go:65] using InfoS: ===start of logData=== {Name:log output from some program Data:I0000 12:00:00.000000 123456 main.go:42] Starting E0000 12:00:01.000000 123456 main.go:43] Failed for some reason internal:0} ===end of logData=== I1112 15:01:48.696704 413871 structured_logging.go:61] Did something: item="foobar" I1112 15:01:48.696719 413871 structured_logging.go:63] This is a full sentence. item="foobar"

Multi-line strings are now shown as fieldName>>> <<< and "| " is used as separator between message and values. This is shorter: I1119 13:18:00.210970 78836 structured_logging.go:58] using InfoS | someData={Name:hello Data:world internal:0} I1119 13:18:00.210988 78836 structured_logging.go:59] using InfoS | longData>>> {Name:long Data:Multiple lines with quite a bit of text. internal:0} <<< I1119 13:18:00.211009 78836 structured_logging.go:60] using InfoS with the message across multiple lines | int=1 stringData>>> long: Multiple lines with quite a bit of text. <<< str="another value" I1119 13:18:00.211032 78836 structured_logging.go:65] using InfoS | logData>>> {Name:log output from some program Data:I0000 12:00:00.000000 123456 main.go:42] Starting E0000 12:00:01.000000 123456 main.go:43] Failed for some reason internal:0} <<< I1119 13:18:00.211055 78836 structured_logging.go:66] using InfoS | boolean=true int=1 float=0.1 I1119 13:18:00.211068 78836 structured_logging.go:71] Did something | item="foobar" I1119 13:18:00.211079 78836 structured_logging.go:73] This is a full sentence. item="foobar"

pohly · 2021-11-22T18:22:31Z

When comparing this change with kubernetes/kubernetes#106594 (comment) shows that the new format is slightly smaller: 61450443 bytes vs. 60583121 bytes for a kubelet log with 268551 messages and 310746 key/value pairs in those messages.

k8s-ci-robot · 2021-11-26T13:20:01Z

@pohly: Closed this PR.

In response to this:

In my opinion multiline doesn't make sense for log messages.

I can buy that argument. So let me propose a solution just for multi-line values in #273

I'm going to close this PR here in favor of that one.

/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

k8s-ci-robot added cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. size/M Denotes a PR that changes 30-99 lines, ignoring generated files. labels Nov 9, 2021

k8s-ci-robot requested review from hoegaarden and neolit123 November 9, 2021 13:46

pohly changed the title ~~structured logging: avoid quoting of strings with line breaks~~ RFC: structured logging: avoid quoting of strings with line breaks Nov 9, 2021

pohly mentioned this pull request Nov 9, 2021

structured logging: better support for multi-line strings kubernetes/kubernetes#106262

Closed

pohly force-pushed the multi-line-value branch from 1f31c16 to 709fe0c Compare November 12, 2021 14:13

k8s-ci-robot added size/L Denotes a PR that changes 100-499 lines, ignoring generated files. and removed size/M Denotes a PR that changes 30-99 lines, ignoring generated files. labels Nov 12, 2021

pohly changed the title ~~RFC: structured logging: avoid quoting of strings with line breaks~~ structured logging: support log messages with line breaks Nov 12, 2021

pohly force-pushed the multi-line-value branch from 709fe0c to 480c8a4 Compare November 12, 2021 14:16

pohly force-pushed the multi-line-value branch from 480c8a4 to 74672eb Compare November 12, 2021 14:20

k8s-ci-robot requested review from serathius and shivanshuraj1333 November 12, 2021 14:22

k8s-ci-robot added the kind/feature Categorizes issue or PR as related to a new feature. label Nov 12, 2021

pohly mentioned this pull request Nov 12, 2021

add structured logging helper #269

Closed

serathius reviewed Nov 15, 2021

View reviewed changes

k8s-ci-robot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Nov 15, 2021

serathius reviewed Nov 15, 2021

View reviewed changes

shivanshuraj1333 reviewed Nov 15, 2021

View reviewed changes

pohly force-pushed the multi-line-value branch from 74672eb to 9ca2cd1 Compare November 15, 2021 18:10

pohly mentioned this pull request Nov 15, 2021

structured logging text output: handle arbitrary values kubernetes/kubernetes#106428

Closed

pohly added 3 commits November 19, 2021 13:21

pohly force-pushed the multi-line-value branch from 9ca2cd1 to 954847b Compare November 19, 2021 12:21

pohly mentioned this pull request Nov 26, 2021

structured logging: support values with line breaks #273

Merged

k8s-ci-robot closed this Nov 26, 2021

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

structured logging: support log messages with line breaks #268

structured logging: support log messages with line breaks #268

pohly commented Nov 9, 2021 •

edited

Loading

k8s-ci-robot commented Nov 9, 2021

pohly commented Nov 9, 2021

pohly commented Nov 9, 2021 •

edited

Loading

pohly commented Nov 9, 2021

pohly commented Nov 12, 2021

pohly commented Nov 12, 2021

k8s-ci-robot commented Nov 12, 2021

serathius Nov 15, 2021

pohly Nov 15, 2021

pohly Nov 15, 2021 •

edited

Loading

serathius Nov 15, 2021

pohly Nov 15, 2021

pohly Nov 26, 2021

serathius Nov 15, 2021

pohly Nov 15, 2021

shivanshuraj1333 Nov 15, 2021

pohly Nov 15, 2021

pohly Nov 15, 2021

serathius Nov 16, 2021

pohly Nov 16, 2021 •

edited

Loading

pohly Nov 26, 2021

shivanshuraj1333 Nov 15, 2021

pohly Nov 15, 2021

pohly commented Nov 22, 2021

k8s-ci-robot commented Nov 26, 2021

structured logging: support log messages with line breaks #268

structured logging: support log messages with line breaks #268

Conversation

pohly commented Nov 9, 2021 • edited Loading

k8s-ci-robot commented Nov 9, 2021

pohly commented Nov 9, 2021

pohly commented Nov 9, 2021 • edited Loading

pohly commented Nov 9, 2021

pohly commented Nov 12, 2021

pohly commented Nov 12, 2021

k8s-ci-robot commented Nov 12, 2021

Choose a reason for hiding this comment

Choose a reason for hiding this comment

pohly Nov 15, 2021 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

pohly Nov 16, 2021 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

pohly commented Nov 22, 2021

k8s-ci-robot commented Nov 26, 2021

pohly commented Nov 9, 2021 •

edited

Loading

pohly commented Nov 9, 2021 •

edited

Loading

pohly Nov 15, 2021 •

edited

Loading

pohly Nov 16, 2021 •

edited

Loading