access log: new 20x faster json formatter implementation #35545

wbpcode · 2024-08-01T10:51:04Z

Commit Message: access log: new 20x faster json formatter implementation
Additional Description:

The core idea of the new `JsonFormatter`.

The core idea of the new JsonFormatter is to do as much work as possible when loading the configuration. Given a specific json_format:

   json_format:
     text: "text"
     template: "%START_TIME%"
     number: 2
     bool: true
     list:
       - "list_raw_value"
       - false
       - "%EMIT_TIME%"
     nested:
       text: "nested_text"

When loading the configuration, the keys, raw values, and delimiters (all parts that don't contain substitution format commands) are sanitized and serialized directly into raw JSON string pieces.
Strings that contain substitution commands are extracted as template strings and parsed into substitution formatters (objects that extract dynamic values from the request, response, stream info, etc.).
Finally, the whole proto struct configuration is parsed into an array of raw JSON pieces and formatters.

In other words, the configuration above is parsed and serialized into the following pieces and formatters:

- '{"text":"text","template":'                                    # Element 0, raw JSON piece.
- '%START_TIME%'                                                 # Element 1, formatter string, must be independent element.
- ',"number":2,"bool":true,"list":["list_raw_value",false,'      # Element 2, raw JSON piece.
- '%EMIT_TIME%'                                                  # Element 3, formatter string, must be independent element.
- '],"nested":{"text":"nested_text"}}'                           # Element 4, raw JSON piece.
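The configuration-time step described above can be sketched in C++. This is a minimal, hypothetical sketch and not Envoy's implementation. `Value` stands in for `ProtobufWkt::Value`. A string is treated as a template whenever it contains `%`, and `quote()` is a simplified JSON string escaper. Adjacent raw pieces are merged, so every template string ends up as its own element.

```cpp
#include <cassert>
#include <cstdio>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for ProtobufWkt::Value (not Envoy's type).
struct Value {
  enum class Kind { kString, kNumber, kBool, kList, kObject };
  Kind kind = Kind::kString;
  std::string text;               // string contents, or a JSON number literal
  bool boolean = false;
  std::vector<Value> items;       // kList elements
  std::vector<std::string> keys;  // kObject keys, in configuration order
  std::vector<Value> values;      // kObject values, parallel to keys
};

// Small builders so the example configuration can be written inline.
Value Str(std::string s) { Value v; v.text = std::move(s); return v; }
Value Num(std::string n) { Value v; v.kind = Value::Kind::kNumber; v.text = std::move(n); return v; }
Value Bool(bool b) { Value v; v.kind = Value::Kind::kBool; v.boolean = b; return v; }
Value List(std::vector<Value> xs) { Value v; v.kind = Value::Kind::kList; v.items = std::move(xs); return v; }
Value Obj(std::vector<std::string> ks, std::vector<Value> vs) {
  Value v; v.kind = Value::Kind::kObject; v.keys = std::move(ks); v.values = std::move(vs); return v;
}

struct FormatElement {
  std::string value;  // raw JSON piece, or a substitution format string
  bool is_template;   // true if value must be evaluated at runtime
};

// Simplified JSON string escaping: quotes, backslashes and control characters.
std::string quote(const std::string& s) {
  std::string out = "\"";
  for (char c : s) {
    if (c == '"' || c == '\\') {
      out += '\\';
      out += c;
    } else if (static_cast<unsigned char>(c) < 0x20) {
      char buf[7];
      std::snprintf(buf, sizeof(buf), "\\u%04x", static_cast<unsigned>(static_cast<unsigned char>(c)));
      out += buf;
    } else {
      out += c;
    }
  }
  return out + "\"";
}

// Appends raw JSON, merging it into the previous element when that one is raw too.
void raw(std::vector<FormatElement>& out, const std::string& s) {
  if (!out.empty() && !out.back().is_template) {
    out.back().value += s;
  } else {
    out.push_back({s, false});
  }
}

// Walks the configuration once; only strings containing '%' become templates.
void emit(std::vector<FormatElement>& out, const Value& v) {
  switch (v.kind) {
    case Value::Kind::kNumber: raw(out, v.text); break;
    case Value::Kind::kBool: raw(out, v.boolean ? "true" : "false"); break;
    case Value::Kind::kString:
      if (v.text.find('%') == std::string::npos) {
        raw(out, quote(v.text));  // no commands: serialize now
      } else {
        out.push_back({v.text, true});  // commands: evaluate at runtime
      }
      break;
    case Value::Kind::kList:
      raw(out, "[");
      for (std::size_t i = 0; i < v.items.size(); ++i) {
        if (i > 0) raw(out, ",");
        emit(out, v.items[i]);
      }
      raw(out, "]");
      break;
    case Value::Kind::kObject:
      raw(out, "{");
      for (std::size_t i = 0; i < v.keys.size(); ++i) {
        raw(out, (i > 0 ? "," : "") + quote(v.keys[i]) + ":");
        emit(out, v.values[i]);
      }
      raw(out, "}");
      break;
  }
}

std::vector<FormatElement> compile(const Value& root) {
  std::vector<FormatElement> out;
  emit(out, root);
  return out;
}
```

When run on the YAML example above, the sketch produces five elements. Raw pieces and templates are split the same way as in the list.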

Then, at runtime, when a request completes and the caller wants to construct a log line, the element array above is iterated in order.
Raw JSON pieces are appended to the output buffer directly. Formatters are evaluated, and their outputs are sanitized/serialized into legal JSON values before being appended to the output buffer.

At runtime (on the core data path), the logic of the new JsonFormatter is almost the same as that of the text formatter. This is how the JsonFormatter achieves performance comparable to the text formatter.
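The runtime loop is correspondingly simple. Below is a hedged sketch with hypothetical names. `Evaluator` stands in for the parsed substitution formatters and returns `std::nullopt` when a value is unavailable; this sketch renders that case as `null`. Unlike the real formatter, it always emits a present value as a JSON string. It does not preserve the value type of a single-command template.

```cpp
#include <cassert>
#include <cstdio>
#include <functional>
#include <optional>
#include <string>
#include <vector>

struct FormatElement {
  std::string value;  // raw JSON piece, or a substitution format string
  bool is_template;   // true if value must be evaluated at runtime
};

// Hypothetical stand-in for the parsed substitution formatters: returns the
// value of a template for the current request, or std::nullopt if unavailable.
using Evaluator = std::function<std::optional<std::string>(const std::string&)>;

// Simplified JSON string escaping: quotes, backslashes and control characters.
std::string quote(const std::string& s) {
  std::string out = "\"";
  for (char c : s) {
    if (c == '"' || c == '\\') {
      out += '\\';
      out += c;
    } else if (static_cast<unsigned char>(c) < 0x20) {
      char buf[7];
      std::snprintf(buf, sizeof(buf), "\\u%04x", static_cast<unsigned>(static_cast<unsigned char>(c)));
      out += buf;
    } else {
      out += c;
    }
  }
  return out + "\"";
}

// Appends raw pieces verbatim and only evaluates and serializes the templates.
std::string render(const std::vector<FormatElement>& elements, const Evaluator& eval) {
  std::string out;
  for (const FormatElement& e : elements) {
    if (!e.is_template) {
      out += e.value;  // already valid JSON, built at configuration time
      continue;
    }
    const std::optional<std::string> v = eval(e.value);
    out += v ? quote(*v) : std::string("null");
  }
  return out;
}
```

Keys and delimiters are fixed when the configuration is loaded. As a result, the loop does no tree traversal and no key escaping per request.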

The performance result of the formatter.

The previous JSON formatter is very slow, and the new one performs much better (16x-26x faster; in the run below, 460 ns versus 9541-10460 ns for the two legacy JSON benchmarks, roughly 21x-23x).

-------------------------------------------------------------------------------
Benchmark                                     Time             CPU   Iterations
-------------------------------------------------------------------------------
BM_AccessLogFormatterSetup                 7020 ns         7020 ns        96714
BM_AccessLogFormatter                       349 ns          349 ns      2008804
BM_StructAccessLogFormatter                5837 ns         5837 ns       117163
BM_TypedStructAccessLogFormatter           4883 ns         4883 ns       148206
BM_LegacyJsonAccessLogFormatter           10460 ns        10460 ns        68180
BM_LegacyTypedJsonAccessLogFormatter       9541 ns         9541 ns        73877
BM_JsonAccessLogFormatter                   460 ns          460 ns      1521227   <-- here
BM_FormatterCommandParsing                 35.6 ns         35.6 ns     18557843

Risk Level: low. Guarded by a runtime flag.
Testing: unit.
Docs Changes: n/a.
Release Notes: added.
Platform Specific Features: n/a.

repokitteh-read-only · 2024-08-01T10:51:10Z

CC @envoyproxy/runtime-guard-changes: FYI only for changes made to (source/common/runtime/runtime_features.cc).

🐱

Caused by: #35545 was opened by wbpcode.

see: more, trace.

wbpcode · 2024-08-01T11:00:11Z

/wait

wbpcode · 2024-08-01T18:04:28Z

exception...

jmarantz

I did one quick pass. The big win will be to not use a protobuf as an intermediate format, but purely use the json streamer.

changelogs/current.yaml

source/common/formatter/substitution_formatter.cc

jmarantz

I should also say: this is great work! And even without the suggestions I have made it's a huge improvement, though I'm not entirely clear why the old one was so slow as you didn't really change it in this PR :)

jmarantz

a few more comments.

source/common/formatter/substitution_formatter.h

jmarantz

/wait

source/common/formatter/substitution_formatter.cc

jmarantz

/wait

changelogs/current.yaml

source/common/formatter/substitution_formatter.cc

source/common/formatter/substitution_formatter.h

wbpcode · 2024-08-05T10:12:53Z

Fine, most comments are addressed.

There are two problems:

1. substitution_formatter.h is a dependency of envoy-mobile, but the sanitize method introduces an exception on the core data path.
2. The serializeDubbo method takes a Buffer::Instance as its output parameter, so we may need to change it if we want to use it here.

cc @jmarantz

jmarantz · 2024-08-05T12:59:13Z

/wait

wbpcode · 2024-08-07T06:36:50Z

This PR touches performance, substitution formatting, JSON serialization, etc., and involves a lot of technical detail and background context.

It's not easy to review this PR. Many thanks to @jmarantz, who provided lots of great suggestions that made the PR better in both code quality and performance (the new JsonFormatter now gets almost the same performance as the text formatter: not just comparable, but the same).
Of course, @jmarantz and I don't agree on every point, but that's normal in an OSS community.

Because there are lots of historical comments and responses, and the PR has changed a lot compared to the initial version, I cleaned up all the historical context. Let's restart from here.

I will add some comments about:

- The core idea of the new JsonFormatter.
- What I changed in the existing code and why.
- What I added in the PR and why.

I hope these help future reviews and other reviewers.

changelogs/current.yaml

source/common/json/json_serializer.h

jmarantz · 2024-08-08T13:17:10Z

/wait

jmarantz

Sorry I wanted to finish this review off but I feel like we are still not aligned on converging JSON implementations.

source/common/formatter/substitution_formatter.cc

jmarantz · 2024-08-12T12:56:53Z

source/common/formatter/substitution_formatter.h

+  struct JsonString {
+    std::string value_;
+  };
+  struct TmplString {


can we pick a better name than TmplString? TemplateString? Add comments to both of these options indicating what they mean?

Still, it's not clear from the class names what they do. More descriptive names and comments would help.

jmarantz · 2024-08-12T12:58:19Z

source/common/formatter/substitution_formatter.h

+  struct TmplString {
+    std::string value_;
+  };
+  using FormatElement = absl::variant<JsonString, TmplString>;


perf nit: I think absl::variant might be overkill since both options have the same type underneath, but maybe it optimizes well. But you could obviously do this with a string and bool, or string and enum.

Or if you were really after performance, you'd make the first character of the string indicate which variant it is, and use absl::string_view(str).substr(1) to get the rest out.
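The two encodings discussed here can be sketched side by side. Both are illustrative only; the names `FormatElement`, `makeTagged`, `isTemplate`, and `body` are hypothetical. The sketch uses `std::string_view` in place of `absl::string_view`. Note that `substr(1)` returns everything after the one-character tag.

```cpp
#include <cassert>
#include <string>
#include <string_view>

// Option A (hypothetical): a string plus a bool flag.
struct FormatElement {
  std::string value;
  bool is_template;  // true: substitution format string; false: raw JSON piece
};

// Option B (hypothetical): the kind is encoded in the first character.
constexpr char kJsonTag = 'J';
constexpr char kTemplateTag = 'T';

std::string makeTagged(bool is_template, std::string_view body) {
  std::string s(1, is_template ? kTemplateTag : kJsonTag);
  s.append(body);
  return s;
}

bool isTemplate(const std::string& tagged) {
  assert(!tagged.empty());  // every tagged string carries at least the tag
  return tagged[0] == kTemplateTag;
}

// substr(1) drops only the tag; substr(1, 0) would yield an empty view.
std::string_view body(const std::string& tagged) {
  return std::string_view(tagged).substr(1);
}
```

Option B avoids a separate flag, at the cost of readability. Option A keeps the intent explicit.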

Great suggestion. String + bool sounds better to me.

did you evaluate this?

Thanks for calling it out. There was too much historical context and I wanted to restart, so I closed all the comments directly without checking them carefully. Sorry.

I don't want to re-enter comments I previously entered.

jmarantz · 2024-08-12T13:44:24Z

source/common/formatter/substitution_formatter.h

+   * For example given the following proto struct format configuration:
+   *
+   *   json_format:
+   *     text: "text"


can you use different words in the example here instead of 'text' and 'text' so the rendering is unambiguous? Say, name: "value" ?

still looks the same.

It might be good to leave comments unresolved and let the reviewer resolve them?

jmarantz · 2024-08-12T13:46:04Z

source/common/formatter/substitution_formatter.h

+  void formatValueToFormatElements(const ProtobufWkt::Value& value);
+  void formatValueToFormatElements(const ProtoList& list_value);
+
+  const bool keep_value_type_{};


Can you remove the {} -- this has to be initialized in the ctor based on the ctor args.

still not done

source/common/json/json_serializer.h

wbpcode · 2024-08-13T02:28:14Z

> Sorry I wanted to finish this review off but I feel like we are still not aligned on converging JSON implementations.

np. We all want the best solution. I will address your comments after we reach a conclusion about the exception usage.

jmarantz · 2024-08-14T18:09:45Z

/wait

I think a separate PR should be cut to sort out the exception issue with the sanitizer.

wbpcode · 2024-09-21T14:14:54Z

/retest

jmarantz · 2024-09-22T21:11:51Z

source/common/formatter/substitution_formatter.h

+// NOTE: This helper class has duplicated logic with the Json::Streamer class but
+// provides lower level of APIs to operate on the output buffer (like control the
+// delimiters). This is designed for special scenario of substitution formatter and
+// is not intended to be used by other parts of the code.


nice, thanks.

zuercher · 2024-09-23T16:24:44Z

source/common/formatter/substitution_formatter.h

+  void addNumber(uint64_t i) { output_buffer_.add(absl::StrCat(i)); }
+  void addNumber(int64_t i) { output_buffer_.add(absl::StrCat(i)); }


Do we need to worry about all JSON numbers being floats (and therefore int64 > ~2^53 are invalid numbers)?

I am not sure, but many popular JSON libraries support integers (e.g. nlohmann/json, rapidjson), so I think this is OK?

https://stackoverflow.com/questions/13502398/json-integers-limit-on-size is of interest; TL;DR the json spec does not describe this limit. It might be that precision is lost if this is read by JavaScript, but a C++ json reader is free to retain the precision. There is an interoperability note in RFC7159 referencing this, but my feeling is we should not clamp to that value, and leave it to the JSON reader to drop precision if needed.

...although putting a comment in here referencing the json spec and RFC about numbers would make sense -- we should also add that to json_streamer.h where all this is sadly duplicated :)

Not sure if it's necessary to add additional comments for this.

Although all JSON numbers are floats by definition, in practice most JSON libraries provide integer support.

Some libraries may lose precision when loading the output, but we basically have no better choice anyway?

I agree with your reasoning, and think it would be good to write a comment so that readers are on the same page.
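The interoperability concern can be checked concretely, assuming IEEE-754 binary64 doubles. 2^53 + 1 is the smallest positive integer a double cannot represent exactly. A reader that stores every JSON number as a double (as JavaScript does) therefore reads `9007199254740993` back as `9007199254740992`. The helper names below are hypothetical. `serializeInt` emits the same decimal digits that `absl::StrCat` produces for an integer.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Emits exact decimal digits, as the reviewed addNumber() does via absl::StrCat.
std::string serializeInt(std::int64_t i) { return std::to_string(i); }

// Simulates a reader that parses every JSON number into a double and reports
// whether the original integer is recovered exactly.
bool survivesDoubleReader(std::int64_t i) {
  const double d = std::stod(serializeInt(i));
  const double kTwo63 = 9223372036854775808.0;  // 2^63, exactly representable
  // Casting a double outside [-2^63, 2^63) to int64_t is undefined, so check first.
  if (d < -kTwo63 || d >= kTwo63) return false;
  return static_cast<std::int64_t>(d) == i;
}
```

The serialized text itself is exact. Any precision loss happens only in readers that choose a double representation.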

> we should also add that to json_streamer.h where all this is sadly duplicated :)

I think we could let the Streamer inherit from the serializer here in the future, as in my very first suggestion. They are basically different levels of abstraction.

Comments are done for both the streamer and this serializer.

the serializer should definitely not inherit from the formatter. But a common base-class (or even just helper methods) could be factored out.

> the serializer should definitely not inherit from the formatter. But a common base-class (or even just helper methods) could be factored out.

We can move the new serializer out of the formatter; I don't think it matters much. We can give it a try in another PR to see if it makes sense.

zuercher · 2024-09-23T16:44:04Z

source/common/formatter/substitution_formatter.cc

+    break;
+  case ProtobufWkt::Value::kStringValue: {
+    absl::string_view string_format = value.string_value();
+    if (!absl::StrContains(string_format, '%')) {


Noticed an issue with malformed format specifiers: I think if you see a kStringValue here with value holding a single "%" sign, you'll treat it as a format specifier. The parser will swallow the % and return an empty list of formatters (https://github.com/envoyproxy/envoy/blob/main/source/common/formatter/substitution_formatter.h#L192-L199) and then when generating the JSON output (https://github.com/envoyproxy/envoy/pull/35545/files#diff-55d03d93118484f891da85b3ae673a01955faa49745104921d5128f5031e906aR414) you index into the formatters without an index check.

I'd say this is a bug with SubstitutionFormatParser::parse -- it should probably never return an empty list of formatters.

Wrong link for the index: https://github.com/envoyproxy/envoy/pull/35545/files#diff-55d03d93118484f891da85b3ae673a01955faa49745104921d5128f5031e906aR446

SubstitutionFormatParser::parse never returns an empty list. The possible cases for the parse method are:

1. Empty format: returns a list containing a plain string provider that outputs an empty string.

2. Non-empty format:
2.1 Returns a list containing plain string providers and other providers.
2.2 Throws an exception to reject the configuration if the format is invalid.

A single % hits case 2.2.

But in case someone changes the behavior of parse in the future, I can make a minor change to guard against that. :)

I updated the if check to test whether the formatter list size is 1. Now it should be safe anyway, even if someone changes the behavior of parse in the future (although I don't think this will happen).

   if (formatters.size() != 1) {
     // 2. Handle the formatter element with multiple or zero providers.
     stringValueToLogLine(formatters, context, info, serializer);
   } else {
     // 3. Handle the formatter element with a single provider and value
     // type needs to be kept.
     typedValueToLogLine(formatters, context, info, serializer);
   }
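The standalone sketch below shows why the `size() != 1` check is safe even for an empty list. It uses hypothetical stand-ins (`Provider`, `stringValue`, `typedValue`, `evaluate`) for the formatter types and the two helpers named above. Zero or several providers take the joining path, so the single-provider path never reads `providers[0]` from an empty vector. As a simplification, missing values are joined here as empty strings.

```cpp
#include <cassert>
#include <functional>
#include <optional>
#include <string>
#include <vector>

// Hypothetical stand-in for one substitution format provider: yields the value of
// a single command (or literal), or std::nullopt when it is unavailable.
using Provider = std::function<std::optional<std::string>()>;

// Hypothetical analogue of stringValueToLogLine(): joins every provider's output.
// Missing values become "" (a simplification); an empty list yields "".
std::string stringValue(const std::vector<Provider>& providers) {
  std::string out;
  for (const Provider& p : providers) {
    out += p().value_or("");
  }
  return out;
}

// Hypothetical analogue of typedValueToLogLine(): returns the single provider's
// value untouched, so an absent value can later be rendered as null.
std::optional<std::string> typedValue(const std::vector<Provider>& providers) {
  assert(providers.size() == 1);  // only reachable through the branch below
  return providers[0]();
}

// Mirrors the reviewed check: zero or several providers take the joining path,
// so providers[0] is never read from an empty list.
std::optional<std::string> evaluate(const std::vector<Provider>& providers) {
  if (providers.size() != 1) {
    return stringValue(providers);
  }
  return typedValue(providers);
}
```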

I'm not sure if this change addresses all of @zuercher 's comments, but I'll leave it to him.

I misread the parsing code a bit, and it doesn't do what I thought. I think handling an empty formatter list is not a bad thing though. Thanks for adding the guard.

Signed-off-by: wangbaiping <[email protected]>

wbpcode · 2024-09-24T08:55:04Z

/retest

Signed-off-by: wangbaiping <[email protected]>

jmarantz · 2024-09-25T15:10:08Z

source/common/formatter/substitution_formatter.h

+  /**
+   * Serializes a integer number.
+   * NOTE: All numbers in JSON is float. When loading output of this serializer, the parser's
+   * implementation decides if the full precision of big integer could be preserved or not.


can you add the ref to the stackoverflow article & rfc?

…pt-json-formatter

Signed-off-by: wangbaiping <[email protected]>

…pt-json-formatter

wbpcode · 2024-09-26T06:50:00Z

/retest

zuercher

lgtm

fxposter · 2024-10-25T10:01:29Z

@wbpcode am I correct that the new implementation in 1.32 ignores omit_empty_values: true?

wbpcode · 2024-10-25T10:28:46Z

> @wbpcode am I correct that the new implementation in 1.32 ignores omit_empty_values: true?

It ignores sort_properties, but obeys omit_empty_values.

And the new implementation is only enabled by default after 1.32. :)

fxposter · 2024-10-25T10:30:19Z

I have just tested it in our custom build of 1.32 and I see nulls with this new implementation, even though we have omit_empty_values: true set. That's why I asked...

wbpcode · 2024-10-25T10:34:13Z

> I have just tested it in our custom build of 1.32 and I see nulls with this new implementation, even though we have omit_empty_values: true set. That's why I asked...

I will check it this weekend. Maybe there is a problem we didn't notice.

fxposter · 2024-10-25T10:36:02Z

Thank you!

wbpcode · 2024-10-25T11:06:02Z

> Thank you!

@fxposter I think I know why. We do almost everything at configuration loading time, and at runtime we only need to evaluate the substitution commands and join the strings.

This brings great performance to the JSON formatter, but it also makes it impossible to eliminate the null entries, because the keys have already been serialized at configuration loading time.

The only way to resolve it is to keep the structural tree of the configuration and iterate over the tree at runtime.

There is no doubt that this will penalize performance.

wbpcode · 2024-10-25T11:08:05Z

Is this a blocker for you? I can try to find some time to measure how much performance degradation the fix would bring.

If the degradation is acceptable, we can try to fix it.

fxposter · 2024-10-25T11:19:50Z

I don't think it's a problem for us, in the sense that we can adjust the log collector code; it just no longer matches the documentation.
Also, in practice keys are only added when you serialize a struct. So if you do one additional check for whether the value is null, and omit_empty_values is enabled, I think you can simply skip adding that key, separator, and formatted value in the first place. So it would require a single additional if, no? (I just skimmed over the code, so I may be wrong here, but it looks like it should work.)

fxposter · 2024-10-25T11:20:38Z

Also, not adding unnecessary strings and writing them to the file probably has benefits by itself.

wbpcode · 2024-10-25T11:32:55Z

It's more complex because the keys are serialized when we load the configuration, and at that time we can never know whether the value will be null.

I will create a bug first to record this problem.
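For a flat `json_format`, the single-`if` idea can be sketched as below, under strong assumptions. The names are hypothetical, only a null (absent) value counts as empty, and nested structs and lists are ignored. The sketch works only because each key is kept as its own pre-serialized piece and the comma is chosen at runtime. In the merged design, keys are fused into larger raw pieces at configuration time, so there is no per-key piece left to skip.

```cpp
#include <cassert>
#include <cstdio>
#include <functional>
#include <optional>
#include <string>
#include <vector>

// Hypothetical flat field: the key is escaped once at config time but kept apart
// from its neighbours' delimiters.
struct Field {
  std::string key_json;  // e.g. "\"code\":"
  std::string value;     // pre-serialized JSON value, or the template string
  bool is_template;
};

using Evaluator = std::function<std::optional<std::string>(const std::string&)>;

// Simplified JSON string escaping: quotes, backslashes and control characters.
std::string quote(const std::string& s) {
  std::string out = "\"";
  for (char c : s) {
    if (c == '"' || c == '\\') {
      out += '\\';
      out += c;
    } else if (static_cast<unsigned char>(c) < 0x20) {
      char buf[7];
      std::snprintf(buf, sizeof(buf), "\\u%04x", static_cast<unsigned>(static_cast<unsigned char>(c)));
      out += buf;
    } else {
      out += c;
    }
  }
  return out + "\"";
}

std::string renderFlat(const std::vector<Field>& fields, const Evaluator& eval, bool omit_empty_values) {
  std::string out = "{";
  bool first = true;
  for (const Field& f : fields) {
    std::string value = f.value;
    if (f.is_template) {
      const std::optional<std::string> v = eval(f.value);
      if (!v && omit_empty_values) continue;  // skip key, delimiter and value
      value = v ? quote(*v) : std::string("null");
    }
    if (!first) out += ',';  // delimiter decided at runtime, not at config time
    first = false;
    out += f.key_json;
    out += value;
  }
  return out + "}";
}
```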

arazdolski · 2025-01-07T12:26:53Z

@wbpcode Did you create a bug?

wbpcode · 2025-01-07T13:23:30Z

> @wbpcode Did you create a bug?

Nope. I recorded it in my TODO list but forgot to create a bug.

Hexta · 2025-01-08T15:07:53Z

@wbpcode Would you mind creating a bug?
The next Envoy release will have this formatter enabled by default, and it could cause logs parsing problems for users.

wbpcode · 2025-01-09T04:34:16Z

> @wbpcode Would you mind creating a bug?
>
> The next Envoy release will have this formatter enabled by default, and it could cause logs parsing problems for users.

Of course not. I will create one when I can access my computer.

But would you mind giving more context about the parsing problem? The reason the bug has no priority is that I think it's more like a minor behavior change and won't actually affect users.

wbpcode assigned jmarantz Aug 1, 2024

repokitteh-read-only bot added waiting and removed waiting labels Aug 1, 2024

jmarantz reviewed Aug 2, 2024

repokitteh-read-only bot added the waiting label Aug 2, 2024

jmarantz reviewed Aug 2, 2024

source/common/formatter/substitution_formatter.cc (outdated, resolved)

source/common/formatter/substitution_formatter.cc (outdated, resolved)

jmarantz reviewed Aug 5, 2024

repokitteh-read-only bot removed the waiting label Aug 5, 2024

repokitteh-read-only bot added waiting and removed waiting labels Aug 5, 2024

wbpcode linked an issue Aug 7, 2024 that may be closed by this pull request

improve performance of json formatter #35501

Closed

wbpcode mentioned this pull request Aug 7, 2024

make the Json::sanitize method exception free #35598

Closed

jmarantz reviewed Aug 8, 2024

changelogs/current.yaml (resolved)

source/common/json/json_serializer.h (outdated, resolved)

repokitteh-read-only bot added waiting and removed waiting labels Aug 8, 2024

jmarantz reviewed Aug 12, 2024

repokitteh-read-only bot added the waiting label Aug 14, 2024

jmarantz reviewed Sep 22, 2024

zuercher reviewed Sep 23, 2024

address comments

c0840e1

Signed-off-by: wangbaiping <[email protected]>

jmarantz mentioned this pull request Sep 25, 2024

json log with sort_properties prints decimals for all numbers #34420

Closed

add comment to serializing integer

3093c7a

Signed-off-by: wangbaiping <[email protected]>

jmarantz reviewed Sep 25, 2024

wangbaiping added 3 commits September 25, 2024 23:45

Merge branch 'main' of https://github.com/envoyproxy/envoy into dev-o…

05cc47f

…pt-json-formatter

address comments

0f37963

Signed-off-by: wangbaiping <[email protected]>

Merge branch 'main' of https://github.com/envoyproxy/envoy into dev-o…

b824fe1

…pt-json-formatter

zuercher approved these changes Sep 26, 2024

wbpcode merged commit b7cad39 into envoyproxy:main Sep 26, 2024
21 checks passed

wbpcode deleted the dev-opt-json-formatter branch September 26, 2024 23:43

wbpcode mentioned this pull request Jan 9, 2025

log: omit_empty_values not works correctly for new json formatter #37941

Open
