-
Notifications
You must be signed in to change notification settings - Fork 4.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
make the Json::sanitize method exception free #35598
Comments
cc @jmarantz Considering the fact that the Json::sanitize is only used by the |
When I wrote the sanitizer I originally did not use (or even know about) nlohmann, and I had a lot of manual code to do the same thing, which was 2x faster, and exception free. See #20428. @htuch argued not to add that code to the codebase given the existing dependence on nlohmann, and I agreed that the speedup was not worth the code complexity. And the usage of the sanitizer at the time was all in the main thread, in the admin stats endpoint, where we did allow exceptions (compiled out for mobile). I think if we want to be exception free I think we should revive #20428. |
Why can't we just do what @wbpcode suggests and use an error handler? I'm not sure I see the need to import a 1k+ code doing non-trivial sanitization if we don't have to. |
Maybe -- @wbpcode what did you mean by "this will break existing behavior". I think I keyed on that warning. |
Also is it enough to avoid nlohmann throwing exceptions at runtime? Or do we need to compile all code linked into E-M with exceptions disabled? |
In current implemention, if the string contains invalid utf-8, then all non-ascii chars will be replaced with \xxx, the xxx is the number value of byte. See
But if we use the way I suggested, only the invalid utf-8 will be replaced with U+FFFD I personally think this is fine and more correct and this method is used in limited scope for now.**** |
Why is u+fffd more correct? I am not sure if it's that important that we escape invalid utf8 the way it was done, but is it enough to avoid nlohmann throwing exceptions at runtime? Or do we need to compile all code linked into E-M with exceptions disabled? |
Sorry, I didn't make it clear. Not the u+fffd is more correct. But the way to handle the other legal utf-8 in a string that contains invalid utf-8 is more correct. For a string that contains invalid utf-8, the Nlohmann json at least will handle other legal utf-8 correctly. |
From source code and doc, yeah. Fuzz test could be used to ensure it.
Yeah. I think alyssawilk has done lots (super lots) of work to ensure this point. @alyssawilk |
+1 to the suggested error handler. I think this is a non-problem for E-M #35607 but represents a potential config validation issue for envoy core (so is worth fixing) |
Ensure the runtime exception issue again: dump():
By setting the And I also do a quick check to the source code. constructor: Depend on the Specific to constructing json object from string view, only the allocator may throw exception which we basically will ignore. There is the core code snippet:
|
I have split the escaping check to a separated function and them add a new utility method
The utility method will only be used by the new JsonFormatter which requires that the users to enable it explicitly by setting runtime flag. See the latest commit in the #35545. Will this way may help to eliminate some worries about the runtime exception? cc @jmarantz We can make this method to be the only one implementation after it is battle tested. For example, after the JsonFormatter completely replaced the legacy JsonFormatter. |
I guess I'm still a little opposed to bringing in the extra complexity for marginal benefit. Do we have a performance problem today? The 20x is a microbenchmark which will not show up in practice (unless I'm missing some context). Meanwhile, we permanently live with our new more complicated JSON formatter and sanitizer. Why not just replace the invalid characters and move on? This feels like a non-problem. |
I don't know if I have gotten a specific answer to this question, which I think is for @alyssawilk . I think @wbpcode is proposing to add the json sanitizer as a dependency in e-m which I think it was not before. json sanitizer currently links in nlohmann which includes 'throw' in its compiled code. I understand the strategy is that we would prevent nlohmann from calling throw, dynamically. However I thought it might still be a problem to link in a dependency that compiles in a reference to |
@alyssawilk you said "potential config validation issue for envoy core (so is worth fixing)". I did not undestand that phrase. Can you explain? |
Sorry, I think maybe I don't make it clear enough. The core of #35545 is a new JsonFormatter implemention for access logging, with completely new design. It's 20x faster (and I think the code is also much simpler) than our legacy one and provide same performance with our text formatter. The poor performance of legacy JsonFormatter actually effects our users (see #35501, an user report similar problem and profiling result in that issue). An exception free sanitize/escape is necessary for #35545 (our new JsonFormatter) because the new JsonFormatter will be used at core data paths and will be linked to e-m. My solution now is creat a new exception-free sanitize/escape method by replacing the invalid utf-8 (as my inital suggestion in this issue) for our new JsonFormatter usage. The new method only bring 4 additional lines code and is much simpler than previous method because we needn't to handle exception. I don't think it will bring extract complexity. So, I think you may misunderstood something (sorry for my previous unclear description.) Cc @htuch |
the context here is that @wbpcode is adding a new dependence on JsonSanitizer that requires it to be linked in and functional in envoy-mobile. Hence he is trying to remove the exception flow. The benefit of this is in #35501 which is looking for a 20x improvement in json serialization during access loggers. I'm very much aligned with this -- I was able to get a huge improvement in json stats output before because the default model of populating a protobuf and then serializing that is a bag full of slow. We are iterating on some of the details of course. I'm assuming this is necessary because json access loggers are needed in envoy-mobile; if they are not then maybe this change can be skipped. |
ah. so I think adding more exceptions to envoy is a poor choice given we're trying to move away from them due to the number of times we've found configs and queries of death where they were used improperly. |
OK, @wbpcode I think I see the reasonableness of the optimization in this use case, thanks for sharing. My vote is for whatever keeps exceptions from occurring (less opinionated about compile time, but sure) and whatever is simplest in implementation that achieves the end goal. The one concern I have in all this code is it's incredibly fragile and easy to get wrong (enforcing sanitization, parsing, ensuring schema or valid JSON syntax on the way out), mistakes are security bugs in many cases, even for access logs. |
This issue has been automatically marked as stale because it has not had activity in the last 30 days. It will be closed in the next 7 days unless it is tagged "help wanted" or "no stalebot" or other activity occurs. Thank you for your contributions. |
This issue has been automatically closed because it has not had activity in the last 37 days. If this issue is still valid, please ping a maintainer and ask them to label it as "help wanted" or "no stalebot". Thank you for your contributions. |
Actually this was fixed in #36000 |
This is necessary for core data path usage and envoy-mobile.
The
Json::sanitize
may call theNlohmann::Factory::serialize
method to sanitize string that contains special character. And theNlohmann::Factory::serialize
will construct a json object and call its dump method.Basically, the constructor won't throw because it won't validate the input string. But the dump() will throw by default if the input string contains invalid UTF8 characters.
This exception could be suppressed by setting the
error_handler
to thereplace
. Then the invalid UTF-8 sequences will be replaced by U+FFFD.This will break exist behavior, but is a quick way to remove exception from the
Json::sanitize
.The text was updated successfully, but these errors were encountered: