Avoid MemoryMarshal.Cast when transcoding from UTF-16 to UTF-8 while escaping in Utf8JsonWriter. #40996
```csharp
fixed (char* ptr = value)
{
    idx = encoder.FindFirstCharacterToEncode(ptr, value.Length);
}
goto Return;
```
For v5 we may want to add an overload of FindFirstCharacterToEncode(ReadOnlySpan<char>) to System.Text.Encodings.Web so consumers don't have to write unsafe pinning code.
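A minimal sketch of such an overload, written as an extension method so it can live outside the library. `TextEncoderSpanExtensions` and `FindFirstCharacterToEncodeSpan` are hypothetical names, not part of System.Text.Encodings.Web. The helper keeps the pinning in one place and returns -1 (the pointer overload's "nothing to encode" result) for empty input without calling the encoder. Compiling it requires unsafe code to be enabled.

```csharp
using System;
using System.Text.Encodings.Web;

// Hypothetical helper (not part of System.Text.Encodings.Web): a span-based
// wrapper around the pointer-based TextEncoder.FindFirstCharacterToEncode.
// Requires unsafe code to be enabled (<AllowUnsafeBlocks>true</AllowUnsafeBlocks>).
public static class TextEncoderSpanExtensions
{
    // Returns the index of the first char that needs encoding, or -1 if none does.
    public static unsafe int FindFirstCharacterToEncodeSpan(this TextEncoder encoder, ReadOnlySpan<char> text)
    {
        if (encoder is null)
        {
            throw new ArgumentNullException(nameof(encoder));
        }

        // Pinning an empty span yields a null pointer, so never pass it to the encoder.
        if (text.IsEmpty)
        {
            return -1;
        }

        fixed (char* ptr = text)
        {
            return encoder.FindFirstCharacterToEncode(ptr, text.Length);
        }
    }
}
```

For example, `JavaScriptEncoder.Default.FindFirstCharacterToEncodeSpan("abc<")` returns 3, since the default encoder escapes `<`.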
Need to address the null check, otherwise LGTM.
```diff
- idx = encoder.FindFirstCharacterToEncodeUtf8(MemoryMarshal.Cast<char, byte>(value));
+ fixed (char* ptr = value)
+ {
+     idx = encoder.FindFirstCharacterToEncode(ptr, value.Length);
```
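For context on the removed line: MemoryMarshal.Cast<char, byte> reinterprets the UTF-16 code units as raw bytes, two per char; it does not transcode to UTF-8. The illustrative helpers below (hypothetical names, not code from the PR) compare the two for an ASCII and a non-ASCII character.

```csharp
using System;
using System.Runtime.InteropServices;
using System.Text;

// Hypothetical helpers for illustration only.
public static class CastVersusTranscode
{
    // Reinterprets the UTF-16 code units as bytes (2 bytes per char, no transcoding).
    public static byte[] ReinterpretAsBytes(string s) =>
        MemoryMarshal.Cast<char, byte>(s.AsSpan()).ToArray();

    // Actually transcodes the UTF-16 string to UTF-8.
    public static byte[] TranscodeToUtf8(string s) => Encoding.UTF8.GetBytes(s);
}
```

On a little-endian machine, `ReinterpretAsBytes("A")` gives `41 00` while `TranscodeToUtf8("A")` gives `41`, and for "é" (U+00E9) the results are `E9 00` versus `C3 A9`. Bytes like these are UTF-16 data, not UTF-8, so FindFirstCharacterToEncodeUtf8 misinterprets them (`E9 00`, for instance, is not a valid UTF-8 sequence), and any index it returns counts bytes of the reinterpreted span rather than chars.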
Some implementations of the FindFirstCharacterToEncode method may not accept null pointers. We should special-case value.IsEmpty at the beginning of this method and bail early.
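The null pointer comes from pinning: `fixed` over an empty ReadOnlySpan<char> produces a null pointer because there is no element to pin, whereas pinning an empty string yields a non-null pointer to its terminating '\0'. A small illustration, with hypothetical helper names:

```csharp
using System;

// Hypothetical helpers for illustration only. Requires unsafe code to be enabled.
public static class EmptySpanPinning
{
    // True if pinning the span with `fixed` produces a null pointer.
    public static unsafe bool PinsToNull(ReadOnlySpan<char> value)
    {
        fixed (char* ptr = value)
        {
            return ptr == null;
        }
    }

    // Pinning a non-null string (even "") points at its first char,
    // which for "" is the terminating '\0', so the pointer is non-null.
    public static unsafe bool StringPinsToNull(string value)
    {
        fixed (char* ptr = value)
        {
            return ptr == null;
        }
    }
}
```

This is why the guard checks value.IsEmpty before the `fixed` block rather than relying on every encoder to tolerate a null pointer with a length of 0.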
Will fix this and add a buggy JavaScriptEncoder implementation as a test.
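A sketch of what such a test could look like, under stated assumptions: `NullPointerRejectingEncoder` and `BuggyEncoderCheck` are hypothetical names, this is not the test that was actually committed, and the encoder simply delegates to JavaScriptEncoder.Default except that it throws when handed a null pointer. With the empty-span guard in place, writing an empty string should never reach that throw.

```csharp
using System;
using System.IO;
using System.Text;
using System.Text.Encodings.Web;
using System.Text.Json;

// Hypothetical "buggy" encoder for tests: behaves like JavaScriptEncoder.Default
// but throws if FindFirstCharacterToEncode receives a null pointer.
public sealed class NullPointerRejectingEncoder : JavaScriptEncoder
{
    private readonly JavaScriptEncoder _inner = JavaScriptEncoder.Default;

    public override int MaxOutputCharactersPerInputCharacter => _inner.MaxOutputCharactersPerInputCharacter;

    public override unsafe int FindFirstCharacterToEncode(char* text, int textLength)
    {
        if (text == null)
        {
            throw new ArgumentNullException(nameof(text));
        }
        return _inner.FindFirstCharacterToEncode(text, textLength);
    }

    public override unsafe bool TryEncodeUnicodeScalar(int unicodeScalar, char* buffer, int bufferLength, out int numberOfCharactersWritten) =>
        _inner.TryEncodeUnicodeScalar(unicodeScalar, buffer, bufferLength, out numberOfCharactersWritten);

    public override bool WillEncode(int unicodeScalar) => _inner.WillEncode(unicodeScalar);
}

public static class BuggyEncoderCheck
{
    // Writes `value` as a JSON string with the given encoder and returns the UTF-8 output as text.
    public static string WriteString(string value, JavaScriptEncoder encoder)
    {
        using (var stream = new MemoryStream())
        {
            using (var writer = new Utf8JsonWriter(stream, new JsonWriterOptions { Encoder = encoder }))
            {
                writer.WriteStringValue(value);
            }
            return Encoding.UTF8.GetString(stream.ToArray());
        }
    }
}
```

Calling `BuggyEncoderCheck.WriteString("", new NullPointerRejectingEncoder())` should then return `""` (two quote characters).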
```csharp
using (var writer = new Utf8JsonWriter(output))
{
    writer.WriteStringValue("\u6D4B\u8A6611");
```
U+6D4B and U+8A66 might be allowed through unescaped in a future version, which could break this unit test. If you're looking for something stable that's highly unlikely to ever be allowed through unescaped, consider something from the range U+E000..U+F8FF (inclusive). That block is permanently reserved for private use, and I highly doubt even the "relaxed" escaper will ever allow those to pass through unescaped.
If the test doesn't rely on this being output escaped or unescaped, you're good to go. :)
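Following that suggestion, a test that needs a character the default writer keeps escaping could use a private-use code point such as U+E000. A minimal sketch (hypothetical helper name, not a test from the PR), assuming the writer's default options, under which non-ASCII characters are written as `\uXXXX` escapes with uppercase hex digits:

```csharp
using System.IO;
using System.Text;
using System.Text.Json;

// Hypothetical helper for illustration only.
public static class PrivateUseEscaping
{
    // Writes a single JSON string value with default writer options and returns the output.
    public static string WriteDefault(string value)
    {
        using (var stream = new MemoryStream())
        {
            using (var writer = new Utf8JsonWriter(stream))
            {
                writer.WriteStringValue(value);
            }
            return Encoding.UTF8.GetString(stream.ToArray());
        }
    }
}
```

Here `PrivateUseEscaping.WriteDefault("\uE000")` returns the six-character escape wrapped in quotes, i.e. `"\uE000"`.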
> might be allowed through unescaped in a future version

We have a few writer tests that ensure the default behavior escapes certain characters. If we change that behavior, we would have to (and should) change those tests as well, so I would prefer that the tests break at that time.
MacOS Build x64_Debug test failures are unrelated (same as reported in a comment on #40997): System.Security.Cryptography.OpenSsl.Tests on netcoreapp-OSX-Debug-x64-OSX.1014.Amd64.Open (Total: 649, Errors: 0, Failed: 565, Skipped: 14, Time: 1.072s).
…escaping in Utf8JsonWriter. (dotnet/corefx#40996)

* Avoid MemoryMarshal.Cast when transcoding from UTF-16 to UTF-8 while escaping in Utf8JsonWriter.
* Fix white space typo in the test expected string.
* Guard against empty spans where an implementation of JavascriptEncoder might not handle null ptrs correctly.
* Cleanup tests to avoid some duplication.
* Some more test clean up.

Commit migrated from dotnet/corefx@ee9995f
Fixes https://github.com/dotnet/corefx/issues/40979 in master.
This is meant to be a targeted fix to be ported to 3.0.
cc @steveharter, @GrabYourPitchforks, @pranavkm, @ericstj