-
Notifications
You must be signed in to change notification settings - Fork 4.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[API Proposal]: Ascii.ToUtf16 overload that treats \0
as invalid
#80366
Comments
I couldn't figure out the best area label to add to this issue. If you have write-permissions please help me learn by adding exactly one area label. |
Does the ASP.NET code measurably regress if the newly-added ToUtf16 is used plus a call to IndexOf((byte)'\0) to validate there wasn't a null? private static unsafe void GetHeaderName(ReadOnlySpan<byte> source, Span<char> buffer)
{
OperationStatus status = Ascii.ToUtf16(source, buffer, out _, out _);
if (status != OperationStatus.Done || source.IndexOf((byte)'\0') >= 0)
{
KestrelBadHttpRequestException.Throw(RequestRejectionReason.InvalidCharactersInHeaderName);
}
} @GrabYourPitchforks had some fairly strong opinions about special-casing |
In local micro-benchmarks yes, mainly due to the It would be more interesting to see real-usage benchmarks, like how that would impact Techempower, etc. But unfortunately I don't know how to run such benchmarks (at the moment).
I'm looking forward to read them. |
Tagging subscribers to this area: @dotnet/area-system-buffers Issue DetailsBackground and motivationFor ASP.NET Core's API Proposalnamespace System.Buffers.Text
{
public static class Ascii
{
// existing methods
+ public static OperationStatus ToUtf16(ReadOnlySpan<byte> source, Span<char> destination, out int bytesConsumed, out int charsWritten, bool treatNullAsInvalid = false);
}
} The new ASCII-APIs will get added to .NET 8, so w/o breaking change an optional argument could be added. namespace System.Buffers.Text
{
public static class Ascii
{
// existing methods
- public static OperationStatus ToUtf16(ReadOnlySpan<byte> source, Span<char> destination, out int bytesConsumed, out int charsWritten);
+ public static OperationStatus ToUtf16(ReadOnlySpan<byte> source, Span<char> destination, out int bytesConsumed, out int charsWritten, bool treatNullAsInvalid = false);
}
} API Usage private static unsafe void GetHeaderName(ReadOnlySpan<byte> source, Span<char> buffer)
{
OperationStatus status = Ascii.ToUtf16(source, buffer, out _, out _, treatNullAsInvalid: true);
if (status != OperationStatus.Done)
{
KestrelBadHttpRequestException.Throw(RequestRejectionReason.InvalidCharactersInHeaderName);
}
} Alternative DesignsNo response RisksThe value for Besides treating
|
99% of the time there won't be any nulls but headers average 800 bytes to 2kB with cookies; so scanning the headers an additional time to check for nulls can be significant |
Thanks for these numbers! |
Googled average headers size 😅 As an anecdote going to Google homepage logged in my headers are 2158 bytes and it makes 27 requests to that domain (26 to other domains); so in total 58kB for that page and one domain |
Wouldn't it be better to treat For example, I think that when parsing HTTP 1 headers, you could look for |
Assigning to @GrabYourPitchforks for now until the necessary input can be given. |
I am strongly against this proposal. ASCII is defined as characters in the range Since this is protocol-specific for aspnet, I recommend the code remain in that project. |
@sebastienros Is there a way to measure real-world impact for this API? I've laid out above my arguments against doing this - namely, that protocol-level concerns don't belong in a general-purpose API. But if there is strongly compelling evidence that this is a real perf bottleneck and the runtime layer is the only layer that can provide this functionality properly, that should be weighed in favor of this API, even against my concerns. |
I will need to check what can be impacted in ASP.NET and see what benchmarks would exercise this code path. If someone knows which scenarios are useful here then I can start it. |
Do we know why ASP.NET special-cases \0 here? What happens if we just stop doing that? If ASP.NET needs to do that, is it likely that others will similarly need to special-case certain values? I'd really like to be able to encapsulate this in a core library provided helper, for ASP.NET to use and for others to use. Vectorizing such a thing is very non-trivial. Is there a shape of an API we could come up with that would enable efficiently doing this, e.g. a default overload that is for [0, 127] but another overload that lets you opt-out one or more values or ranges, or some such thing? Note that one of the primary uses of IndexOfAnyValues is for use in protocols, where protocols need to search for or exempt certain things. Could/should we incorporate that somehow? |
Spec wise it shouldn't; however if its a front-end server that passes requests to another server; if and that server uses null terminated strings then the request can change in the second layer accessing url's which weren't mapped to the internet, which could be a security risk (along lines of https://en.wikipedia.org/wiki/HTTP_request_smuggling though different) |
As @svick says, it just needs to check for 3 rather than 2, then throw if its
|
I hadn't seen @svick's comment:
Is that viable? Can all of the places that call into this shared routine be updated trivially to ensure the data passed in doesn't contain a \0? If so, let's do that, add the AScii.ToUtf16 that's for the whole [0, 127] range, update ASP.NET to use that, and call it a good day. |
@BrennanConroy, do you have a suggestion for how we could make forward progress on this? |
Looks like it would be "easy" to do this. If we updated https://github.com/dotnet/aspnetcore/blob/f62f12357c49c4f1cca502e8f4cf57353f0b320f/src/Servers/Kestrel/Core/src/Internal/Http/HttpParser.cs#LL64C37-L64C37 to instead be
This is where we start parsing the HTTP/1 request so we're already going through the entire request line looking for The concerns are:
|
Thanks, I sketched it out in: A bunch of tests failed, and I haven't gone through to see which would be expected (the tests have internals access) and which would be real problems. Any interest in picking it up and seeing how far we can run with it? |
Yeah, I'll pick it up and see what the team thinks |
It looks like #80245 is open which might be indirectly tracking part of the work to improve the performance. |
I'm not aware of any fundamental reason that should be the case. We should fix anything in the core routine that might be contributing those few additional cycles. I'd hope it's not just the difference between returning a bool and returning an OperationStatus. |
Grabbed the assembly of One very obvious difference is that the core processing is not inlined in the But if someone wants to take a look at the assembly in the meantime and see if there is anything obviously worse in the Ascii.ToUtf16
TryGetAsciiString
|
Below are the results I'm getting on my machine. This seems very much within the range of noise. @BrennanConroy Are you seeing different results than below?
|
@BrennanConroy, can you comment on the above? |
Closing given dotnet/aspnetcore#56578 |
Background and motivation
For ASP.NET Core's
StringUtilities
the ASCII values of the range(0x00, 0x80)
are considered valid, whilstAscii.ToUtf16
treats the whole ASCII range[0x00, 0x80)
as valid. In order to baseStringUtilities
on the Ascii-APIs and avoid custom vectorized code in ASP.NET Core internals\0
should be allowed to be treated as invalid. See dotnet/aspnetcore#45962 for further info.API Proposal
namespace System.Buffers.Text { public static class Ascii { // existing methods + public static OperationStatus ToUtf16(ReadOnlySpan<byte> source, Span<char> destination, out int bytesConsumed, out int charsWritten, bool treatNullAsInvalid = false); } }
The new ASCII-APIs will get added to .NET 8, so w/o breaking change an optional argument could be added.
API Usage
Alternative Designs
No response
Risks
The value for
treatNullAsInvalid
will be given as constant, so the JIT should be able to dead-code eliminate any code needed for "default case" (whole ASCII-range incl.\0
), so no perf-regression should be expected.Besides treating
\0
as special value which is optinally treated as invalid I don't expect any other value to be considered special enough for optional exclusion.The text was updated successfully, but these errors were encountered: