-
Notifications
You must be signed in to change notification settings - Fork 4.9k
Remove temporary ByteArrayHelpers and use Span.Equals #27654
Conversation
{ | ||
// TODO #21395: | ||
// Replace with the MemoryExtensions implementation of Equals once it's available | ||
internal static bool EqualsOrdinalAsciiIgnoreCase(string left, ReadOnlySpan<byte> right) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Any perf concerns with using the ReadOnlySpan<char>
overload available now that will have the "IsAscii" check per iteration which is unnecessary in this case - https://github.com/dotnet/coreclr/blob/47c39edc2fbfbfbddca13ee1a47699ecaaaee204/src/mscorlib/shared/System/Globalization/CompareInfo.cs#L571?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't see how the changes are valid. Casting a span of ASCII bytes to a span of chars will misinterpret them. Does this pass our tests?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yep, you are definitely right. Here, two bytes would be considered one character rather than individual characters. Tests don't pass. We have to pad the byte span to turn it into a char span.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You'd either need to Widen the byte
s to short
/char
and compare
https://github.com/aspnet/KestrelHttpServer/blob/300453396a57cd14bcb297627b1507de83dc88ab/src/Kestrel.Core/Internal/Infrastructure/StringUtilities.cs#L12-L109
Or Narrow the string
char
s to byte
s; which is probably dodgier (as would be discarding high byte)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can vectorize the casing with something like (not a very good version)
public static void LowerCaseSIMD(byte[] data)
{
var A = new Vector<byte>(65); // A
var Z = new Vector<byte>(90); // Z
for (var o = 0; o < data.Length - Vector<byte>.Count; o += Vector<byte>.Count)
{
var v = new Vector<byte>(data, o);
v = Vector.ConditionalSelect(
Vector.BitwiseAnd(
Vector.GreaterThanOrEqual(v, A),
Vector.LessThanOrEqual(v, Z)
),
Vector.BitwiseOr(new Vector<byte>(0x20), v), // 0010 0000
v
);
// Now Vector.Widen
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was trying to see if there was some char to byte conversion happening up-stack, but the data is coming in as bytes... http://source.dot.net/#System.Net.Http/System/Net/Http/Managed/HttpConnection.cs,354
If we "Widen", won't we need to allocate a 2x temporary buffer?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we "Widen", won't we need to allocate a 2x temporary buffer?
For the the vector part its two local Vector<short>
variables; for shorter a long
, int
or char
; no buffers needed
As an aside, @stephentoub, this leftover TODO can be cleaned up now, correct? This issue - dotnet/roslyn#17287 - got resolved. |
The better approach here (at least for TryGetKnownHeader) is to precalculate the header names as We still need to do a case-insensitive compare here, so we can't just use SequenceEqual. But perhaps we could do some clever vectorization as per @benaadams suggestion above. And we can guarantee the precalculated header name bytes are all lowercase already. Or we could do something slightly different: Header names usually use consistent casing (i.e. Content-Length, not CONTENT-LENGTH) so we could do a SequenceEqual on the expected casing first, which will typically succeed, and fall back to a more expensive case-insensitive compare if that fails. |
Store a |
Yeah, exactly. GitHub ate the |
We do; that's what @geoffkizer meant when he said "We actually sorta have this precalculation today, we're just not using it". If we have a case-insensitive comparison that works on bytes, then we can use that; otherwise let's just stick with what we have. |
No. It just needs a different issue number now: |
Actually, it was already updated; the code you linked to is stale. |
@ahsonkhan, at this point is there more to do here other than just delete the TODO comment? |
Yes, if we decide to keep the internal ByteArrayHelpers method. In which case, I think we can close this PR. |
If we find a good way to avoid it in the future, I'd be happy to see it go away. In the meantime, though, I don't think we have anything better / shared we can use instead. |
Thanks, though! |
The issue - https://github.com/dotnet/corefx/issues/21395 - has been resolved.
cc @stephentoub, @geoffkizer