Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve Ascii (and Utf8) encoding #85266

Merged
merged 9 commits into from
May 12, 2023
Original file line number Diff line number Diff line change
Expand Up @@ -1394,6 +1394,18 @@ private static bool VectorContainsNonAsciiChar(Vector128<byte> asciiVector)
}
}

/// <summary>
/// Stores to lower 64bits of <paramref name="byteVector"/> to memory destination of <paramref name="bytePtr"/>[<paramref name="elementOffset"/>]
/// </summary>
/// <remarks>
/// Uses double instead of long to get a single instruction instead of storing temps on general porpose register (or stack)
/// </remarks>
[MethodImpl(MethodImplOptions.AggressiveInlining)]
private static unsafe void StoreLower(Vector128<byte> byteVector, ref byte bytePtr, nuint elementOffset)
{
Unsafe.WriteUnaligned<double>(ref Unsafe.Add(ref bytePtr, elementOffset), byteVector.AsDouble().ToScalar());
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, but I don't think that comment is needed because it is also expected to be a single instruction on arm. Also I'm not sure double is any better than long here, jit is expected to do the right thing anyway.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can believe that the double hack helps on 32-bit x86. We do not pay as much attention to the codegen quality on 32-bit x86 and there are definitely issues. I do not think it is worth it to be adding workarounds like this for x86 quality issues to CoreLib. If issues like this one are important to fix, it would be better to fix it in the JIT.

Copy link
Member

@EgorBo EgorBo May 7, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, I missed the fact that the same code with long generates bad codegen on x86, I hoped JIT would emit the same as for double 🙁 @jkotas do you mean the whole helper call is not needed? (it is needed because the original code was using Vector64 that we don't recommend using on x86/64) or you're fine with changing this to long and remove notes? (I agree that we'd better fix this in JIT for 32bit we likely already a few similar patterns in BCL)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

image

(32bit)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jkotas do you mean the whole helper call is not needed?

I assume that the helper call is needed to avoid Vector64 that does not produce good code on x64. Without the helper call., the alternative sequence that avoids Vector64 would have to be manually inlined in every place.

}

Daniel-Svensson marked this conversation as resolved.
Show resolved Hide resolved
[MethodImpl(MethodImplOptions.AggressiveInlining)]
private static bool VectorContainsNonAsciiChar(Vector128<ushort> utf16Vector)
{
Expand Down Expand Up @@ -1452,6 +1464,7 @@ private static Vector128<byte> ExtractAsciiVector(Vector128<ushort> vectorFirst,
}
}

[MethodImpl(MethodImplOptions.AggressiveInlining)]
private static unsafe nuint NarrowUtf16ToAscii_Intrinsified(char* pUtf16Buffer, byte* pAsciiBuffer, nuint elementCount)
{
// This method contains logic optimized using vector instructions for both x64 and Arm64.
Expand Down Expand Up @@ -1484,7 +1497,7 @@ private static unsafe nuint NarrowUtf16ToAscii_Intrinsified(char* pUtf16Buffer,

ref byte asciiBuffer = ref *pAsciiBuffer;
Vector128<byte> asciiVector = ExtractAsciiVector(utf16VectorFirst, utf16VectorFirst);
asciiVector.GetLower().StoreUnsafe(ref asciiBuffer);
StoreLower(asciiVector, ref asciiBuffer, 0);
nuint currentOffsetInElements = SizeOfVector128 / 2; // we processed 8 elements so far

// We're going to get the best performance when we have aligned writes, so we'll take the
Expand All @@ -1511,7 +1524,7 @@ private static unsafe nuint NarrowUtf16ToAscii_Intrinsified(char* pUtf16Buffer,

// Turn the 8 ASCII chars we just read into 8 ASCII bytes, then copy it to the destination.
asciiVector = ExtractAsciiVector(utf16VectorFirst, utf16VectorFirst);
asciiVector.GetLower().StoreUnsafe(ref asciiBuffer, currentOffsetInElements);
StoreLower(asciiVector, ref asciiBuffer, currentOffsetInElements);
}

// Calculate how many elements we wrote in order to get pAsciiBuffer to its next alignment
Expand Down Expand Up @@ -1564,7 +1577,7 @@ private static unsafe nuint NarrowUtf16ToAscii_Intrinsified(char* pUtf16Buffer,

Debug.Assert(((nuint)pAsciiBuffer + currentOffsetInElements) % sizeof(ulong) == 0, "Destination should be ulong-aligned.");
asciiVector = ExtractAsciiVector(utf16VectorFirst, utf16VectorFirst);
asciiVector.GetLower().StoreUnsafe(ref asciiBuffer, currentOffsetInElements);
StoreLower(asciiVector, ref asciiBuffer, currentOffsetInElements);
currentOffsetInElements += SizeOfVector128 / 2;

goto Finish;
Expand Down