Remaining ARM Intrinsics #37014
Tagging subscribers to this area: @tannergooding |
CC. @CarolEidt, @echesakovMSFT |
ExtractNarrowingSaturateUnsignedLower and ExtractNarrowingSaturateUnsignedUpper are needed for the "intrinsification" work that @carlossanlop is doing, so I will implement this next. |
ExtractNarrowingSaturateUnsignedLower and ExtractNarrowingSaturateUnsignedUpper should correspond to sqxtun and sqxtun2 |
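To make the sqxtun / sqxtun2 mapping concrete, here is a minimal scalar sketch of the per-lane behaviour for the short-to-byte case. It is only a reference model, not the JIT implementation: `NarrowingModel` and its methods are hypothetical names, arrays stand in for the vector types, and `lower` is assumed to hold the already-narrowed bytes that sqxtun2 leaves untouched in the low half of the destination.

```csharp
using System;

static class NarrowingModel
{
    // One sqxtun lane: a signed 16-bit input is clamped to the unsigned range [0, 255].
    public static byte SaturateToByte(short x) => (byte)Math.Clamp((int)x, 0, 255);

    // Models the "Lower" form: every input lane is narrowed with saturation.
    public static byte[] UnsignedLower(short[] value)
    {
        var result = new byte[value.Length];
        for (int i = 0; i < value.Length; i++)
            result[i] = SaturateToByte(value[i]);
        return result;
    }

    // Models the "Upper" form: `lower` is kept as-is and the narrowed `value`
    // fills the upper half of the result.
    public static byte[] UnsignedUpper(byte[] lower, short[] value)
    {
        var result = new byte[lower.Length + value.Length];
        lower.CopyTo(result, 0);
        UnsignedLower(value).CopyTo(result, lower.Length);
        return result;
    }
}
```

Negative inputs saturate to 0 and anything above 255 saturates to 255, which is what distinguishes this from the signed-to-signed and unsigned-to-unsigned narrowing overloads.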
label:blocking
namespace System.Runtime.Intrinsics.Arm
{
public static class AdvSimd
{
public static unsafe (Vector64<byte> Value1, Vector64<byte> Value2) LoadPairVector64(byte* address);
public static unsafe (Vector64<sbyte> Value1, Vector64<sbyte> Value2) LoadPairVector64(sbyte* address);
public static unsafe (Vector64<short> Value1, Vector64<short> Value2) LoadPairVector64(short* address);
public static unsafe (Vector64<ushort> Value1, Vector64<ushort> Value2) LoadPairVector64(ushort* address);
public static unsafe (Vector64<int> Value1, Vector64<int> Value2) LoadPairVector64(int* address);
public static unsafe (Vector64<uint> Value1, Vector64<uint> Value2) LoadPairVector64(uint* address);
public static unsafe (Vector64<float> Value1, Vector64<float> Value2) LoadPairVector64(float* address);
public static unsafe (Vector128<byte> Value1, Vector128<byte> Value2) LoadPairVector128(byte* address);
public static unsafe (Vector128<sbyte> Value1, Vector128<sbyte> Value2) LoadPairVector128(sbyte* address);
public static unsafe (Vector128<short> Value1, Vector128<short> Value2) LoadPairVector128(short* address);
public static unsafe (Vector128<ushort> Value1, Vector128<ushort> Value2) LoadPairVector128(ushort* address);
public static unsafe (Vector128<int> Value1, Vector128<int> Value2) LoadPairVector128(int* address);
public static unsafe (Vector128<uint> Value1, Vector128<uint> Value2) LoadPairVector128(uint* address);
public static unsafe (Vector128<long> Value1, Vector128<long> Value2) LoadPairVector128(long* address);
public static unsafe (Vector128<ulong> Value1, Vector128<ulong> Value2) LoadPairVector128(ulong* address);
public static unsafe (Vector128<float> Value1, Vector128<float> Value2) LoadPairVector128(float* address);
public static unsafe (Vector64<int> Value1, Vector64<int> Value2) LoadPairScalarVector64(int* address);
public static unsafe (Vector64<uint> Value1, Vector64<uint> Value2) LoadPairScalarVector64(uint* address);
public static unsafe (Vector64<long> Value1, Vector64<long> Value2) LoadPairScalarVector64(long* address);
public static unsafe (Vector64<ulong> Value1, Vector64<ulong> Value2) LoadPairScalarVector64(ulong* address);
public static unsafe (Vector64<float> Value1, Vector64<float> Value2) LoadPairScalarVector64(float* address);
public static unsafe (Vector64<double> Value1, Vector64<double> Value2) LoadPairScalarVector64(double* address);
public static unsafe (Vector64<byte> Value1, Vector64<byte> Value2) LoadPairVector64NonTemporal(byte* address);
public static unsafe (Vector64<sbyte> Value1, Vector64<sbyte> Value2) LoadPairVector64NonTemporal(sbyte* address);
public static unsafe (Vector64<short> Value1, Vector64<short> Value2) LoadPairVector64NonTemporal(short* address);
public static unsafe (Vector64<ushort> Value1, Vector64<ushort> Value2) LoadPairVector64NonTemporal(ushort* address);
public static unsafe (Vector64<int> Value1, Vector64<int> Value2) LoadPairVector64NonTemporal(int* address);
public static unsafe (Vector64<uint> Value1, Vector64<uint> Value2) LoadPairVector64NonTemporal(uint* address);
public static unsafe (Vector64<float> Value1, Vector64<float> Value2) LoadPairVector64NonTemporal(float* address);
public static unsafe (Vector128<byte> Value1, Vector128<byte> Value2) LoadPairVector128NonTemporal(byte* address);
public static unsafe (Vector128<sbyte> Value1, Vector128<sbyte> Value2) LoadPairVector128NonTemporal(sbyte* address);
public static unsafe (Vector128<short> Value1, Vector128<short> Value2) LoadPairVector128NonTemporal(short* address);
public static unsafe (Vector128<ushort> Value1, Vector128<ushort> Value2) LoadPairVector128NonTemporal(ushort* address);
public static unsafe (Vector128<int> Value1, Vector128<int> Value2) LoadPairVector128NonTemporal(int* address);
public static unsafe (Vector128<uint> Value1, Vector128<uint> Value2) LoadPairVector128NonTemporal(uint* address);
public static unsafe (Vector128<long> Value1, Vector128<long> Value2) LoadPairVector128NonTemporal(long* address);
public static unsafe (Vector128<ulong> Value1, Vector128<ulong> Value2) LoadPairVector128NonTemporal(ulong* address);
public static unsafe (Vector128<float> Value1, Vector128<float> Value2) LoadPairVector128NonTemporal(float* address);
public static unsafe (Vector64<int> Value1, Vector64<int> Value2) LoadPairScalarVector64NonTemporal(int* address);
public static unsafe (Vector64<uint> Value1, Vector64<uint> Value2) LoadPairScalarVector64NonTemporal(uint* address);
public static unsafe (Vector64<long> Value1, Vector64<long> Value2) LoadPairScalarVector64NonTemporal(long* address);
public static unsafe (Vector64<ulong> Value1, Vector64<ulong> Value2) LoadPairScalarVector64NonTemporal(ulong* address);
public static unsafe (Vector64<float> Value1, Vector64<float> Value2) LoadPairScalarVector64NonTemporal(float* address);
public static unsafe (Vector64<double> Value1, Vector64<double> Value2) LoadPairScalarVector64NonTemporal(double* address);
public static Vector64<sbyte> ExtractNarrowingSaturateLower(Vector128<short> value);
public static Vector64<short> ExtractNarrowingSaturateLower(Vector128<int> value);
public static Vector64<int> ExtractNarrowingSaturateLower(Vector128<long> value);
public static Vector128<sbyte> ExtractNarrowingSaturateUpper(Vector64<sbyte> lower, Vector128<short> value);
public static Vector128<short> ExtractNarrowingSaturateUpper(Vector64<short> lower, Vector128<int> value);
public static Vector128<int> ExtractNarrowingSaturateUpper(Vector64<int> lower, Vector128<long> value);
public static Vector64<byte> ExtractNarrowingSaturateLower(Vector128<ushort> value);
public static Vector64<ushort> ExtractNarrowingSaturateLower(Vector128<uint> value);
public static Vector64<uint> ExtractNarrowingSaturateLower(Vector128<ulong> value);
public static Vector128<byte> ExtractNarrowingSaturateUpper(Vector64<byte> lower, Vector128<ushort> value);
public static Vector128<ushort> ExtractNarrowingSaturateUpper(Vector64<ushort> lower, Vector128<uint> value);
public static Vector128<uint> ExtractNarrowingSaturateUpper(Vector64<uint> lower, Vector128<ulong> value);
public static Vector64<byte> ExtractNarrowingSaturateUnsignedLower(Vector128<short> value);
public static Vector64<ushort> ExtractNarrowingSaturateUnsignedLower(Vector128<int> value);
public static Vector64<uint> ExtractNarrowingSaturateUnsignedLower(Vector128<long> value);
public static Vector128<byte> ExtractNarrowingSaturateUnsignedUpper(Vector64<byte> lower, Vector128<short> value);
public static Vector128<ushort> ExtractNarrowingSaturateUnsignedUpper(Vector64<ushort> lower, Vector128<int> value);
public static Vector128<uint> ExtractNarrowingSaturateUnsignedUpper(Vector64<uint> lower, Vector128<long> value);
public static Vector64<ushort> ReverseElement8(Vector64<ushort> value);
public static Vector64<short> ReverseElement8(Vector64<short> value);
public static Vector128<ushort> ReverseElement8(Vector128<ushort> value);
public static Vector128<short> ReverseElement8(Vector128<short> value);
public static Vector64<uint> ReverseElement8(Vector64<uint> value);
public static Vector64<int> ReverseElement8(Vector64<int> value);
public static Vector64<float> ReverseElement8(Vector64<float> value);
public static Vector128<uint> ReverseElement8(Vector128<uint> value);
public static Vector128<int> ReverseElement8(Vector128<int> value);
public static Vector64<ulong> ReverseElement8(Vector64<ulong> value);
public static Vector64<long> ReverseElement8(Vector64<long> value);
public static Vector128<ulong> ReverseElement8(Vector128<ulong> value);
public static Vector128<long> ReverseElement8(Vector128<long> value);
public static Vector64<uint> ReverseElement16(Vector64<uint> value);
public static Vector64<int> ReverseElement16(Vector64<int> value);
public static Vector64<float> ReverseElement16(Vector64<float> value);
public static Vector128<uint> ReverseElement16(Vector128<uint> value);
public static Vector128<int> ReverseElement16(Vector128<int> value);
public static Vector64<ulong> ReverseElement16(Vector64<ulong> value);
public static Vector64<long> ReverseElement16(Vector64<long> value);
public static Vector128<ulong> ReverseElement16(Vector128<ulong> value);
public static Vector128<long> ReverseElement16(Vector128<long> value);
public static Vector64<ulong> ReverseElement32(Vector64<ulong> value);
public static Vector64<long> ReverseElement32(Vector64<long> value);
public static Vector128<ulong> ReverseElement32(Vector128<ulong> value);
public static Vector128<long> ReverseElement32(Vector128<long> value);
}
} |
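For readers unfamiliar with the ReverseElement8/16/32 naming, the following scalar sketch shows what happens inside a single lane, assuming the number in the method name is the width in bits of the sub-elements whose order is reversed within each wider element. `ReverseElementModel` is a hypothetical helper for illustration only; it works on individual lane values rather than on `Vector64<T>` / `Vector128<T>`.

```csharp
using System;
using System.Buffers.Binary;

static class ReverseElementModel
{
    // ReverseElement8 on a 32-bit lane: reverse the order of its four bytes.
    public static uint ReverseElement8(uint lane) => BinaryPrimitives.ReverseEndianness(lane);

    // ReverseElement16 on a 32-bit lane: swap its two 16-bit halves.
    public static uint ReverseElement16(uint lane) => (lane << 16) | (lane >> 16);

    // ReverseElement32 on a 64-bit lane: swap its two 32-bit halves.
    public static ulong ReverseElement32(ulong lane) => (lane << 32) | (lane >> 32);
}
```

For example, under these assumptions `ReverseElement8(0x11223344)` gives `0x44332211`, while `ReverseElement16(0x11223344)` gives `0x33441122`. The vector forms apply the same operation to every lane independently.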
Please correct me if I am wrong, but for the first 50 or so methods there is little chance of them failing unless the address is invalid (they take a pointer), thus I would highly suggest perhaps a |
Such APIs are out of scope for .NET 5 and would require a separate API proposal. However, none of the other intrinsics have such overloads, and they would likely be neither as performant nor as clear in their semantics, so I would not be in favor of taking them through API review. |
During the last JIT team meeting, concerns were raised that LoadPairVector64 and LoadPairVector128 are the only intrinsics returning a tuple, and we don't know yet whether they could expose previously unseen issues. I am going to open a separate issue to track the work of implementing LoadPairVector64 and LoadPairVector128. Depending on the extent of the changes their implementation requires, we might consider moving these intrinsics to 6.0. That work could then be consolidated with the implementation of the intrinsics for LD1-LD4 and ST1-ST4 operating on multiple registers, and thoroughly tested thereafter. cc @dotnet/jit-contrib |
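To illustrate the tuple-returning shape being discussed, here is a rough scalar sketch of what a pair load does: it reads two consecutive vectors' worth of elements from one address and returns them as two values. `LoadPairModel.LoadPair` is a hypothetical stand-in that uses spans and arrays instead of pointers and `Vector64<T>`; it says nothing about how the JIT would actually handle the multi-register return.

```csharp
using System;

static class LoadPairModel
{
    // Hypothetical model of a pair load: `lanesPerVector` elements go to Value1,
    // the next `lanesPerVector` elements go to Value2.
    public static (int[] Value1, int[] Value2) LoadPair(ReadOnlySpan<int> source, int lanesPerVector)
    {
        if (source.Length < 2 * lanesPerVector)
            throw new ArgumentException("source is too short for a pair load", nameof(source));

        int[] first = source.Slice(0, lanesPerVector).ToArray();
        int[] second = source.Slice(lanesPerVector, lanesPerVector).ToArray();
        return (first, second);
    }
}
```

A caller would deconstruct the result in the usual way, for example `var (a, b) = LoadPairModel.LoadPair(new[] { 1, 2, 3, 4 }, 2);`, which mirrors the `(Value1, Value2)` tuple in the proposed signatures.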
Opened #39243 for LoadPairVector64 and LoadPairVector128 |