-
Notifications
You must be signed in to change notification settings - Fork 4.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
API Proposal : Arm TableVectorLookup and TableVectorExtension intrinsics #1277
Comments
ValueTuple with single element sounds odd, why not use the vector itself? |
No particular reason other than uniformity of syntax and because the instruction itself expects a list there. But I concede that C# doesn't have very good support for single element tuples so I have no objections to changing it. @CarolEidt I have been wondering about the register allocation for this though. If I understand correctly the RA works just by looking at the type right? so It doesn't know which function it's doing the allocations for? In which case maybe we should have a new type rather than using Tuples so the constraint for sequential registers don't get applied to all tuples? Since that would created unneeded register pressure in a lot of code and make some un-allocatable. (e.g. a 30 element tuple for instance) |
namespace System.Runtime.Intrinsics.Arm
{
public partial class AdvSimd
{
public static Vector64<byte> VectorTableLookup(Vector128<byte> table, Vector64<byte> byteIndexes);
public static Vector64<sbyte> VectorTableLookup(Vector128<sbyte> table, Vector64<sbyte> byteIndexes);
public static Vector64<byte> VectorTableLookupExtension(Vector64<byte> defaultValues, Vector128<byte> table, Vector64<byte> byteIndexes);
public static Vector64<sbyte> VectorTableLookupExtension(Vector64<byte> defaultValues, Vector128<sbyte> table, Vector64<sbyte> byteIndexes);
// public partial class Arm32
// {
// public static Vector64<byte> VectorTableLookup(Vector64<byte> table, Vector64<byte> byteIndexes);
// public static Vector64<sbyte> VectorTableLookup(Vector64<sbyte> table, Vector64<sbyte> byteIndexes);
//
// public static Vector64<byte> VectorTableLookupExtension(Vector64<byte> defaultValues, Vector64<byte> table, Vector64<byte> byteIndexes);
// public static Vector64<sbyte> VectorTableLookupExtension(Vector64<byte> defaultValues, Vector64<sbyte> table, Vector64<sbyte> byteIndexes);
// }
public partial class Arm64
{
public static Vector128<byte> VectorTableLookup(Vector128<byte> table, Vector128<byte> byteIndexes);
public static Vector128<sbyte> VectorTableLookup(Vector128<sbyte> table, Vector128<sbyte> byteIndexes);
public static Vector128<byte> VectorTableLookupExtension(Vector128<byte> defaultValues, Vector128<byte> table, Vector128<byte> byteIndexes);
public static Vector128<sbyte> VectorTableLookupExtension(Vector128<byte> defaultValues, Vector128<sbyte> table, Vector128<sbyte> byteIndexes);
}
}
} |
@tannergooding @echesakovMSFT Could use some advice on how to best handle these The problem is that we have the same API name with the same elements (they're all 8-bit element vectors) but that vary in essentially the function arity. This means I need to use the same instruction name I'm currently thinking I can do this by having some new
Alternatively I can have just 1 new one The problem I have here is that I am not able to distinguish between the instructions then in placed where I don't have access to the value. e.g. not sure how I need to handle Alternatively I can re-use an existing fmt or maybe something else entirely? |
@TamarChristinaArm You should use approach I've taken in #33461 where I added suffixes BTW, I added some helper functions for displaying consecutive register lists so you can use them when implementing |
@echesakovMSFT I saw that but it wasn't entirely clear to me why you choose that approach. I didn't see any usage of the Wouldn't it be better to have the same instruction name but overload on the |
@TamarChristinaArm For these intrinsics we can use SpecialCodeGen - the decision on what instruction to emit will be done in codegen not emitter in this case. I don't see this as a big issue - there is a handful of intrinsics that support multiple regs - ld[1-4],st[1-4],tbl and tbx. We also don't know the final API surface even to say now if it's going to be needed. For example, if we had
It's an option - but wouldn't this require defining multiple formats LS_2D[2-4]regs, LS_3F[2-4]regs, LS_2E[2-4]regs - since ld1 and st1 can be one of those? You would also need to pass information in emitIns_R_R_I, emitIns_R_R_R that an instruction uses multiple registers. |
Yes but there aren't a lot of instructions with this characteristics though.
Yes, though I see your point now. I had thought I could extract this information from the So it's a bit unfortunately but it does look like I'll need to just have 4 different instructions :( |
I agree - that there could be more elegant solution to this. I explored other options while implementing ld[1-4], st[1-4] - such as encoding of the number of registers in a register list via emitAttr or emitInsOpts. However, the related changes would be much more disruptive/riskier and also far from ideal so I decided to pick the least worst choice i.e. encoding the information via the instruction opcode. |
@echesakovMSFT I've been trying to build my changes all morning but they keep failing with when I just do Do you know what's going on? |
ah, nvm, I see that the real error was hidden much much higher up.. |
Should this be closed now? Does that PR implement them? |
No, it only implements the API approved parts. i.e. the single register versions. The multi-register version aren't implemented as they require changes to the register allocator. |
Moving to the future since the multiple register variants of tbl and tbx are out of scope for 5.0 |
We write to stderr from CoreLib for two reasons: 1. FailFast 2. Unhandled exception (also FailFast in a way) We're using a roundabout way to use `Console.Error.WriteLine` but that has problems for FailFast because we then don't fail fast enough (dotnet#1277). This inlines enough of writing to stderr that we can get rid of System.Console dependency. Fixes dotnet#1277.
Closing this as the single-register variants have been implemented and the multiple-register variants are tracked by #81599 |
The instructions
TBL
andTBX
do a table lookup using a vector of indices and a "table" which is created by concatenating all the input vectors together.The difference between a
TBL
and aTBX
is that aTBL
will return0
for any indices that are out of range andTBX
is a destructive variant which will leave the corresponding value to the out of range index untouched.I've modelled them using tuples for the register list as the instructions have an explicit requirement that the input vectors must use consecutive registers.
This fits in the general design of https://github.com/dotnet/corefx/issues/26574
/cc @tannergooding @CarolEidt @echesakovMSFT
The text was updated successfully, but these errors were encountered: