Add BSF and BSR fallbacks for BitOperations methods #34550
I probably won't get to reviewing this before Monday, but assigning myself so I don't forget 😄
src/libraries/System.Private.CoreLib/src/System/Numerics/BitOperations.cs
Looks like something isn't quite hooked up right as |
CC. @CarolEidt and @echesakovMSFT Also CC. @davidwrighton to help validate the new |
@@ -14818,6 +14819,8 @@ emitter::insExecutionCharacteristics emitter::getInsExecutionCharacteristics(ins
            result.insLatency += PERFSCORE_LATENCY_2C;
            break;

        case INS_bsf:
        case INS_bsr:
It would be nice if we had a way to model Intel vs AMD here.
| Instruction | Intel | AMD |
|---|---|---|
| BSF | 1.00 | 4.00 |
| BSR | 1.00 | 5.00 |
| POPCNT | 1.00 | 0.50 |
| LZCNT | 1.00 | 0.50 |
| TZCNT | 1.00 | 0.50 |
- For the non-memory encoding, AMD is actually a bit faster still
- Numbers taken from https://uops.info/table.html
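One way the Intel vs AMD difference could be modeled is a per-vendor lookup keyed by instruction. The sketch below is purely illustrative: `CpuVendor`, `Ins`, and `perfCost` are hypothetical names that do not exist in the JIT's emitter, and the values are simply the figures from the table above.

```cpp
// Hypothetical names for illustration only; none of these exist in the JIT.
enum class CpuVendor { Intel, AMD };
enum class Ins { BSF, BSR, POPCNT, LZCNT, TZCNT };

// Returns the figure from the table above for the given instruction and vendor.
inline double perfCost(Ins ins, CpuVendor vendor)
{
    const bool amd = (vendor == CpuVendor::AMD);
    switch (ins)
    {
        case Ins::BSF:
            return amd ? 4.00 : 1.00;
        case Ins::BSR:
            return amd ? 5.00 : 1.00;
        case Ins::POPCNT:
        case Ins::LZCNT:
        case Ins::TZCNT:
            return amd ? 0.50 : 1.00;
    }
    return 1.00; // not reached for valid enum values
}
```

A real implementation would also need some way to know which vendor the code is being compiled for, which this sketch takes as a plain parameter.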
// Arguments:
//    node - The hardware intrinsic node
//
void CodeGen::genX86BaseIntrinsic(GenTreeHWIntrinsic* node)
Not needed for this PR, but we allowed HW_Category_Scalar
intrinsics to be table driven in importation.
It would be nice to do the same here in codegen...
src/libraries/System.Private.CoreLib/src/System/Numerics/BitOperations.cs
...ies/System.Private.CoreLib/src/System/Runtime/Intrinsics/X86/X86Base.PlatformNotSupported.cs
I saw that https://github.com/dotnet/runtime/blob/master/src/coreclr/src/jit/compiler.cpp#L2313-L2320 https://github.com/dotnet/runtime/blob/master/src/coreclr/src/vm/codeman.cpp#L1485-L1487 I couldn't see any cases where X86Base wasn't working, but maybe it needs the same treatment if you're seeing a failure. |
Seeing failures in CI for x86.
Ah, I found one:
So it looks like |
src/libraries/System.Utf8String.Experimental/src/System/Runtime/Intrinsics/Intrinsics.Shims.cs
src/libraries/System.Private.CoreLib/src/System/Numerics/BitOperations.cs
We need the instruction sets added here: https://github.com/dotnet/runtime/blob/master/src/coreclr/src/jit/compiler.cpp#L2205, in the same way as we do for ARM here: https://github.com/dotnet/runtime/blob/master/src/coreclr/src/jit/compiler.cpp#L2316 It looks like all the other JIT bits exist (comparing to |
The way I read it, the explicit addition of ArmBase is redundant because it's passed in from the VM side in the same way the higher-level ISAs are. Since I've added X86Base unconditionally here: It's always available unless removed based on the config knobs, just like SSE+ are. Am I missing something there? |
@davidwrighton, did the base checks get inverted? That is, should https://github.com/dotnet/runtime/blob/master/src/coreclr/src/jit/compiler.cpp#L2205 be:
You are meant to be able to disable the "baseline" as well, so you can test your software fallback.
The Vector ISAs don't actually exist, so they're not passed into the JIT. Unless The real ISAs work the opposite way. The VM passes them in if the hardware supports them and the JIT disables them if the config says to.
|
It's probably a good idea to make If I've misunderstood any of that, I guess we should add |
@tannergooding @saucecontrol Vector64/128/256 are set by the JIT itself, as Tanner noted. One of the changes I did not realize I made was to make ArmBase set by the VM side. One of the issues is that the VM now attempts to pass a rationalized set of instruction set values to the JIT, so not enabling ArmBase requires the VM side to disable all of the other instruction sets. We could tweak that if we needed to, but I need to know it's necessary before I futz with it. I would avoid marking X86Base as being implied by support for SSE, as that means if it is disabled, then SSE will be disabled.
And yes, the current model, where we explicitly set the ArmBase instruction set on both the JIT and the VM side of the fence, doesn't make sense.
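The model discussed here can be sketched roughly as follows: the VM reports what the hardware supports, config knobs remove ISAs, and any ISA whose prerequisite was removed goes with it. Everything in this sketch (`IsaDependency`, `effectiveIsas`, the bit layout) is hypothetical and does not match the real JIT or VM code; it only illustrates why tying SSE to X86Base would make disabling X86Base disable SSE as well.

```cpp
#include <cstdint>

// Hypothetical bit flags for illustration; not the runtime's real instruction-set values.
const uint32_t ISA_X86Base = 1u << 0;
const uint32_t ISA_SSE     = 1u << 1;
const uint32_t ISA_SSE2    = 1u << 2;

// Assumed dependency record: `isa` is only usable when `prerequisite` is enabled.
struct IsaDependency
{
    uint32_t isa;
    uint32_t prerequisite;
};

// vmReported:     ISAs reported as supported by the hardware.
// configDisabled: ISAs turned off by config knobs.
// Returns the set that remains after removing disabled ISAs and their dependents.
inline uint32_t effectiveIsas(uint32_t vmReported, uint32_t configDisabled,
                              const IsaDependency* deps, int depCount)
{
    uint32_t enabled = vmReported & ~configDisabled;
    bool changed = true;
    while (changed) // repeat until no further ISA is removed
    {
        changed = false;
        for (int i = 0; i < depCount; i++)
        {
            const bool hasIsa    = (enabled & deps[i].isa) != 0;
            const bool hasPrereq = (enabled & deps[i].prerequisite) != 0;
            if (hasIsa && !hasPrereq)
            {
                enabled &= ~deps[i].isa;
                changed = true;
            }
        }
    }
    return enabled;
}
```

With a dependency list that makes SSE require X86Base, disabling X86Base empties the whole set; with SSE left independent, SSE and SSE2 survive.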
There's presently no way to disable X86Base other than to disable HWIntrinsics entirely. Isn't that the same way ArmBase works? Or would this be a case where disabling HWIntrinsics also disables SIMD?
Would removing it from the JIT side be ok for now, given the larger changes necessary to separate it on the VM side?
That is the expectation. The helper types and the "base" ISAs don't have their own |
That is exactly what we want. Disabling |
I logged #35305 to track the emitter's handling of the 'w' and 's' encoding bits.
Rebased on master to bring in #35364, and updated to the new intrinsiclist table layout.
Test failures related to |
ping @davidwrighton - could you have a look at the ISA handling here?
This looks good. I believe the new instruction set should be enabled by the existing pathways in crossgen2, so no additional crossgen2 change should be needed.
Thanks for the contribution @saucecontrol!
Testing the assertion @tannergooding made in #953 (comment) that adding x86 intrinsics is as easy as 1,2,3,4,5,6,7 😄
I wanted to get feedback on the introduction of an X86Base ISA to mirror what we have on ARM, and an X86Base class in S.R.I to house the methods for new instructions. I recall @jkotas mentioning the possibility of exposing some of the x86 base instructions as internal HWIntrinsics that could be used from higher-level methods, although I can't seem to find the discussion now.

I've added HWIntrinsics implementations of BSF and BSR and used them as fallbacks in some of the BitOperations methods that use LZCNT and TZCNT. Since BMI1 requires VEX encoding, it won't be supported on older processors or newer non-VEX processors like the Intel Atom line. The fallbacks will also improve codegen in R2R images, where VEX encoding is disabled.

Benchmark results with BMI1 and LZCNT disabled:
And enabled:
Note the nice little improvement to Log2 by replacing a subtraction with xor to reverse the index returned by LZCNT.
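The xor trick works because a leading-zero count of a nonzero 32-bit value lies in [0, 31], and subtracting such a value from 31 (binary 11111) never borrows, so `31 - n` equals `31 ^ n`. Below is a minimal C++ sketch of that relationship (the PR itself is C#). The helper names are hypothetical portable stand-ins for the instructions, not the BitOperations code from this PR.

```cpp
#include <cassert>
#include <cstdint>

// Portable stand-in for BSR: index of the highest set bit.
// The instruction's result is undefined for zero, so this sketch asserts x != 0.
inline uint32_t bitScanReverse(uint32_t x)
{
    assert(x != 0);
    uint32_t index = 0;
    while (x >>= 1)
    {
        index++;
    }
    return index;
}

// Portable stand-in for LZCNT: number of leading zero bits, 32 for a zero input.
inline uint32_t leadingZeroCount(uint32_t x)
{
    uint32_t count = 0;
    for (uint32_t mask = 0x80000000u; mask != 0 && (x & mask) == 0; mask >>= 1)
    {
        count++;
    }
    return count;
}

// Log2 from LZCNT for nonzero x: xor with 31 reverses the index, same as 31 - count.
inline uint32_t log2ViaLzcnt(uint32_t x)
{
    return 31u ^ leadingZeroCount(x);
}

// LeadingZeroCount fallback built on BSR: the same xor turns a bit index into a count.
inline uint32_t lzcntViaBsr(uint32_t x)
{
    if (x == 0)
    {
        return 32; // BSR is undefined for 0, so zero is handled explicitly
    }
    return 31u ^ bitScanReverse(x);
}
```

Note that the equivalence only holds for counts in [0, 31]; a zero input, where the count is 32, has to be handled separately as `lzcntViaBsr` does.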