-
Notifications
You must be signed in to change notification settings - Fork 4.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Disable LZCNT in crossgen #35598
Disable LZCNT in crossgen #35598
Conversation
I couldn't figure out the best area label to add to this PR. Please help me learn by adding exactly one area label. |
This is also relevant to #33737 which explains that |
For crossgen I believe this change is strictly an improvement. However, I see that you did not make the corresponding change in crossgen2. However, this isn't necessarily the wrong choice. Crossgen2 does not use the IsSupported function call mechanism, but instead it performs speculative compilation assuming the processor intrinsic IS available which yields excellent performance for this case on Haswell and newer processors. |
Thanks, I didn't see that issue. The basic problem is that the crossgen1 support for opportunistic ISA use relies on the cost of the I've also been looking at |
I haven't dug into the crossgen2 code, but I read your design document, and this was my understanding. Seems to only be an issue with the way it's handled in crossgen1. |
Agreed. I don't think you should change the crossgen2 behavior. I just wanted to make it clear that I've looked at this, the improvement looks right, but it shouldn't be applied to crossgen2 as it won't really be an improvement there. (I'd have marked this as approved, but I wanted to see a clean set of tests first.) |
Do we have a rough percentage of CPUs that are Haswell or later that would justify whether or not this is actually worthwhile? If you go based off of the March 2020 Steam Hardware Survey: https://store.steampowered.com/hwsurvey/ Then roughly 75% of reported CPUs support AVX2 (and would therefore have BMI1/BMI2/LZCNT, based on what Intel/AMD have shipped): This naturally doesn't take into account developer machines, server hardware, etc. But, if we are saying LZCNT is likely to be supported, then there is an equal number of AVX2/BMI1/BMI2 machines in the same boat and a greater number of AVX machines and we should likely consider what is a "good baseline". |
Well, for crossgen1 I believe the approach that @saucecontrol is taking is strictly better than utilizing lzcnt. For crossgen2 its a more subtle question, and comes down to how much of a penalty will customers suffer from having to run the JIT vs having the slightly better codegen. If we trust the steam survey (which we probably shouldn't, but its somewhat representative of end users running on client hardware who care about performance), we would be trading off slightly better codegen for 75% of the customer base vs 1-2 ms of performance loss on startup for ~25% of the client workload. For the server workload, which is entirely dominated by servers running in the cloud or other high end environments, the numbers are even more supportive of relying on higher specced machines. For instance, while Azure doesn't provide a guarantee for all instance types that they support have Avx2, the support is broadly available in most hardware, as the cloud has grown by orders of magnitude since Haswell was the right chip to buy. In particular, all of our very large first party customers which utilize Azure run exclusively on instance types which support Avx2. |
Since 5.0 will mostly be adopted in 2021 presumably the question is really what the availability will have reached then. |
Well, for crossgen2, its really about 2022 as we don't expect it to get widespread use until .NET 6. I feel that the cost on crossgen2, we've already reached the threshold for being willing to just use the lzcnt instruction, but only just barely. In the next 2 years we should be comfortable using an instruction set that will be 8-9 years old at that point. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this is the correct change for Crossgen.
I believe what we currently have for Crossgen2 is sufficient due to how it differs
As a followup to #34550 (comment), this is removing LZCNT from the list of x86 ISAs whitelisted for crossgen on CoreLib.
Now that
BitOperations.TrailingZeroCount
has a fallback toBSR
, the cost of queryingLzcnt.IsSupported
before usingLZCNT
is higher than simply using the fallback. This is a situation very specific toLZCNT
because it's most often a one-and-done instruction and the fallback is only very marginally slower.R2RDump diff summary for
System.Private.CoreLib.dll
before and after this change:R2RDump disasm for `System.SpanHelpers.LocateLastFoundByte(UInt64)`
Note:
Lzcnt.LeadingZeroCount
is called only fromBitOperations.LeadingZeroCount
in CoreLib. All changes shown are from the inlined version ofBitOperations.LeadingZeroCount
.cc @tannergooding @davidwrighton