Add X86Serialize hardware intrinsic. #68677
Note: this serves as a reminder that when your PR modifies a ref *.cs file and adds or modifies public APIs, please make sure the API implementation in the src *.cs file is documented with triple-slash comments, so the PR reviewers can sign off on that change.
Tagging subscribers to this area: @JulieLeeMSFT
Issue Details
This PR implements #66467 for RyuJIT. Some questions to address:
Yes. This tracks the corresponding Effectively If we didn't expose this class and corresponding "ISA", then |
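For context on what the corresponding "ISA" check maps to in hardware, here is a minimal sketch. It assumes the caller has already executed CPUID leaf 07H, sub-leaf 0, and passes in the resulting EDX value; per the Intel SDM, bit 14 of that register reports SERIALIZE support. The helper name `HasSerialize` is hypothetical and is not taken from this PR or from the JIT.

```cpp
#include <cstdint>

// Bit position of the SERIALIZE feature flag in EDX, as reported by
// CPUID leaf 07H, sub-leaf 0 (Intel SDM).
constexpr std::uint32_t kSerializeBit = 14;

// Hypothetical helper (not part of the PR): given the EDX value returned by
// CPUID.(EAX=07H, ECX=0), report whether the SERIALIZE instruction exists.
constexpr bool HasSerialize(std::uint32_t cpuid7Edx)
{
    return ((cpuid7Edx >> kSerializeBit) & 1u) != 0;
}
```

Keeping the check a pure function of the register value makes it easy to test without running CPUID on the build machine.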
e4fb6f1 to 8475341
8475341 to f63abc2
src/coreclr/tools/Common/JitInterface/ThunkGenerator/InstructionSetDesc.txt
f63abc2 to b8d2afe
It looks like the failures are from a permission-denied issue, which I think is related to #68715. The superpmi failures are separate, but I believe they are expected, as there are no mch files for a new JIT version id/interface? Also, this PR implements the serialize instruction for RyuJIT. Is it possible to implement the intrinsic for Mono in a follow-up PR? @tannergooding @fanyang-mono
Feel free to create an issue to track the need for a Mono implementation of this, if you are not planning to implement it in this PR.
src/coreclr/jit/emitxarch.cpp
@@ -16275,6 +16275,13 @@ emitter::insExecutionCharacteristics emitter::getInsExecutionCharacteristics(ins
            break;
        }

        case INS_serialize:
        {
            result.insLatency = PERFSCORE_LATENCY_140C;
@tannergooding do you have a recommendation for a perfscore for serialize? serialize will take as long as it takes to flush the instruction pipeline, i.e., its latency/throughput depends on the instructions in the pipeline. I can try to get a minimum latency/throughput, or we could try a worst-case upper bound?
We've based our previous numbers on the official timing numbers for Skylake (from the Intel® 64 and IA-32 Architectures Software Developer Manuals, iirc).
This is really just meant to be a rough heuristic, so an estimate or average case should be sufficient.
We are looking to get these numbers internally. For the time being, I have set both to be 1 cycle. Can we update these values once we have the numbers?
Yes, provided an issue is logged tracking it being updated.
Could we set it to something slightly more meaningful than 1 in the meantime? I'd expect this to at least be as expensive as a fencing operation (lfence, mfence, sfence).
I mirrored what was done for mfence and set it to 50 cycles (more expensive than mfence, less expensive than cpuid, which is ~100 cycles per the manual).
Does this seem OK? I will open an issue now.
Seems great. Thanks!
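The criterion agreed on in this thread (at least as expensive as a fence, cheaper than cpuid) can be written down as a small sanity check. The sketch below is illustrative only: the function and constant names are hypothetical and are not the JIT's PERFSCORE constants. The 50 and ~100 cycle figures are the ones quoted above; the fence cost is left as a parameter because the thread does not state it.

```cpp
// Figures quoted in the review thread: the interim estimate chosen for
// serialize, and the approximate cpuid cost cited from the manual.
constexpr int kSerializeCycles = 50;
constexpr int kCpuidCycles     = 100;

// Hypothetical check (not JIT code): an estimate for serialize is considered
// plausible if it is at least the cost of a fencing operation and below the
// cost of cpuid. The fence cost is supplied by the caller.
constexpr bool IsPlausibleSerializeEstimate(int serializeCycles,
                                            int fenceCycles,
                                            int cpuidCycles)
{
    return serializeCycles >= fenceCycles && serializeCycles < cpuidCycles;
}
```

Under this check, the earlier 1-cycle placeholder is rejected for any fence cost above one cycle, which matches the reviewer's request for something more meaningful in the meantime.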
b8d2afe to fb99562
Will do. Thanks!
9d97c75 to feffedd
feffedd to 4da87cf
I've created the two issues as discussed and made the changes above. Can I get a full review now (as it was in draft until yesterday)?
4da87cf to c308def
c308def to 2a3bfd5
CC @dotnet/jit-contrib for a second review.
Going to close and re-open this to restart CI against the latest HEAD. It's been a couple of days, but it also looks like the failures were unrelated infra issues that had cropped up.
@BruceForstall or @jakobbotsch, is SuperPMI expected to fail in builds with JIT/EE Version changes? In particular, these are all reporting:
If so, this should be good to merge (I just want to ensure I'm not missing some other subtle issue here).
Yes (but probably something we should improve). We should perhaps see if we can skip |
superpmi-replay and superpmi-diffs are already attempting to avoid running on JIT-EE GUID changes, but apparently that's not doing what we hoped?
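The intent described here, skipping replay when the JIT-EE version GUID has changed, amounts to a simple guard: existing collections were recorded against one interface version and are only meaningful for a JIT that reports the same one. The sketch below only illustrates that guard; `VersionGuid` and `ShouldRunReplay` are hypothetical names, and this is not how the superpmi pipelines are actually implemented.

```cpp
#include <cstdint>

// Hypothetical 128-bit identifier standing in for the JIT-EE version GUID.
struct VersionGuid
{
    std::uint64_t hi;
    std::uint64_t lo;
};

constexpr bool SameVersion(VersionGuid a, VersionGuid b)
{
    return a.hi == b.hi && a.lo == b.lo;
}

// Replay is only worthwhile when the recorded collection and the JIT under
// test agree on the interface version; otherwise the job should be skipped.
constexpr bool ShouldRunReplay(VersionGuid collectionGuid, VersionGuid jitGuid)
{
    return SameVersion(collectionGuid, jitGuid);
}
```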
Merging. Will log a tracking issue for the superpmi diffs running even when the JIT/EE version changes.
Thanks for the contribution @anthonycanino!
Logged #69197
Thanks for the reviews @tannergooding, @BruceForstall and @fanyang-mono!