Draft for Min/Max intrinsics xarch (#65625) #65700
Tagging subscribers to this area: @JulieLeeMSFT
Issue Details
I made the changes using #65584 as a reference. It works with HelloWorld-ish code, but I'm stuck with building it via build.cmd. It doesn't want to compile and I can't figure out why.
maxss and vmaxss are basically different instructions (vmaxss is the VEX encoding). Only vmaxss can be used with R_R_R, and it's not available without AVX.
src/coreclr/jit/codegenxarch.cpp
Outdated
case NI_System_Math_Max:
genConsumeOperands(treeNode->AsOp());
GetEmitter()->emitIns_R_R_R(INS_maxss, emitActualTypeSize(treeNode), treeNode->GetRegNum(),
You want to use emitIns_SIMD_R_R_R, which will take care of VEX vs non-VEX differences.
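To illustrate why the encoding matters, here is a minimal toy sketch in C++. It is not the JIT's emitter: `emitScalarMax` and its string-based "registers" are hypothetical names made up for this example. It only models the general x86 fact that the legacy SSE form of maxss overwrites its first operand, while the VEX form takes a separate destination.

```cpp
#include <string>
#include <vector>

// Toy model only: registers are plain strings and "emitting" appends assembly text.
// emitScalarMax is a hypothetical helper, not part of the real JIT emitter API.
// Assumption: target does not alias op2 unless target == op1. maxss is not
// commutative with respect to NaN, so the operands cannot simply be swapped.
std::vector<std::string> emitScalarMax(bool useVex,
                                       const std::string& target,
                                       const std::string& op1,
                                       const std::string& op2)
{
    std::vector<std::string> code;
    if (useVex)
    {
        // VEX encoding is non-destructive: one destination plus two sources.
        code.push_back("vmaxss " + target + ", " + op1 + ", " + op2);
        return code;
    }
    // Legacy SSE encoding is destructive: the first operand is both a source and
    // the destination, so op1 must be copied into target first when they differ.
    if (target != op1)
    {
        code.push_back("movaps " + target + ", " + op1);
    }
    code.push_back("maxss " + target + ", " + op2);
    return code;
}
```

With `useVex` set to false and three distinct registers, the sketch produces a copy followed by a two-operand maxss. With `useVex` set to true it produces a single three-operand vmaxss.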
I'd actually also expect we want to handle things like containment/etc in which case lowering to a HWIntrinsic might be simpler overall
agree, manual containment management is quite verbose
@tannergooding Thank you for the advice. I'm sorry, but could you tell me more about containment?
Normally each node is processed by the JIT and will individually get register assignments and code generated for it. Containment is the term used to mean that a node should not be handled that way and instead is handled directly by its parent node.
So by default, if you have something like GT_HWINTRINSIC NI_System_Math_Max and it has a child node like GT_IND float, you would end up generating
movss reg1, [mem]
maxss reg2, reg1
However, if you mark the GT_IND as "contained", then you can handle these together and emit just maxss reg1, [mem].
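The difference between the two sequences can be sketched with a small toy model in C++. This is not the JIT's GenTree or codegen: `Load` and `genMax` are hypothetical names invented for this illustration, and the `contained` flag stands in for whatever lowering would decide.

```cpp
#include <string>
#include <vector>

// Toy stand-in for a GT_IND-like child node. Hypothetical type, not the JIT's GenTree.
struct Load
{
    std::string address; // memory operand text, e.g. "[mem]"
    bool contained;      // true when the parent is expected to consume the load directly
};

// Hypothetical codegen for a scalar max whose second operand is a memory load.
std::vector<std::string> genMax(const std::string& dstReg,
                                const std::string& tmpReg,
                                const Load& operand)
{
    std::vector<std::string> code;
    if (operand.contained)
    {
        // Contained: the load gets no register or instruction of its own.
        // The parent folds the memory operand into its own instruction.
        code.push_back("maxss " + dstReg + ", " + operand.address);
    }
    else
    {
        // Not contained: the load is generated separately into a temporary
        // register, and the parent then consumes that register.
        code.push_back("movss " + tmpReg + ", " + operand.address);
        code.push_back("maxss " + dstReg + ", " + tmpReg);
    }
    return code;
}
```

Calling `genMax("reg2", "reg1", Load{"[mem]", false})` yields the two-instruction sequence, while `genMax("reg1", "reg1", Load{"[mem]", true})` yields the single maxss with a memory operand.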
There is a lot of existing logic for supporting containment (https://github.com/dotnet/runtime/blob/main/src/coreclr/jit/lowerxarch.cpp#L6030) and so it would likely be better to lower or import NI_System_Math_Max to be the relevant HWIntrinsic node(s) instead so that much of this handling implicitly lights up, rather than having to duplicate it just for the NI_System_Math_Max node.
If you were to do it the other way, by just handling NI_System_Math_Max, then it would likely be done around here: https://github.com/dotnet/runtime/blob/main/src/coreclr/jit/lowerxarch.cpp#L5371-L5387
I assume failures are related e.g.:
@EgorBo Since I'm learning to work with the project, I have a question. Let's say a piece of code is failing (like the assertion in the log). How do you run the CLR against that piece of code to debug what is wrong? I believe there must be a better way than just copy-pasting the code into a simple "Hello world" project to emulate the behaviour. Thank you in advance.
@tannergooding Thank you for the explanation. Let's say I want to implement importing. I investigated, and the HWIntrinsics are used if a type belongs to either System.Numerical.Vector or System.Numerics. Is it OK to implement the import somewhere here (runtime/src/coreclr/jit/importer.cpp Lines 5158 to 5165 in 43ab6b8) in case the FEATURE_HW_INTRINSICS directive is turned on? The second question is: I can't figure out which ISA to use for converting NI_System_Math_Max to a relevant HW node. Could you give me a hand, please?
This pull request has been automatically marked
Sorry, I had missed this. You can generally see the two approaches if you look at
runtime/src/coreclr/jit/importer.cpp Lines 4390 to 4459 in b5381bf
FusedMultiplyAdd is a slightly more complex sequence of nodes.
Alternatively, many of the other runtime/src/coreclr/jit/importer.cpp Lines 4461 to 4498 in b5381bf
runtime/src/coreclr/jit/lowerxarch.cpp Lines 5323 to 5346 in b5381bf
runtime/src/coreclr/jit/codegenxarch.cpp Lines 7421 to 7457 in b5381bf
This pull request has been automatically marked
This pull request will now be closed since it had been marked
@EgorBo @tannergooding I'm sorry, I have been busy recently and the PR was closed. Can I open it again?
Yep, feel free to reopen it whenever you have time again.
I made the changes using #65584 as a reference. It works with HelloWorld-ish code, but I'm stuck with building it via build.cmd. It doesn't want to compile and I can't figure out why.
I face the errors:
@EgorBo Could you please help me figure out what I implemented wrong?