-
Notifications
You must be signed in to change notification settings - Fork 2.7k
Conversation
src/jit/morph.cpp
Outdated
{ | ||
// Math.Pow(x, 2) -> x*x | ||
newNode = gtNewOperNode(GT_MUL, powerCon->TypeGet(), arg0, | ||
gtNewLclvNode(arg0->gtLclVar.gtLclNum, arg0->TypeGet())); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems that you are assuming that arg0
is GT_LCL_VAR
. Why would that always be true?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@mikedn So it can be GT_LCL_VAR, GT_CALL, GT_FIELD (+ static field access with initialization). I've temporarily limited it to LCL_VAR, I guess for other cases I need to introduce a tmp var?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So it can be GT_LCL_VAR, GT_CALL, GT_FIELD
I don't know why it would be limited to those opers, it can probably be pretty much anything, like a + b * (float)c
.
Anyway, yes, if it's not GT_LCL_VAR
(or GT_LCL_FLD
perhaps) you need to introduce a temporary variable because the original expression now has multiple uses in x * x
. Introducing a temp is a bit unfortunate because it breaks trees and that may cause other problems. In the past I toyed with the idea of adding a SQR
oper to avoid multiple uses though I'm not sure if it's something worth doing.
(if it doesn't already) could it produce three operands for AVX / AVX512F like gcc? PowN1: vmovsd xmm1, qword ptr [reloc @RWD00]
- vdivsd xmm1, xmm0
- vmovaps xmm0, xmm1
+ vdivss xmm0, xmm1, xmm0 |
@am11 I guess it's a task for the low level optimizations/register allocator |
There is at least one bug here (and I think we have a tracking issue somewhere)... The disassembly output is definitely wrong. If we are actually outputting the vex-encoding as indicated ( I believe we are just hitting the former (we are actually using the VEX encoding but not printing all three registers), as (iirc) the emitter has logic to always adjust things appropriately. However, this may mean there is another issue where we are going through a path in codegen that isn't VEX aware and therefore the register allocation done might not be ideal. @CarolEidt might have more context here. |
I actually have an old branch that generates the 3 operand VEX form for floating point ops. The disassembler does display the 3 operand form when |
I logged a bug here: https://github.com/dotnet/coreclr/issues/26569, to track |
I think it should be added not only for |
with C's |
@dotnet/jit-contrib |
Thank you for your contribution. As announced in dotnet/coreclr#27549 this repository will be moving to dotnet/runtime on November 13. If you would like to continue working on this PR after this date, the easiest way to move the change to dotnet/runtime is:
|
} | ||
else if (arg0->OperIs(GT_IND) && arg0->AsIndir()->Addr()->gtGetOp1()->OperIs(GT_LCL_VAR)) | ||
{ | ||
// Math.Pow(x, 2) -> x*x where x is a field |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The way the code is written the only field that it will recognize is the first field of a struct (the field at offset 0). I doubt that was the intention.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@mikedn heh, definitely not the intention, I wish I actually could try the "introduce a tmp local" scenario instead.
If that is an issue, then the places to track it down and fix it are in I'm wondering if it might be better to transform this in the importer, before the call has actually be created. At that time:
Other thoughts @dotnet/jit-contrib ? |
I had done some more investigation here and basically, outside of HWIntrinsics, most other code paths (including The emitter ends up fixing these later by checking I think, ideally, we would switch all the floating-point code over to be VEX aware (and therefore mark instructions as non-RMW when VEX is available). We would then assert in the emitter that we get dst, op1, and op2 separately when VEX is supported. Edit: However, for HWIntrinsics, we always go through VEX aware code-paths and even VEX aware emitter calls (i.e. |
Thank you for your contribution. As announced in #27549 the dotnet/runtime repository will be used going forward for changes to this code base. Closing this PR as no more changes will be accepted into master for this repository. If you’d like to continue working on this change please move it to dotnet/runtime. |
Fixes https://github.com/dotnet/coreclr/issues/26434
Converts:
Math.Pow(x, 2)
tox*x
Math.Pow(x, 1)
tox
Math.Pow(x, -1)
to1/x
(same for MathF and floats)
Can be added:
Math.Pow(c1, c2)
toc3
(call PAL_pow?)Math.Pow(1, x)
to1
Math.Pow(2, x)
toexp2(x)
Math.Pow(x, 0)
to1
Math.Pow(x, -0)
to1
Math.Pow(x, 0.5)
tosqrt(x)
Math.Pow(x, -2)
to1/(x*x)
(probably is not safe)Test
Before (tier1):
After (tier1):
Will run the jit-diff tools but I suspect it won't find anything in the BCL (UPD there are actually two cases). But I did see such usages in different applications. E.g.
Xenko
(a game engine): https://github.com/xenko3d/xenko/search?q=Math.Pow&unscoped_q=Math.PowAlso, once some sort of ffast-math mode is implemented - we can unroll other constants (gcc unrolls up to 100, clang - 32)