RyuJIT: x*2 -> x+x; x*1 -> x (fp) #33024
Conversation
Love this optimization!
LGTM
Why wouldn't we do this earlier, say in morph?
@AndyAyersMS But if you think it's better to have it in morph - no problem, should be even easier than with LIR.
Moved to morph.
A small regression: https://www.diffchecker.com/TMKg73nv (CSE related?) cc @sandreenko
LGTM
A small regression: https://www.diffchecker.com/TMKg73nv (CSE related?)
Total bytes of code 591 vs 592
I will check it tomorrow and merge the PR if it is not something unexpected.
{
    if ((oper == GT_MUL) && !op1->IsCnsFltOrDbl() && op2->IsCnsFltOrDbl())
    {
        if (op2->AsDblCon()->gtDconVal == 2.0)
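For context, a minimal sketch of what the rewrite could look like at this point; this is an illustration, not the PR's actual code, and it assumes `tree` is the `GT_MUL` node shown above and `op1` carries no side effects so it is safe to duplicate:

```cpp
// Illustrative sketch only: rewrite (fp) x * 2.0 as x + x during morph.
// Assumes 'tree' is the GT_MUL node and op1 has no side effects.
if ((op2->AsDblCon()->gtDconVal == 2.0) && ((op1->gtFlags & GTF_ALL_EFFECT) == 0))
{
    tree->SetOper(GT_ADD);                   // GT_MUL  ->  GT_ADD
    tree->AsOp()->gtOp2 = gtCloneExpr(op1);  // replace the 2.0 constant with a copy of x
}
```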
In dotnet/coreclr#24584 we introduced an optimization which replaces `x / pow2` with `x * (1 / pow2)`.
In the case `pow2` is `0.5`, this would transform `x / 0.5` into `x * 2.0`, and so the question is: will this still be correctly optimized in this scenario? That is, will it become `x + x`?
@tannergooding heh, yeah, I checked that 🙂 `x / 0.5` is optimized to `vaddss`.
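Not part of the PR, just a quick way to convince yourself the chain of rewrites is sound: `0.5` is a power of two, so its reciprocal `2.0` is exact, and `x + x` rounds exactly like `x * 2.0`, so all three forms should agree bit-for-bit over the whole 32-bit float range (at least on ordinary SSE hardware; NaN payload behavior can differ elsewhere). A small standalone C++ check:

```cpp
#include <cstdint>
#include <cstdio>
#include <cstring>

// Exhaustively compare x / 0.5f, x * 2.0f and x + x for every 32-bit float pattern.
// This only illustrates why chaining the two JIT rewrites is safe.
int main()
{
    for (uint64_t i = 0; i <= 0xFFFFFFFFull; i++)
    {
        uint32_t bits = (uint32_t)i;
        float x;
        std::memcpy(&x, &bits, sizeof(x));

        float d = x / 0.5f;
        float m = x * 2.0f;
        float a = x + x;

        if ((std::memcmp(&d, &m, sizeof(float)) != 0) || (std::memcmp(&m, &a, sizeof(float)) != 0))
        {
            std::printf("mismatch for bit pattern 0x%08X\n", bits);
            return 1;
        }
    }
    std::printf("all 2^32 float inputs agree\n");
    return 0;
}
```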
Do we have a general plan, doc, or outline of how to order these optimizations and ensure they stay correctly ordered moving forward?
Ideally, we would be able to apply these transformations without needing to consider that `x / pow2` must come before `x * 2.0` because reason.
No idea, we got lucky these two were executed in the correct order.
I guess when you optimize something, e.g. change the operator of a node (e.g. `uX GT_GT 0` to `X GT_NE 0`) and feel this particular node can be re-optimized under a different case, you should either:
- `goto case GT_NE_OP` (ugly), or
- call `fgMorphSmpOp` recursively (needs some recursion depth limit just in case), as sketched below.
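A rough sketch of the second option, purely for illustration: `ReMorph` and `TrySimplifyOnce` are made-up names, not real JIT functions, and the real `fgMorphSmpOp` signature is not reproduced here.

```cpp
// Hypothetical depth-limited re-morphing: if a rewrite changed the node,
// run the simplifier again in case the new shape matches another rule.
GenTree* ReMorph(GenTree* node, int depth)
{
    const int maxReMorphDepth = 4;                // arbitrary cap, "just in case"
    GenTree* simplified = TrySimplifyOnce(node);  // one pass of peephole rules (hypothetical)
    if ((simplified != node) && (depth < maxReMorphDepth))
    {
        return ReMorph(simplified, depth + 1);
    }
    return simplified;
}
```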
@AndyAyersMS, do you have any thoughts/comments here?
As we introduce more optimizations, this seems like something that will be easy to miss, and it would be good to have a consistent, recommended process for handling it...
btw, a good example of this problem is this PR: https://github.com/dotnet/coreclr/pull/25744/files
What's in morph is pretty messy. We don't have a good way of determining if there are dependencies between optimizations, or if one optimization may enable another. There are long-term ambitions of refactoring all this but it is not something we'll get to anytime soon.
Best I can suggest for now is to make sure you run diffs widely, e.g. over the entire set of Pri1 tests, and look carefully at the diffs, or (if you are internal) run your change over some of our SPMI collections.
We want to avoid repeated subtree traversals so we don't waste time, get stuck in some oscillating mode, or blow the stack; so re-invoking morph recursively is only really viable if the node has been simplified in some way. If you are really sure optimizing A enables optimization B, then factor B out and invoke it as a subroutine as part of finishing up A.
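As a hedged illustration of that last suggestion (every function name below is made up, not a real JIT entry point): when A knows it just produced B's input pattern, it calls B directly instead of relying on pass ordering.

```cpp
// Hypothetical names only: 'A' is the divide-by-power-of-two rewrite,
// 'B' is the multiply-by-2.0 rewrite discussed in this PR.
GenTree* MorphDivByPow2(Compiler* comp, GenTree* div)     // optimization A
{
    GenTree* mul = RewriteDivAsReciprocalMul(comp, div);  // x / pow2 -> x * (1 / pow2)

    // A knows the reciprocal may be exactly 2.0, which is B's pattern,
    // so it finishes up by invoking B as a subroutine rather than hoping
    // a later traversal revisits the node.
    return TryRewriteMulTwoAsAdd(comp, mul);              // optimization B: x * 2.0 -> x + x
}
```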
Sorry for keeping you waiting, I should have reviewed it earlier. However, it looks like this optimization doesn't work well with minopts (or when …), a bad case with …, so I think it should be protected by …
For some reason, …
You can pass …
That is probably broken currently, I am trying to debug that.
I got diffs for … With the new transformation, tree before: …
tree after: …
@EgorBo please add …
Failure is unrelated.
Thanks @EgorBo. |
Just a small peephole optimization for floats/doubles: `x * 2` -> `x + x`. I decided to implement it in lowering, just like in LLVM (implemented in DAG*). In general, `mulss\sd` and `addss\sd` are the same in terms of latency\throughput on most CPUs, but `add` doesn't need to load the `2.0` constant.
UPD: also `x * 1` -> `x`.
Current codegen
New codegen
Jit-diff (-f -pmi)
Benchmark:
Core i7 8700K (Coffee Lake), Windows 10 x64.