-
Notifications
You must be signed in to change notification settings - Fork 4.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Decompose some bitwise operations in HIR to allow more overall optimizations to kick in #104517
Conversation
…zations to kick in
5c68e29
to
5d85173
Compare
CC. @dotnet/jit-contrib, this is ready for review. It gives some nice improvements on both Arm64 and x64 due to additional optimization opportunities. On xarch in particular it also gives better codegen almost anywhere vpternlog can be used and handles more (but not all) of the possible vpternlog lightups. There are a couple methods where the code size has increased, rather than decreased, due to vpternlog but its a minority overall and typically due to 2 of the inputs being containable. These should be addressable, but I'd rather do that separately since this is still a large net improvement (generally speaking this should entail checking if two operands are containable and opting to not combine them into a vpternlog in that scenario). |
The SPMI replay failure is #104585 |
Rerunning now that the SPMI replay issue should be fixed. This should still be ready for review and notably also fixes some issues that were found by antigen |
// We specially handle this here since we're only producing a | ||
// native intrinsic node in LIR | ||
|
||
std::swap(op1, op2); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you remind me - is it fine to do this swap
here in terms of side-effect reordering?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ah, it's LIR so I guess it is
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right, in this case its fine specifically because we've validated we're in LIR already.
Otherwise, we should be applying GTF_REVERSE_OPS
or have some comment about expecting the user to have already spilled side effects, etc.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Changes LGTM assuming CI failures are unrelated
Doing this allows more scenarios involving CSE, morph transforms, and other optimizations to kick in. This will also longer term allow
~Compare
to be optimized to the inverse comparison as well.When such optimizations aren't feasible, we still end up getting the good codegen in lowering anyways.