Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Refactor how the embedded rounding is handled so the logic is more reusable #97569

Merged
merged 4 commits into from
Jan 28, 2024

Conversation

tannergooding
Copy link
Member

Previously the logic was done by forcing some bits to be tracked as flags on the node and then observing them later on in codegen. If the rounding mode was a constant, these flags were set in lowering and otherwise repeatedly as part of the switch case handler.

This worked well, however it was taking up precious flag space and wasn't going to be extensible to other types of embedded operations we may also need to support, such as embedded masking. It was also then inconsistent with how embedded broadcast was handled.

This updates the logic to instead pass along insOpts through the necessary codepaths which allows codegen to handle everything itself instead. It does this by doing an up front check for if the node is using embedded rounding and if so extracts the mode and merges the info with the tracked insOpts.

The new pattern will allow a similar mechanism to be done for embedded masking where codegen can check for BlendVariableMask and if it is present, it can extract the contained operation (such as Add) and call genHWIntrinsic with the updated insOpts, thus allowing us to track the EVEX.aaa bits without needing to handle much larger nodes, without needing to specialize the handling in all the various code paths, etc.

@ghost ghost assigned tannergooding Jan 26, 2024
@dotnet-issue-labeler dotnet-issue-labeler bot added the area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI label Jan 26, 2024
@ghost
Copy link

ghost commented Jan 26, 2024

Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch
See info in area-owners.md if you want to be subscribed.

Issue Details

Previously the logic was done by forcing some bits to be tracked as flags on the node and then observing them later on in codegen. If the rounding mode was a constant, these flags were set in lowering and otherwise repeatedly as part of the switch case handler.

This worked well, however it was taking up precious flag space and wasn't going to be extensible to other types of embedded operations we may also need to support, such as embedded masking. It was also then inconsistent with how embedded broadcast was handled.

This updates the logic to instead pass along insOpts through the necessary codepaths which allows codegen to handle everything itself instead. It does this by doing an up front check for if the node is using embedded rounding and if so extracts the mode and merges the info with the tracked insOpts.

The new pattern will allow a similar mechanism to be done for embedded masking where codegen can check for BlendVariableMask and if it is present, it can extract the contained operation (such as Add) and call genHWIntrinsic with the updated insOpts, thus allowing us to track the EVEX.aaa bits without needing to handle much larger nodes, without needing to specialize the handling in all the various code paths, etc.

Author: tannergooding
Assignees: tannergooding
Labels:

area-CodeGen-coreclr

Milestone: -

@ryujit-bot
Copy link

Diff results for #97569

Throughput diffs

Throughput diffs for linux/arm64 ran on windows/x64

MinOpts (-0.01% to +0.00%)
Collection PDIFF
libraries.pmi.linux.arm64.checked.mch -0.01%

Details here


@ryujit-bot
Copy link

Diff results for #97569

Throughput diffs

Throughput diffs for windows/arm64 ran on windows/x64

MinOpts (0.00% to +0.01%)
Collection PDIFF
libraries.pmi.windows.arm64.checked.mch +0.01%

Details here


@tannergooding
Copy link
Member Author

CC. @dotnet/jit-contrib for review. Simple cleanup that will significantly simplify adding the embedded masking support and will make that general support easier to add to Arm64 for SVE, since Arm64 already uses a similar pattern for passing around insOpts itself.

Copy link
Contributor

@TIHan TIHan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@tannergooding
Copy link
Member Author

JitStress failure is known, related to the new small type handling for interlocked APIs

@tannergooding tannergooding merged commit 7a60900 into dotnet:main Jan 28, 2024
136 of 139 checks passed
@tannergooding tannergooding deleted the hwintrin-gen branch January 28, 2024 02:57
Ruihan-Yin added a commit to Ruihan-Yin/runtime that referenced this pull request Feb 13, 2024
BruceForstall pushed a commit that referenced this pull request Feb 15, 2024
* Expose embedded rounding related scalar intrinsic APIs

* Expose embedded rounding related arithmatic intrinsic APIs

* Ensure the new APIs are properly lowered

* Bug fixes

* Expose embedded rounding casting APIs

* Expose arithmetic embedded rounding unit tests

* Add a test template for embedded rounding APIs, this will be enough to cover all the binary APIs including vector and scalar operations.

* Add template for unary ops

* Expose all  the embedded rounding unit tests generated by the templates

* Expose embedded rounding casting APIs unit tests

* Expose handwritten unit tests for embedded rounding APIs with special input arg lists.

* Bug fixes:
1. ConvertToVector256Int32/UInt32 use special code gen path, adding a fallback path when embedded rounding is activated and the control byte is not constant.

* Bug fix:
Fix wrong data type in the API definition.

* formatting

* Update API documents for embedded rounding APIs.

* resolve conflicts with #97569

* formatting

* bug fix and remove un-needed SAE related intrinsics

* resolve comments:
1. update the arg lists for genHWIntrinsic_R_RM

* resolve comments:
Add jumptable fallback to non-table driven embedded rounding intrinsics.

* resolve comments:
1. remove some redundent checks on embedded rounding intrinsics

* Bug fix:
pass the correct operand GenTree node, when emitting the fallback for embedded rounding intrinsics.

* formatting

* revert an unexpected change.

* 1.Resolve comments:
2. Added FMA intrinsics with embedded rounding and unit tests.

* Expose the rest of embedded rounding APIs

* formatting

* Ensure the control byte local is assigned to the correct register.
@github-actions github-actions bot locked and limited conversation to collaborators Feb 27, 2024
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants