Refactor how the embedded rounding is handled so the logic is more reusable #97569

tannergooding · 2024-01-26T18:34:42Z

Previously, embedded rounding was handled by tracking some bits as flags on the node and then observing them later in codegen. If the rounding mode was a constant, these flags were set in lowering; otherwise they were set repeatedly as part of the switch case handler.

This worked well. However, it took up precious flag space and would not extend to other kinds of embedded operations we may also need to support, such as embedded masking. It was also inconsistent with how embedded broadcast is handled.

This updates the logic to instead pass insOpts through the necessary code paths, which allows codegen to handle everything itself. It does this with an up-front check for whether the node uses embedded rounding; if so, it extracts the mode and merges that information into the tracked insOpts.
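
As a rough illustration of that flow, here is a minimal, self-contained C++ sketch. It is not the JIT's code: `ToyInsOpts`, `ToyNode`, `ToyMergeRoundingMode`, `ToyGenIntrinsic`, and all bit values are hypothetical stand-ins chosen for this example. The only point is the shape of the pattern: the node carries no extra flags, and codegen folds the rounding mode into the options it already passes along.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for the JIT's insOpts. The layout is invented for
// this sketch and does not match the real enum.
enum ToyInsOpts : uint8_t
{
    TOY_OPTS_NONE         = 0x00,
    TOY_OPTS_ER_MODE_MASK = 0x03, // two bits holding the rounding mode
    TOY_OPTS_ER_PRESENT   = 0x04, // set when embedded rounding is requested
    TOY_OPTS_OTHER        = 0x10, // some unrelated option that must survive the merge
};

// Hypothetical node: it only records whether a rounding mode operand exists
// and what its (constant) value is. No extra flag bits are stored on it.
struct ToyNode
{
    bool    hasRoundingOperand;
    uint8_t roundingMode; // 0..3, meaningful only when hasRoundingOperand is true
};

// Merge a rounding mode into options that do not yet carry one.
ToyInsOpts ToyMergeRoundingMode(ToyInsOpts opts, uint8_t mode)
{
    assert(mode <= TOY_OPTS_ER_MODE_MASK);
    assert((opts & (TOY_OPTS_ER_PRESENT | TOY_OPTS_ER_MODE_MASK)) == 0);
    return static_cast<ToyInsOpts>(opts | TOY_OPTS_ER_PRESENT | mode);
}

// Up-front check in "codegen": if the node uses embedded rounding, extract the
// mode and merge it into the tracked options. The return value models the
// options that would be handed on to the emitter.
ToyInsOpts ToyGenIntrinsic(const ToyNode& node, ToyInsOpts opts)
{
    if (node.hasRoundingOperand)
    {
        opts = ToyMergeRoundingMode(opts, node.roundingMode);
    }
    return opts;
}
```

With this shape, a node without a rounding operand passes its options through unchanged, while a node with one ends up with both the original options and the rounding bits set.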

The new pattern will allow a similar mechanism for embedded masking: codegen can check for BlendVariableMask and, if it is present, extract the contained operation (such as Add) and call genHWIntrinsic with the updated insOpts. This lets us track the EVEX.aaa bits without handling much larger nodes and without specializing the handling in all the various code paths.
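
The embedded masking idea could look roughly like the following C++ sketch. Everything here is an assumption made for illustration: `SketchNode`, `SketchInsOpts`, `SketchEmit`, and `genSketchIntrinsic` are hypothetical names, and the real support may be structured quite differently. The sketch models EVEX.aaa as a 3-bit field where 0 means no masking and 1 through 7 select an opmask register.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical node kinds for this sketch only.
enum class SketchOper
{
    Add,
    BlendVariableMask,
};

// Hypothetical node: a BlendVariableMask node points at the operation it
// contains and names the opmask register to use.
struct SketchNode
{
    SketchOper        oper;
    const SketchNode* contained; // contained operation, or nullptr
    uint8_t           maskReg;   // 1..7 for BlendVariableMask, 0 otherwise
};

// Hypothetical options carrying the modeled EVEX.aaa field.
struct SketchInsOpts
{
    uint8_t aaa = 0; // 0 = no masking
};

// What the "emitter" would see: the operation and the mask field.
struct SketchEmit
{
    SketchOper oper;
    uint8_t    aaa;
};

// If the node is a mask blend wrapping a contained operation, fold the mask
// into the options and generate the contained operation instead.
SketchEmit genSketchIntrinsic(const SketchNode& node, SketchInsOpts opts)
{
    if ((node.oper == SketchOper::BlendVariableMask) && (node.contained != nullptr))
    {
        assert((node.maskReg >= 1) && (node.maskReg <= 7));
        opts.aaa = node.maskReg;
        return genSketchIntrinsic(*node.contained, opts);
    }
    return SketchEmit{node.oper, opts.aaa};
}
```

The contained operation goes through the same entry point as an unmasked one; only the options differ, which is why no per-intrinsic special casing is needed in this model.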

ghost · 2024-01-26T18:34:54Z

Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch
See info in area-owners.md if you want to be subscribed.

Issue Details

Author:	tannergooding
Assignees:	tannergooding
Labels:	`area-CodeGen-coreclr`
Milestone:	-

ryujit-bot · 2024-01-26T22:34:03Z

Diff results for #97569

Throughput diffs

Throughput diffs for linux/arm64 ran on windows/x64

MinOpts (-0.01% to +0.00%)

Collection	PDIFF
libraries.pmi.linux.arm64.checked.mch	-0.01%

Details here

ryujit-bot · 2024-01-27T00:34:15Z

Diff results for #97569

Throughput diffs

Throughput diffs for windows/arm64 ran on windows/x64

MinOpts (0.00% to +0.01%)

Collection	PDIFF
libraries.pmi.windows.arm64.checked.mch	+0.01%

Details here

tannergooding · 2024-01-27T16:38:00Z

CC. @dotnet/jit-contrib for review. Simple cleanup that will significantly simplify adding the embedded masking support and will make that general support easier to add to Arm64 for SVE, since Arm64 already uses a similar pattern for passing around insOpts itself.

TIHan

LGTM

tannergooding · 2024-01-28T02:57:30Z

JitStress failure is known, related to the new small type handling for interlocked APIs

* Expose embedded rounding related scalar intrinsic APIs
* Expose embedded rounding related arithmetic intrinsic APIs
* Ensure the new APIs are properly lowered
* Bug fixes
* Expose embedded rounding casting APIs
* Expose arithmetic embedded rounding unit tests
* Add a test template for embedded rounding APIs; this will be enough to cover all the binary APIs, including vector and scalar operations.
* Add template for unary ops
* Expose all the embedded rounding unit tests generated by the templates
* Expose embedded rounding casting APIs unit tests
* Expose handwritten unit tests for embedded rounding APIs with special input arg lists.
* Bug fixes: 1. ConvertToVector256Int32/UInt32 use a special codegen path; add a fallback path for when embedded rounding is activated and the control byte is not constant.
* Bug fix: Fix wrong data type in the API definition.
* formatting
* Update API documents for embedded rounding APIs.
* resolve conflicts with #97569
* formatting
* bug fix and remove unneeded SAE related intrinsics
* resolve comments: 1. update the arg lists for genHWIntrinsic_R_RM
* resolve comments: Add jumptable fallback to non-table driven embedded rounding intrinsics.
* resolve comments: 1. remove some redundant checks on embedded rounding intrinsics
* Bug fix: pass the correct operand GenTree node when emitting the fallback for embedded rounding intrinsics.
* formatting
* revert an unexpected change.
* 1. Resolve comments: 2. Added FMA intrinsics with embedded rounding and unit tests.
* Expose the rest of embedded rounding APIs
* formatting
* Ensure the control byte local is assigned to the correct register.

Refactor how the embedded rounding is handled so the logic is more reusable

08985f6

ghost assigned tannergooding Jan 26, 2024

dotnet-issue-labeler bot added the area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI label Jan 26, 2024

tannergooding added 2 commits January 26, 2024 10:46

Remove some unnecessary changes

7a3ae25

Apply formatting patch

cffe6ba

build-analysis bot mentioned this pull request Jan 26, 2024

"We stopped hearing from agent Azure Pipelines 32. Verify the agent machine is running and has a healthy network connection" dotnet/dnceng#1886

Open

3 tasks

tannergooding force-pushed the hwintrin-gen branch from 47171ba to fa808b7 Compare January 27, 2024 00:35

build-analysis bot mentioned this pull request Jan 27, 2024

Tests crashing in CI with no dump: exit code 137 means SIGKILL Killed #97049

Closed

Ensure we always consume the rounding mode operand and produce the register

b19a4c5

tannergooding force-pushed the hwintrin-gen branch from fa808b7 to b19a4c5 Compare January 27, 2024 13:47

TIHan approved these changes Jan 28, 2024

tannergooding merged commit 7a60900 into dotnet:main Jan 28, 2024
136 of 139 checks passed

tannergooding deleted the hwintrin-gen branch January 28, 2024 02:57

Ruihan-Yin mentioned this pull request Feb 1, 2024

Expose AVX512F embedded rounding intrinsics. #97415

Merged

Ruihan-Yin added a commit to Ruihan-Yin/runtime that referenced this pull request Feb 13, 2024

resolve conflicts with dotnet#97569

c8f279b

github-actions bot locked and limited conversation to collaborators Feb 27, 2024
