-
Notifications
You must be signed in to change notification settings - Fork 4.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Enable EVEX feature: embedded broadcast for Vector128/256/512.Add() in limited cases #84821
Merged
Merged
Changes from all commits
Commits
Show all changes
44 commits
Select commit
Hold shift + click to select a range
6b2df17
Enable EVEX feature: embedded broadcast
Ruihan-Yin a8c7d82
remove some irrelevent change from previous main.
Ruihan-Yin 73fb02f
Enable containment at Broadcast intrinsic
Ruihan-Yin 52cd44d
Convert the check logics on broadcast into a flag
Ruihan-Yin 10d75c0
bug fixes:
Ruihan-Yin cdb6144
apply format patch.
Ruihan-Yin 55a6bb7
Add "insOpts" data structure to xarch:
Ruihan-Yin 5569217
Add "OperIsBroadcastScalar" check:
Ruihan-Yin 4f92123
rebase the branch and resolve conflicts
Ruihan-Yin 328549f
changes based on the reivews:
Ruihan-Yin f86a993
apply format patch
Ruihan-Yin d486ed4
bug fixes
Ruihan-Yin 2c60838
bug fixes
Ruihan-Yin 172861e
aaply format patch
Ruihan-Yin 02c61c7
Enable embedded broadcast for Vector128<float>.Add
Ruihan-Yin 2a6f8a7
Enable embedded broadcast for Vector512<float>.Add
Ruihan-Yin b036bcd
make double as embedded broadcast supported
Ruihan-Yin 4358ee0
Add EB support to AVX_BroadcastScalarToVector*
Ruihan-Yin 3a9093a
apply format patch
Ruihan-Yin d018d99
Enable embedded broadcast for double const vector
Ruihan-Yin 7557db7
Enable embedded broadcast for integer Add.
Ruihan-Yin 867eaf0
Changes based on the review:
Ruihan-Yin 3f4d95b
removed the gentree flag: GTF_VECCON_FROMSCALAR
Ruihan-Yin 32fd87a
Bug fixes on embedded broadcast with AVX_Broadcast
Ruihan-Yin 4f97298
enable embedded broadcast in R_R_A path
Ruihan-Yin a5c4414
apply format patch
Ruihan-Yin 12363a9
bug fixes:
Ruihan-Yin b561885
Changes based on reviews:
Ruihan-Yin 90e27c4
unfold VecCon node when lowering if this node is
Ruihan-Yin 9bfa325
apply format patch
Ruihan-Yin 8072d29
bug fixes:
Ruihan-Yin 7db1c5e
resolve the mishandling for the previous conflict.
Ruihan-Yin c916008
move the unfolding logic to ContainChecks
Ruihan-Yin 4ee1f97
Code changes based on the review
Ruihan-Yin 97cb23a
apply format patch
Ruihan-Yin 37b57be
support embedded broadcast for GT_IND
Ruihan-Yin 14a370a
bug fixes:
Ruihan-Yin 64fec11
apply format patch
Ruihan-Yin 45b7807
Introduce MakeHWIntrinsicSrcContained():
Ruihan-Yin cb8feb4
Code changes based on reviews:
Ruihan-Yin 3fe0a2f
Code changes based on review
Ruihan-Yin 6fb6e48
apply format patch
Ruihan-Yin 3b3d0d1
Code changes based on review:
Ruihan-Yin 36af7b7
Code changes based on review:
Ruihan-Yin File filter
Filter by extension
Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
There are no files selected for viewing
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Oops, something went wrong.
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This reserves a bit for
EVEX.b
for all xarch instructions, even though very few will actually need it. Is there some other way to represent this data? E.g., newinstrDesc
types for those that need it, or maybe just for all EVEX encoded instructions, with extra fields for EVEX needs? For broadcast, could we create newinsFormat
values to use for memory reads that use broadcast (instead of using the actual memory type)?There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just some notes, we're going to need potentially up to 7 new bits: #84821 (comment)
A few of this bits are used to represent +1 register for the kmask scenario.
vfixupimm
andvpternlog
are2 registers
+1 register or addressing mode
+1 constant
(this is everything we support today). We then also have +1 mask register
+ a bit forEVEX.b
+ a bit forEVEX.Z
+ a bit forEVEX.L'L
(all of which need to be carried until thecode
is constructed).The
broadcast
case is already going to require someaddressing mode
. The sameEVEX.b
bit may also be used to flag therounding control
orSAE control
, however, which doesn't require addressing. TheEVEX.Z
bit is only used with theopmask
register but theopmask
can be used independently ofaddressing
.It would be nice to squeeze these bits into the spare padding we have available, if possible, but its not "required". Doing so may help throughput due to having less
instrDesc*
kinds to handle.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should think more about this. Every bit that is put here takes away from "small" constants. We should also consider that code requiring EVEX information is likely to be extremely rare compared to non-EVEX instructions. For embedded broadcast, which requires an addressing mode, maybe
emitAddrMode iiaAddrMode
can carry the data, for example.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, I didn't realize these spare bits were used to hold the small constants 👍
For clarification, what is currently defined as a "small" constant? On the x86 side we have:
imm8
- most commonly used, used extensively by hwintrinsics and general instructions alikeimm16
- extremely rare as emitting 16-bit operations is itself very rare and typically more expensive than a 32-bit operationimm32
- second most common and used by calls, jumps, etcimm64
- 64-bit only and only for 1 instructionMy naive guess would then be that we want to reserve
8-bits
for "small constants" and never shrink past that. 16/32-bit constants would then be considered non-small and part of the regularinstrDesc
. 64-bit constants would be "unique" since it impacts 1 instruction and is rare.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Well, a "small constant" in the emitter is "whatever space is left over in the first 64 bits of the instrDesc". Basically, a "small" instrDesc is 64-bits.
_idSmallCns
takes up whatever space is left over after the instruction opcode, 2 registers, GC type, etc. It's 7-12 bits, currently, depending on architecture. If the value doesn't fit in that, we have to allocate aninstrDescCns
or some otherinstrDesc
subclass to hold it. Adding more bits in the "small instrDesc" section pushes some constants to require a biggerinstrDesc
format.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll take responsibility for splitting this into its own
instrDesc
as part of adding the support for embedded rounding control and masking support.