Enable AVX512 Additional 16 SIMD Registers, with accessors instead of macros #81818
This allows the register allocator to use the additional 16 SIMD registers that AVX512 provides.
Commit includes refactoring code to use `const instrDesc *` instead of `instruction` so information about when EVEX is needed (due to high SIMD registers) is available to the emitter.
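To illustrate why the opcode alone is not enough, here is a minimal sketch. It assumes a flat register numbering where XMM0..XMM31 map to 0..31; `InstrDescSketch`, `Ins`, and `needsEvexForRegs` are hypothetical stand-ins, not the JIT's real `instrDesc` or emitter API. The same opcode can be encoded without EVEX when its operands are low registers, but needs EVEX as soon as an operand is XMM16 or above, so the decision has to look at the descriptor.

```cpp
#include <cstdint>

// Hypothetical numbering: XMM0..XMM31 map to 0..31 (an assumption for this sketch).
constexpr unsigned kFirstHighSimdReg = 16;
constexpr unsigned kSimdRegCount     = 32;

// Hypothetical opcode enum; the real JIT `instruction` enum is far larger.
enum class Ins { addps, movaps };

// Simplified stand-in for an instruction descriptor: opcode plus operand registers.
struct InstrDescSketch
{
    Ins      ins;
    unsigned reg1;
    unsigned reg2;
};

inline bool isHighSimdReg(unsigned reg)
{
    return reg >= kFirstHighSimdReg && reg < kSimdRegCount;
}

// The opcode carries no operand information, so a query keyed only on `Ins`
// could not answer this. The descriptor can: XMM16..XMM31 are only
// encodable with an EVEX prefix.
inline bool needsEvexForRegs(const InstrDescSketch* id)
{
    return isHighSimdReg(id->reg1) || isHighSimdReg(id->reg2);
}
```

With this shape, two descriptors that share an opcode give different answers depending only on their registers.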
Commit constrains certain hw intrinsics and gentree nodes to use the lower SIMD registers, even if the upper SIMD registers are available, because some instructions have no EVEX encoding. For example, SSE `Reciprocal` lowers to `rcpps`, which does not have an EVEX encoding form; hence, we cannot allow that hw intrinsic node to use a high SIMD register. These intrinsics are marked with `HW_Flag_NoEvexSemantics`. Other such instructions related to masking (typically marked with `HW_Flag_ReturnsPerElementMask`) have similar issues, though they can be replaced with the EVEX k registers and associated masking once that is implemented. In addition, the callee/caller save registers have been adjusted to properly handle the presence or absence of the AVX512 upper SIMD registers at runtime.
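The restriction described above can be sketched as a mask intersection. This is only an illustration: `RegMask`, `NodeSketch`, `simdCandidates`, and the bit layout (bit N stands for XMM N) are assumptions made for the example, and the real JIT register masks and LSRA candidate logic are laid out differently.

```cpp
#include <cstdint>

using RegMask = std::uint64_t;

// Assumed layout for this sketch only: bit N represents XMM N.
constexpr RegMask kLowSimdRegs  = 0x0000FFFFull; // XMM0..XMM15
constexpr RegMask kHighSimdRegs = 0xFFFF0000ull; // XMM16..XMM31

// Hypothetical node: the flag plays the role of HW_Flag_NoEvexSemantics.
struct NodeSketch
{
    bool noEvexSemantics;
};

// Returns the SIMD registers a node may be assigned.
// High registers exist only when AVX512 is available at runtime, and even
// then a node whose instruction has no EVEX form must stay in the low set.
inline RegMask simdCandidates(const NodeSketch& node, bool hasAvx512)
{
    RegMask all = hasAvx512 ? (kLowSimdRegs | kHighSimdRegs) : kLowSimdRegs;
    return node.noEvexSemantics ? (all & kLowSimdRegs) : all;
}
```

An `rcpps`-style node would carry `noEvexSemantics = true` and so never receive XMM16..XMM31, while an unrestricted node sees all 32 registers on AVX512 hardware.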
Co-authored-by: Bruce Forstall <[email protected]>
…runtime into avx512-upper-regs
This reverts commit 91cf3db.
Convert from macros to accessor functions for RBM_ALLFLOAT, RBM_FLT_CALLEE_TRASH, CNT_CALLEE_TRASH_FLOAT. Convert LSRA use of ACTUAL_REG_COUNT to AVAILABLE_REG_COUNT, and create an accessor for that value for AMD64 as well.
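A rough sketch of the macro-to-accessor idea follows. The names `CompilerSketch` and `LinearScanSketch` and the mask values are hypothetical; only `rbmAllFloat` and `get_RBM_ALLFLOAT` are taken from the diff under review. The point is that a macro is fixed at compile time, whereas a field set at startup can reflect whether the upper SIMD registers exist on the running machine.

```cpp
#include <cstdint>

using RegMask = std::uint64_t;

// Assumed layout for this sketch only: bit N represents float/SIMD register N.
constexpr RegMask kLowFloatRegs  = 0x0000FFFFull;
constexpr RegMask kHighFloatRegs = 0xFFFF0000ull;

// Before (conceptually): a compile-time constant such as
//   #define RBM_ALLFLOAT_SKETCH kLowFloatRegs
// which cannot vary with the hardware the JIT is running on.

// After: the value is computed once per compiler instance.
struct CompilerSketch
{
    RegMask rbmAllFloat;

    explicit CompilerSketch(bool hasAvx512)
        : rbmAllFloat(hasAvx512 ? (kLowFloatRegs | kHighFloatRegs) : kLowFloatRegs)
    {
    }

    RegMask get_RBM_ALLFLOAT() const { return rbmAllFloat; }
};

// A component that holds a compiler pointer can forward to it, either by
// reading the field directly or by calling the compiler's accessor.
struct LinearScanSketch
{
    const CompilerSketch* compiler;

    RegMask get_RBM_ALLFLOAT() const { return compiler->get_RBM_ALLFLOAT(); }
};
```

Existing call sites that used the macro name can then be pointed at the accessor without each one needing to know how the value is computed.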
Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch, @kunalspathak
Issue Details
This is #79544 with one additional change.
#if defined(TARGET_AMD64)
regMaskTP get_RBM_ALLFLOAT() const
{
    return compiler->rbmAllFloat;
Why not `compiler->get_RBM_ALLFLOAT()`, and likewise for the other methods in individual classes?
Either way is fine; they're equivalent. Your suggestion is slightly better encapsulation. However, I'd like to do an experiment to see how much TP could be gained by making a local copy of these fields -- probably a minimal effect.
The approach seems reasonable; hoping it won't affect the TP.
TP looks equivalent to the baseline:
ASM diffs generated on Windows x64
linux arm64: Diffs are based on 1,462,192 contexts (402,179 MinOpts, 1,060,013 FullOpts). MISSED contexts: 717 (0.05%). No diffs found.
osx arm64: Diffs are based on 1,361,376 contexts (398,828 MinOpts, 962,548 FullOpts). MISSED contexts: 626 (0.05%). No diffs found.
linux x64: Diffs are based on 1,356,324 contexts (348,118 MinOpts, 1,008,206 FullOpts). MISSED contexts: 763 (0.06%). No diffs found.
windows arm64: Diffs are based on 1,468,342 contexts (400,661 MinOpts, 1,067,681 FullOpts). MISSED contexts: 736 (0.05%). No diffs found.
windows x64: Diffs are based on 1,564,142 contexts (446,611 MinOpts, 1,117,531 FullOpts). MISSED contexts: 959 (0.06%). No diffs found.
Throughput impact on Windows x64
The following shows the impact on throughput in terms of number of instructions executed inside the JIT. Negative percentages/lower numbers are better.
linux arm64: No significant throughput differences found.
osx arm64: No significant throughput differences found.
linux x64: Overall (+0.29% to +0.47%), MinOpts (+0.31% to +0.71%), FullOpts (+0.27% to +0.47%)
windows arm64: No significant throughput differences found.
windows x64: Overall (+0.26% to +0.43%), MinOpts (+0.27% to +0.65%), FullOpts (+0.23% to +0.43%)
All failures look known or infra.
This change was merged into #79544.