Enable AVX512 Additional 16 SIMD Registers, with accessors instead of macros #81818

BruceForstall
Member

This is #79544 with one additional change.

anthonycanino and others added 30 commits January 17, 2023 10:16
This allows for more registers to be encoded in the register allocator.
Commit includes refactoring code to use `const instrDesc *` instead of `instruction`
so information about when EVEX is needed (due to high SIMD registers) is
available to the emitter.
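A minimal sketch of why the emitter benefits from seeing the whole descriptor rather than just the opcode: whether EVEX is required can depend on which registers a particular use of an instruction touches, since xmm16 through xmm31 are only reachable with an EVEX prefix. The names `InstrDescSketch`, `IsHighSimdReg`, and `NeedsEvexForRegisters`, and the register numbering, are hypothetical and are not the JIT's actual types.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical numbering: xmm0..xmm31 map to 0..31.
constexpr unsigned kFirstHighSimdReg = 16;
constexpr unsigned kSimdRegCount     = 32;

// Hypothetical stand-in for an instruction descriptor: the opcode plus the
// registers used by this particular occurrence of the instruction.
struct InstrDescSketch
{
    uint16_t opcode;
    uint8_t  reg1;
    uint8_t  reg2;
};

constexpr bool IsHighSimdReg(unsigned reg)
{
    return reg >= kFirstHighSimdReg && reg < kSimdRegCount;
}

// The opcode alone cannot answer this question; the operands decide it.
inline bool NeedsEvexForRegisters(const InstrDescSketch* id)
{
    assert(id != nullptr);
    return IsHighSimdReg(id->reg1) || IsHighSimdReg(id->reg2);
}
```

With only an opcode in hand, both calls in a pair such as `{op, 3, 15}` and `{op, 3, 16}` would look identical; with the descriptor, only the second reports that EVEX is needed.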
Commit constrains certain hw intrinsics and gentree nodes to use
lower SIMD registers even if upper SIMD registers are available due
to limitations of EVEX encoding for certain instructions.

For example, SSE `Reciprocal` lowers to `rcpps`, which does not have an
EVEX encoding form; hence, we cannot allow that hw intrinsic node to use
a high SIMD register.

These intrinsics are marked with `HW_Flag_NoEvexSemantics`. Other
instructions related to masking (typically marked with
`HW_Flag_ReturnsPerElementMask`) have similar issues, though they
can be replaced with the EVEX k registers and associated masking once
that is implemented.
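The restriction described above could be sketched as follows, assuming a 32-bit-wide SIMD register mask with one bit per register. `RegMask`, `SimdCandidates`, the mask constants, and the flag enum are hypothetical illustrations; only the flag name `HW_Flag_NoEvexSemantics` comes from the commit message.

```cpp
#include <cassert>
#include <cstdint>

using RegMask = uint64_t; // hypothetical layout: bit i represents xmm<i>

constexpr RegMask kLowSimdMask  = 0x0000FFFFull; // xmm0..xmm15
constexpr RegMask kHighSimdMask = 0xFFFF0000ull; // xmm16..xmm31

enum IntrinsicFlags : uint32_t
{
    Flag_None            = 0,
    Flag_NoEvexSemantics = 1u << 0, // models HW_Flag_NoEvexSemantics
};

// Returns the SIMD registers a node may be allocated. High registers are
// offered only when the hardware has them and the node can be EVEX encoded.
inline RegMask SimdCandidates(uint32_t flags, bool highRegsAvailable)
{
    RegMask candidates = kLowSimdMask;
    if (highRegsAvailable && (flags & Flag_NoEvexSemantics) == 0)
    {
        candidates |= kHighSimdMask;
    }
    // Sanity check: high registers never leak in when they are unavailable.
    assert(highRegsAvailable || (candidates & kHighSimdMask) == 0);
    return candidates;
}
```

Under these assumptions, a node flagged like `rcpps` keeps only the low sixteen candidates even on AVX512-capable hardware.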

In addition, the callee/caller save registers have also been adjusted
to properly handle the presence and absence of AVX512 upper SIMD
registers at runtime.
Co-authored-by: Bruce Forstall <[email protected]>
Co-authored-by: Bruce Forstall <[email protected]>
anthonycanino and others added 4 commits February 3, 2023 10:43
This reverts commit 91cf3db.
Convert from macros to accessor functions for
RBM_ALLFLOAT, RBM_FLT_CALLEE_TRASH, CNT_CALLEE_TRASH_FLOAT.
Convert LSRA use of ACTUAL_REG_COUNT to AVAILABLE_REG_COUNT,
and create an accessor for that value for AMD64 as well.
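As a rough illustration of the macro-to-accessor change: a macro is fixed at compile time, while the set of float registers now depends on whether the machine has the AVX512 upper registers. The sketch below assumes a trimmed-down `CompilerSketch` class and made-up mask constants; `get_RBM_ALLFLOAT` and `rbmAllFloat` are the names visible in this PR, and `get_availableFloatRegCount` is a hypothetical stand-in for the register-count accessor.

```cpp
#include <cassert>
#include <cstdint>

using RegMask = uint64_t; // hypothetical stand-in for regMaskTP

constexpr RegMask kLowFloatMask  = 0x0000FFFFull; // xmm0..xmm15
constexpr RegMask kHighFloatMask = 0xFFFF0000ull; // xmm16..xmm31

// Before (conceptually): #define RBM_ALLFLOAT <compile-time constant>
// After: the value is chosen once per compilation from runtime CPU support.
class CompilerSketch
{
public:
    explicit CompilerSketch(bool hasAvx512UpperRegs)
        : rbmAllFloat(hasAvx512UpperRegs ? (kLowFloatMask | kHighFloatMask) : kLowFloatMask)
        , availableFloatRegCount(hasAvx512UpperRegs ? 32u : 16u)
    {
        assert((kLowFloatMask & kHighFloatMask) == 0);
    }

    RegMask get_RBM_ALLFLOAT() const
    {
        return rbmAllFloat;
    }

    unsigned get_availableFloatRegCount() const
    {
        return availableFloatRegCount;
    }

private:
    RegMask  rbmAllFloat;
    unsigned availableFloatRegCount;
};
```

Call sites that previously expanded a macro would instead call the accessor, so the same JIT binary can serve machines with and without the upper registers.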
@ghost ghost assigned BruceForstall Feb 8, 2023
@dotnet-issue-labeler dotnet-issue-labeler bot added the area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI label Feb 8, 2023

ghost commented Feb 8, 2023

Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch, @kunalspathak
See info in area-owners.md if you want to be subscribed.

Issue Details

This is #79544 with one additional change.

Author: BruceForstall
Assignees: BruceForstall
Labels:

area-CodeGen-coreclr

Milestone: -

#if defined(TARGET_AMD64)
regMaskTP get_RBM_ALLFLOAT() const
{
return compiler->rbmAllFloat;
Member

Why not compiler->get_RBM_ALLFLOAT() and likewise for other methods in individual classes?

Member Author

Either way is fine; they're equivalent. Your suggestion gives slightly better encapsulation. However, I'd like to run an experiment to see how much throughput (TP) could be gained by making a local copy of these fields; the effect is probably minimal.
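The two options discussed in this thread can be contrasted with a small sketch. Everything here other than the names `get_RBM_ALLFLOAT` and `rbmAllFloat` is hypothetical: `CompilerSketch`, `ForwardingUser`, and `CachingUser` are illustrative types, not the JIT's classes.

```cpp
#include <cassert>
#include <cstdint>

using RegMask = uint64_t; // hypothetical stand-in for regMaskTP

// Hypothetical, trimmed-down stand-in for the JIT's compiler object.
struct CompilerSketch
{
    RegMask rbmAllFloat;

    RegMask get_RBM_ALLFLOAT() const
    {
        return rbmAllFloat;
    }
};

// Option A: forward every query to the compiler (better encapsulation).
class ForwardingUser
{
public:
    explicit ForwardingUser(const CompilerSketch* comp) : compiler(comp)
    {
        assert(comp != nullptr);
    }

    RegMask get_RBM_ALLFLOAT() const
    {
        return compiler->get_RBM_ALLFLOAT();
    }

private:
    const CompilerSketch* compiler;
};

// Option B: copy the value once, saving a pointer dereference per query.
class CachingUser
{
public:
    explicit CachingUser(const CompilerSketch* comp) : rbmAllFloat(comp->get_RBM_ALLFLOAT())
    {
    }

    RegMask get_RBM_ALLFLOAT() const
    {
        return rbmAllFloat;
    }

private:
    RegMask rbmAllFloat;
};
```

The trade-off in this sketch is that the cached copy would go stale if the compiler's value changed after construction, which is only safe if the mask is fixed for the lifetime of the object.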

@kunalspathak
Member

The approach seems reasonable; hopefully it won't affect TP.

@BruceForstall
Member Author

TP looks equivalent to the baseline:

ASM diffs generated on Windows x64

linux arm64

Diffs are based on 1,462,192 contexts (402,179 MinOpts, 1,060,013 FullOpts).

MISSED contexts: 717 (0.05%)

No diffs found.

Details

Context information

| Collection | Diffed contexts | MinOpts | FullOpts | Missed, base | Missed, diff |
|:---|---:|---:|---:|---:|---:|
| benchmarks.run.linux.arm64.checked.mch | 39,564 | 5,753 | 33,811 | 25 (0.06%) | 25 (0.06%) |
| coreclr_tests.run.linux.arm64.checked.mch | 632,264 | 382,626 | 249,638 | 50 (0.01%) | 50 (0.01%) |
| libraries.crossgen2.linux.arm64.checked.mch | 176,562 | 13 | 176,549 | 8 (0.00%) | 8 (0.00%) |
| libraries.pmi.linux.arm64.checked.mch | 259,862 | 4,808 | 255,054 | 181 (0.07%) | 181 (0.07%) |
| libraries_tests.pmi.linux.arm64.checked.mch | 353,940 | 8,979 | 344,961 | 453 (0.13%) | 453 (0.13%) |
| | 1,462,192 | 402,179 | 1,060,013 | 717 (0.05%) | 717 (0.05%) |

osx arm64

Diffs are based on 1,361,376 contexts (398,828 MinOpts, 962,548 FullOpts).

MISSED contexts: 626 (0.05%)

No diffs found.

Details

Context information

| Collection | Diffed contexts | MinOpts | FullOpts | Missed, base | Missed, diff |
|:---|---:|---:|---:|---:|---:|
| benchmarks.run.osx.arm64.checked.mch | 27,244 | 1,330 | 25,914 | 23 (0.08%) | 23 (0.08%) |
| coreclr_tests.run.osx.arm64.checked.mch | 589,939 | 384,164 | 205,775 | 44 (0.01%) | 44 (0.01%) |
| libraries.crossgen2.osx.arm64.checked.mch | 202,977 | 15 | 202,962 | 15 (0.01%) | 15 (0.01%) |
| libraries.pmi.osx.arm64.checked.mch | 278,528 | 5,063 | 273,465 | 188 (0.07%) | 188 (0.07%) |
| libraries_tests.pmi.osx.arm64.checked.mch | 262,688 | 8,256 | 254,432 | 356 (0.14%) | 356 (0.14%) |
| | 1,361,376 | 398,828 | 962,548 | 626 (0.05%) | 626 (0.05%) |

linux x64

Diffs are based on 1,356,324 contexts (348,118 MinOpts, 1,008,206 FullOpts).

MISSED contexts: 763 (0.06%)

No diffs found.

Details

Context information

| Collection | Diffed contexts | MinOpts | FullOpts | Missed, base | Missed, diff |
|:---|---:|---:|---:|---:|---:|
| benchmarks.run.linux.x64.checked.mch | 38,527 | 5,230 | 33,297 | 27 (0.07%) | 27 (0.07%) |
| coreclr_tests.run.linux.x64.checked.mch | 541,480 | 327,642 | 213,838 | 49 (0.01%) | 49 (0.01%) |
| libraries.crossgen2.linux.x64.checked.mch | 142,905 | 13 | 142,892 | 4 (0.00%) | 4 (0.00%) |
| libraries.pmi.linux.x64.checked.mch | 259,864 | 4,809 | 255,055 | 189 (0.07%) | 189 (0.07%) |
| libraries_tests.pmi.linux.x64.checked.mch | 373,548 | 10,424 | 363,124 | 494 (0.13%) | 494 (0.13%) |
| | 1,356,324 | 348,118 | 1,008,206 | 763 (0.06%) | 763 (0.06%) |

windows arm64

Diffs are based on 1,468,342 contexts (400,661 MinOpts, 1,067,681 FullOpts).

MISSED contexts: 736 (0.05%)

No diffs found.

Details

Context information

| Collection | Diffed contexts | MinOpts | FullOpts | Missed, base | Missed, diff |
|:---|---:|---:|---:|---:|---:|
| benchmarks.run.windows.arm64.checked.mch | 30,210 | 1,324 | 28,886 | 22 (0.07%) | 22 (0.07%) |
| coreclr_tests.run.windows.arm64.checked.mch | 595,685 | 384,999 | 210,686 | 42 (0.01%) | 42 (0.01%) |
| libraries.crossgen2.windows.arm64.checked.mch | 215,571 | 15 | 215,556 | 12 (0.01%) | 12 (0.01%) |
| libraries.pmi.windows.arm64.checked.mch | 269,747 | 4,959 | 264,788 | 182 (0.07%) | 182 (0.07%) |
| libraries_tests.pmi.windows.arm64.checked.mch | 357,129 | 9,364 | 347,765 | 478 (0.13%) | 478 (0.13%) |
| | 1,468,342 | 400,661 | 1,067,681 | 736 (0.05%) | 736 (0.05%) |

windows x64

Diffs are based on 1,564,142 contexts (446,611 MinOpts, 1,117,531 FullOpts).

MISSED contexts: 959 (0.06%)

No diffs found.

Details

Context information

| Collection | Diffed contexts | MinOpts | FullOpts | Missed, base | Missed, diff |
|:---|---:|---:|---:|---:|---:|
| aspnet.run.windows.x64.checked.mch | 137,371 | 80,272 | 57,099 | 31 (0.02%) | 31 (0.02%) |
| aspnet_block.run.windows.x64.checked.mch | 33,668 | 20,181 | 13,487 | 2 (0.01%) | 2 (0.01%) |
| benchmarks.run.windows.x64.checked.mch | 27,023 | 1,328 | 25,695 | 29 (0.11%) | 29 (0.11%) |
| coreclr_tests.run.windows.x64.checked.mch | 520,551 | 330,386 | 190,165 | 49 (0.01%) | 49 (0.01%) |
| libraries.crossgen2.windows.x64.checked.mch | 213,772 | 15 | 213,757 | 46 (0.02%) | 46 (0.02%) |
| libraries.pmi.windows.x64.checked.mch | 271,836 | 4,960 | 266,876 | 225 (0.08%) | 225 (0.08%) |
| libraries_tests.pmi.windows.x64.checked.mch | 359,921 | 9,469 | 350,452 | 577 (0.16%) | 577 (0.16%) |
| | 1,564,142 | 446,611 | 1,117,531 | 959 (0.06%) | 959 (0.06%) |

Throughput impact on Windows x64

The following shows the impact on throughput in terms of number of instructions executed inside the JIT. Negative percentages/lower numbers are better.
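For reading the PDIFF columns in this report, a small sketch of the arithmetic may help. It assumes PDIFF is the relative change in instruction count, `100 * (diff - base) / base`, which is consistent with the rows in the tables; `PercentDiff` is a hypothetical helper, not part of the SuperPMI tooling.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

// Percentage change in executed instructions from base to diff.
// Positive values mean the diff JIT executed more instructions (worse).
inline double PercentDiff(uint64_t baseInstructions, uint64_t diffInstructions)
{
    assert(baseInstructions != 0);
    const double base  = static_cast<double>(baseInstructions);
    const double delta = static_cast<double>(diffInstructions) - base;
    return 100.0 * delta / base;
}
```

For the linux x64 benchmarks row (68,135,546,466 base, 68,379,028,140 diff) this evaluates to roughly 0.357, which is reported as +0.36%. For the linux arm64 benchmarks row the change is a few parts per million, which rounds to +0.00%.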

linux arm64

No significant throughput differences found

Details

All contexts:

| Collection | Base # instructions | Diff # instructions | PDIFF |
|:---|---:|---:|---:|
| benchmarks.run.linux.arm64.checked.mch | 70,241,779,365 | 70,241,942,566 | +0.00% |
| coreclr_tests.run.linux.arm64.checked.mch | 1,147,308,428,396 | 1,147,308,176,762 | -0.00% |
| libraries.crossgen2.linux.arm64.checked.mch | 117,551,674,016 | 117,551,797,939 | +0.00% |
| libraries.pmi.linux.arm64.checked.mch | 239,441,711,456 | 239,441,891,372 | +0.00% |
| libraries_tests.pmi.linux.arm64.checked.mch | 546,098,207,645 | 546,098,259,965 | +0.00% |

MinOpts contexts:

| Collection | Base # instructions | Diff # instructions | PDIFF |
|:---|---:|---:|---:|
| benchmarks.run.linux.arm64.checked.mch | 1,193,926,939 | 1,193,926,933 | -0.00% |
| coreclr_tests.run.linux.arm64.checked.mch | 478,903,085,270 | 478,902,550,139 | -0.00% |
| libraries.crossgen2.linux.arm64.checked.mch | 1,713,918 | 1,713,918 | 0.00% |
| libraries.pmi.linux.arm64.checked.mch | 1,792,682,499 | 1,792,682,339 | -0.00% |
| libraries_tests.pmi.linux.arm64.checked.mch | 12,994,393,758 | 12,994,385,379 | -0.00% |

FullOpts contexts:

| Collection | Base # instructions | Diff # instructions | PDIFF |
|:---|---:|---:|---:|
| benchmarks.run.linux.arm64.checked.mch | 69,047,852,426 | 69,048,015,633 | +0.00% |
| coreclr_tests.run.linux.arm64.checked.mch | 668,405,343,126 | 668,405,626,623 | +0.00% |
| libraries.crossgen2.linux.arm64.checked.mch | 117,549,960,098 | 117,550,084,021 | +0.00% |
| libraries.pmi.linux.arm64.checked.mch | 237,649,028,957 | 237,649,209,033 | +0.00% |
| libraries_tests.pmi.linux.arm64.checked.mch | 533,103,813,887 | 533,103,874,586 | +0.00% |

osx arm64

No significant throughput differences found

Details

All contexts:

| Collection | Base # instructions | Diff # instructions | PDIFF |
|:---|---:|---:|---:|
| benchmarks.run.osx.arm64.checked.mch | 49,083,877,660 | 49,084,071,774 | +0.00% |
| coreclr_tests.run.osx.arm64.checked.mch | 1,064,336,437,786 | 1,064,336,295,877 | -0.00% |
| libraries.crossgen2.osx.arm64.checked.mch | 134,069,111,408 | 134,069,361,235 | +0.00% |
| libraries.pmi.osx.arm64.checked.mch | 248,422,161,035 | 248,422,416,927 | +0.00% |
| libraries_tests.pmi.osx.arm64.checked.mch | 390,173,552,940 | 390,173,497,718 | -0.00% |

MinOpts contexts:

| Collection | Base # instructions | Diff # instructions | PDIFF |
|:---|---:|---:|---:|
| benchmarks.run.osx.arm64.checked.mch | 648,199,069 | 648,199,094 | +0.00% |
| coreclr_tests.run.osx.arm64.checked.mch | 478,933,235,138 | 478,932,456,866 | -0.00% |
| libraries.crossgen2.osx.arm64.checked.mch | 2,149,584 | 2,149,584 | 0.00% |
| libraries.pmi.osx.arm64.checked.mch | 1,854,638,137 | 1,854,636,376 | -0.00% |
| libraries_tests.pmi.osx.arm64.checked.mch | 7,372,035,772 | 7,372,027,382 | -0.00% |

FullOpts contexts:

| Collection | Base # instructions | Diff # instructions | PDIFF |
|:---|---:|---:|---:|
| benchmarks.run.osx.arm64.checked.mch | 48,435,678,591 | 48,435,872,680 | +0.00% |
| coreclr_tests.run.osx.arm64.checked.mch | 585,403,202,648 | 585,403,839,011 | +0.00% |
| libraries.crossgen2.osx.arm64.checked.mch | 134,066,961,824 | 134,067,211,651 | +0.00% |
| libraries.pmi.osx.arm64.checked.mch | 246,567,522,898 | 246,567,780,551 | +0.00% |
| libraries_tests.pmi.osx.arm64.checked.mch | 382,801,517,168 | 382,801,470,336 | -0.00% |

linux x64

Overall (+0.29% to +0.47%)

| Collection | PDIFF |
|:---|---:|
| benchmarks.run.linux.x64.checked.mch | +0.36% |
| coreclr_tests.run.linux.x64.checked.mch | +0.29% |
| libraries.crossgen2.linux.x64.checked.mch | +0.47% |
| libraries.pmi.linux.x64.checked.mch | +0.41% |
| libraries_tests.pmi.linux.x64.checked.mch | +0.33% |

MinOpts (+0.31% to +0.71%)

| Collection | PDIFF |
|:---|---:|
| benchmarks.run.linux.x64.checked.mch | +0.57% |
| coreclr_tests.run.linux.x64.checked.mch | +0.33% |
| libraries.crossgen2.linux.x64.checked.mch | +0.71% |
| libraries.pmi.linux.x64.checked.mch | +0.47% |
| libraries_tests.pmi.linux.x64.checked.mch | +0.31% |

FullOpts (+0.27% to +0.47%)

| Collection | PDIFF |
|:---|---:|
| benchmarks.run.linux.x64.checked.mch | +0.35% |
| coreclr_tests.run.linux.x64.checked.mch | +0.27% |
| libraries.crossgen2.linux.x64.checked.mch | +0.47% |
| libraries.pmi.linux.x64.checked.mch | +0.41% |
| libraries_tests.pmi.linux.x64.checked.mch | +0.34% |

Details

All contexts:

| Collection | Base # instructions | Diff # instructions | PDIFF |
|:---|---:|---:|---:|
| benchmarks.run.linux.x64.checked.mch | 68,135,546,466 | 68,379,028,140 | +0.36% |
| coreclr_tests.run.linux.x64.checked.mch | 882,495,624,591 | 885,069,447,568 | +0.29% |
| libraries.crossgen2.linux.x64.checked.mch | 83,466,954,700 | 83,857,666,751 | +0.47% |
| libraries.pmi.linux.x64.checked.mch | 222,676,329,016 | 223,598,524,442 | +0.41% |
| libraries_tests.pmi.linux.x64.checked.mch | 522,052,368,783 | 523,801,146,084 | +0.33% |

MinOpts contexts:

| Collection | Base # instructions | Diff # instructions | PDIFF |
|:---|---:|---:|---:|
| benchmarks.run.linux.x64.checked.mch | 921,268,645 | 926,494,892 | +0.57% |
| coreclr_tests.run.linux.x64.checked.mch | 383,864,984,221 | 385,113,245,736 | +0.33% |
| libraries.crossgen2.linux.x64.checked.mch | 1,469,700 | 1,480,108 | +0.71% |
| libraries.pmi.linux.x64.checked.mch | 1,435,315,722 | 1,442,085,835 | +0.47% |
| libraries_tests.pmi.linux.x64.checked.mch | 11,111,188,440 | 11,145,441,772 | +0.31% |

FullOpts contexts:

| Collection | Base # instructions | Diff # instructions | PDIFF |
|:---|---:|---:|---:|
| benchmarks.run.linux.x64.checked.mch | 67,214,277,821 | 67,452,533,248 | +0.35% |
| coreclr_tests.run.linux.x64.checked.mch | 498,630,640,370 | 499,956,201,832 | +0.27% |
| libraries.crossgen2.linux.x64.checked.mch | 83,465,485,000 | 83,856,186,643 | +0.47% |
| libraries.pmi.linux.x64.checked.mch | 221,241,013,294 | 222,156,438,607 | +0.41% |
| libraries_tests.pmi.linux.x64.checked.mch | 510,941,180,343 | 512,655,704,312 | +0.34% |

windows arm64

No significant throughput differences found

Details

All contexts:

| Collection | Base # instructions | Diff # instructions | PDIFF |
|:---|---:|---:|---:|
| benchmarks.run.windows.arm64.checked.mch | 54,625,739,777 | 54,625,876,473 | +0.00% |
| coreclr_tests.run.windows.arm64.checked.mch | 1,070,801,693,193 | 1,070,801,607,603 | -0.00% |
| libraries.crossgen2.windows.arm64.checked.mch | 144,660,196,843 | 144,660,353,305 | +0.00% |
| libraries.pmi.windows.arm64.checked.mch | 251,783,794,117 | 251,784,105,628 | +0.00% |
| libraries_tests.pmi.windows.arm64.checked.mch | 552,053,831,373 | 552,053,922,837 | +0.00% |

MinOpts contexts:

| Collection | Base # instructions | Diff # instructions | PDIFF |
|:---|---:|---:|---:|
| benchmarks.run.windows.arm64.checked.mch | 620,272,673 | 620,272,653 | -0.00% |
| coreclr_tests.run.windows.arm64.checked.mch | 480,159,066,522 | 480,158,434,797 | -0.00% |
| libraries.crossgen2.windows.arm64.checked.mch | 2,156,191 | 2,156,191 | 0.00% |
| libraries.pmi.windows.arm64.checked.mch | 1,897,501,232 | 1,897,503,852 | +0.00% |
| libraries_tests.pmi.windows.arm64.checked.mch | 12,293,528,448 | 12,293,505,555 | -0.00% |

FullOpts contexts:

| Collection | Base # instructions | Diff # instructions | PDIFF |
|:---|---:|---:|---:|
| benchmarks.run.windows.arm64.checked.mch | 54,005,467,104 | 54,005,603,820 | +0.00% |
| coreclr_tests.run.windows.arm64.checked.mch | 590,642,626,671 | 590,643,172,806 | +0.00% |
| libraries.crossgen2.windows.arm64.checked.mch | 144,658,040,652 | 144,658,197,114 | +0.00% |
| libraries.pmi.windows.arm64.checked.mch | 249,886,292,885 | 249,886,601,776 | +0.00% |
| libraries_tests.pmi.windows.arm64.checked.mch | 539,760,302,925 | 539,760,417,282 | +0.00% |

windows x64

Overall (+0.26% to +0.43%)

| Collection | PDIFF |
|:---|---:|
| aspnet.run.windows.x64.checked.mch | +0.40% |
| aspnet_block.run.windows.x64.checked.mch | +0.41% |
| benchmarks.run.windows.x64.checked.mch | +0.37% |
| coreclr_tests.run.windows.x64.checked.mch | +0.26% |
| libraries.crossgen2.windows.x64.checked.mch | +0.43% |
| libraries.pmi.windows.x64.checked.mch | +0.38% |
| libraries_tests.pmi.windows.x64.checked.mch | +0.31% |

MinOpts (+0.27% to +0.65%)

| Collection | PDIFF |
|:---|---:|
| aspnet.run.windows.x64.checked.mch | +0.43% |
| aspnet_block.run.windows.x64.checked.mch | +0.39% |
| benchmarks.run.windows.x64.checked.mch | +0.41% |
| coreclr_tests.run.windows.x64.checked.mch | +0.29% |
| libraries.crossgen2.windows.x64.checked.mch | +0.65% |
| libraries.pmi.windows.x64.checked.mch | +0.44% |
| libraries_tests.pmi.windows.x64.checked.mch | +0.27% |

FullOpts (+0.23% to +0.43%)

| Collection | PDIFF |
|:---|---:|
| aspnet.run.windows.x64.checked.mch | +0.39% |
| aspnet_block.run.windows.x64.checked.mch | +0.42% |
| benchmarks.run.windows.x64.checked.mch | +0.37% |
| coreclr_tests.run.windows.x64.checked.mch | +0.23% |
| libraries.crossgen2.windows.x64.checked.mch | +0.43% |
| libraries.pmi.windows.x64.checked.mch | +0.38% |
| libraries_tests.pmi.windows.x64.checked.mch | +0.31% |

Details

All contexts:

| Collection | Base # instructions | Diff # instructions | PDIFF |
|:---|---:|---:|---:|
| aspnet.run.windows.x64.checked.mch | 129,966,537,466 | 130,489,466,100 | +0.40% |
| aspnet_block.run.windows.x64.checked.mch | 29,593,256,928 | 29,714,188,323 | +0.41% |
| benchmarks.run.windows.x64.checked.mch | 36,403,707,505 | 36,537,225,274 | +0.37% |
| coreclr_tests.run.windows.x64.checked.mch | 834,129,922,769 | 836,306,735,426 | +0.26% |
| libraries.crossgen2.windows.x64.checked.mch | 125,404,474,681 | 125,939,377,883 | +0.43% |
| libraries.pmi.windows.x64.checked.mch | 234,065,984,026 | 234,966,452,358 | +0.38% |
| libraries_tests.pmi.windows.x64.checked.mch | 505,444,666,254 | 507,013,785,527 | +0.31% |

MinOpts contexts:

| Collection | Base # instructions | Diff # instructions | PDIFF |
|:---|---:|---:|---:|
| aspnet.run.windows.x64.checked.mch | 27,025,661,071 | 27,142,522,855 | +0.43% |
| aspnet_block.run.windows.x64.checked.mch | 7,885,027,941 | 7,915,784,656 | +0.39% |
| benchmarks.run.windows.x64.checked.mch | 481,233,685 | 483,205,624 | +0.41% |
| coreclr_tests.run.windows.x64.checked.mch | 378,542,257,127 | 379,654,270,164 | +0.29% |
| libraries.crossgen2.windows.x64.checked.mch | 1,731,927 | 1,743,192 | +0.65% |
| libraries.pmi.windows.x64.checked.mch | 1,456,977,826 | 1,463,368,396 | +0.44% |
| libraries_tests.pmi.windows.x64.checked.mch | 9,864,046,711 | 9,890,751,588 | +0.27% |

FullOpts contexts:

| Collection | Base # instructions | Diff # instructions | PDIFF |
|:---|---:|---:|---:|
| aspnet.run.windows.x64.checked.mch | 102,940,876,395 | 103,346,943,245 | +0.39% |
| aspnet_block.run.windows.x64.checked.mch | 21,708,228,987 | 21,798,403,667 | +0.42% |
| benchmarks.run.windows.x64.checked.mch | 35,922,473,820 | 36,054,019,650 | +0.37% |
| coreclr_tests.run.windows.x64.checked.mch | 455,587,665,642 | 456,652,465,262 | +0.23% |
| libraries.crossgen2.windows.x64.checked.mch | 125,402,742,754 | 125,937,634,691 | +0.43% |
| libraries.pmi.windows.x64.checked.mch | 232,609,006,200 | 233,503,083,962 | +0.38% |
| libraries_tests.pmi.windows.x64.checked.mch | 495,580,619,543 | 497,123,033,939 | +0.31% |

@BruceForstall
Member Author

All failures look like known issues or infra problems.

@BruceForstall
Member Author

This change was merged into #79544

@BruceForstall BruceForstall deleted the anthonycanino_avx512-upper-regs-with-reg-accessors branch February 8, 2023 18:56
@ghost ghost locked as resolved and limited conversation to collaborators Mar 10, 2023